Message ID | 20231006155646.12938-1-hgajjar@de.adit-jv.com (mailing list archive) |
---|---|
State | Accepted |
Commit | f49449fbc21e7e9550a5203902d69c8ae7dfd918 |
Headers | show |
Series | [v4] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach | expand |
+Cc: Ferry. On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: > This patch replaces the usage of netif_stop_queue with netif_device_detach > in the u_ether driver. The netif_device_detach function not only stops all > tx queues by calling netif_tx_stop_all_queues but also marks the device as > removed by clearing the __LINK_STATE_PRESENT bit. > > This change helps notify user space about the disconnection of the device > more effectively, compared to netif_stop_queue, which only stops a single > transmit queue. This change effectively broke my USB ether setup. git bisect start # status: waiting for both good and bad commits # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 # status: waiting for bad commit, 1 good commit known # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state" git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect git bisect good 97475763484245916735a1aa9a3310a01d46b008 # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach Note, revert indeed helps. Should I send a revert? I use configfs to setup USB EEM function and it worked till this commit. If needed, I can share my scripts, but I believe it's not needed as here we see a clear regression.
On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: > +Cc: Ferry. > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: > > This patch replaces the usage of netif_stop_queue with netif_device_detach > > in the u_ether driver. The netif_device_detach function not only stops all > > tx queues by calling netif_tx_stop_all_queues but also marks the device as > > removed by clearing the __LINK_STATE_PRESENT bit. > > > > This change helps notify user space about the disconnection of the device > > more effectively, compared to netif_stop_queue, which only stops a single > > transmit queue. > > This change effectively broke my USB ether setup. > > git bisect start > # status: waiting for both good and bad commits > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 > # status: waiting for bad commit, 1 good commit known > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state" > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect > git bisect good 97475763484245916735a1aa9a3310a01d46b008 > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach > > Note, revert indeed helps. Should I send a revert? > > I use configfs to setup USB EEM function and it worked till this commit. > If needed, I can share my scripts, but I believe it's not needed as here > we see a clear regression. > > -- > With Best Regards, > Andy Shevchenko > > Without this patch, there may be a potential crash in a race condition, as __LINK_STATE_PRESENT is monitored at many places in the Network stack to determine the status of the link. Could you please provide details on how this patch affects your functionality? Are you experiencing connection problems or data transfer interruptions? Instead of reverting this patch, consider trying the upcoming patch (soon to be available in the mainline) to see if it resolves your issue. https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de Thanks, Hardik
Hi, Op 15-01-2024 om 14:27 schreef Hardik Gajjar: > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: >> +Cc: Ferry. >> >> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: >>> This patch replaces the usage of netif_stop_queue with netif_device_detach >>> in the u_ether driver. The netif_device_detach function not only stops all >>> tx queues by calling netif_tx_stop_all_queues but also marks the device as >>> removed by clearing the __LINK_STATE_PRESENT bit. >>> >>> This change helps notify user space about the disconnection of the device >>> more effectively, compared to netif_stop_queue, which only stops a single >>> transmit queue. >> >> This change effectively broke my USB ether setup. >> >> git bisect start >> # status: waiting for both good and bad commits >> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty >> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 >> # status: waiting for bad commit, 1 good commit known >> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag 'usb-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb >> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b >> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix wrong data added to platform device >> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d >> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: add a bandwidth budget table >> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 >> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: gadget: uvc: cleanup request when not in correct state" >> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 >> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: tps6598x: Refactor tps6598x port registration >> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 >> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach >> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 >> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix stub_dev hub disconnect >> git bisect good 97475763484245916735a1aa9a3310a01d46b008 >> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix a type promotion bug >> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 >> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach >> >> Note, revert indeed helps. Should I send a revert? >> >> I use configfs to setup USB EEM function and it worked till this commit. >> If needed, I can share my scripts, but I believe it's not needed as here >> we see a clear regression. >> >> -- >> With Best Regards, >> Andy Shevchenko >> >> > > Without this patch, there may be a potential crash in a race condition, as __LINK_STATE_PRESENT is monitored at many places in the Network stack to determine the status of the link. > > Could you please provide details on how this patch affects your functionality? Are you experiencing connection problems or data transfer interruptions? In my case on mrfld (Intel Edison Arduino) using configfs with this patch no config from host through dhcp is received. Manual setting correct ipv4 addr / mask / gw still no connection. > Instead of reverting this patch, consider trying the upcoming patch (soon to be available in the mainline) to see if it resolves your issue. > > https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de This patch works for me with v6.7.0. > Thanks, > Hardik
Hi, Op 15-01-2024 om 21:10 schreef Ferry Toth: > Hi, > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar: >> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: >>> +Cc: Ferry. >>> >>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: >>>> This patch replaces the usage of netif_stop_queue with >>>> netif_device_detach >>>> in the u_ether driver. The netif_device_detach function not only >>>> stops all >>>> tx queues by calling netif_tx_stop_all_queues but also marks the >>>> device as >>>> removed by clearing the __LINK_STATE_PRESENT bit. >>>> >>>> This change helps notify user space about the disconnection of the >>>> device >>>> more effectively, compared to netif_stop_queue, which only stops a >>>> single >>>> transmit queue. >>> >>> This change effectively broke my USB ether setup. >>> >>> git bisect start >>> # status: waiting for both good and bad commits >>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag >>> 'tty-6.7-rc1' of >>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty >>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 >>> # status: waiting for bad commit, 1 good commit known >>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag >>> 'usb-6.7-rc1' of >>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb >>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b >>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix >>> wrong data added to platform device >>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d >>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: xhci-mtk: >>> add a bandwidth budget table >>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 >>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: >>> gadget: uvc: cleanup request when not in correct state" >>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 >>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: >>> tps6598x: Refactor tps6598x port registration >>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 >>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: >>> u_ether: Replace netif_stop_queue with netif_device_detach >>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 >>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: fix >>> stub_dev hub disconnect >>> git bisect good 97475763484245916735a1aa9a3310a01d46b008 >>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: fix >>> a type promotion bug >>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 >>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: >>> gadget: u_ether: Replace netif_stop_queue with netif_device_detach >>> >>> Note, revert indeed helps. Should I send a revert? >>> >>> I use configfs to setup USB EEM function and it worked till this >>> commit. >>> If needed, I can share my scripts, but I believe it's not needed as >>> here >>> we see a clear regression. >>> >>> -- >>> With Best Regards, >>> Andy Shevchenko >>> >>> >> >> Without this patch, there may be a potential crash in a race >> condition, as __LINK_STATE_PRESENT is monitored at many places in the >> Network stack to determine the status of the link. >> >> Could you please provide details on how this patch affects your >> functionality? Are you experiencing connection problems or data >> transfer interruptions? > > In my case on mrfld (Intel Edison Arduino) using configfs with this > patch no config from host through dhcp is received. Manual setting > correct ipv4 addr / mask / gw still no connection. > >> Instead of reverting this patch, consider trying the upcoming patch >> (soon to be available in the mainline) to see if it resolves your issue. >> >> https://lore.kernel.org/lkml/2023122900-commence-agenda-db2c@gregkh/T/#m36a812d3f1e5d744ee32381f6ae4185940b376de >> > > This patch works for me with v6.7.0. I need to revisit this. The patch in this topic landed in v6.7.0-rc1 (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other platforms as well. The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am able to make a USB EEM function again. However, now a hidden issue appears. With mrfld there is an external switch to easily switch between host and device mode. What is not fixed: - when in device mode and unplugging/plugging the cable when using `ifconfig usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route table is updated and the dir `/sys/class/net/usb0` exists and in the dir `cat carrier*` shows the carrier up and down counts. This is the expected behavior. - when in device mode and switching to host mode `ifconfig usb0` continues to show "RUNNING", the route table is not modified and the dir `/sys/class/net/usb0` no longer exists. - switching to device mode again, USB EEM works fine, no changes to RUNNING or the route table happen and the dir `/sys/class/net/usb0` still is non- existing. - unplugging/plugging the cable in device mode after this does not restore the original situation. This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential patches on top) and bisected back to this patch in v6.70-rc1. It seems `netif_device_detach` does not completely clean up as expected and `netif_device_attach` does not completely rebuild. I am wondering if on other platforms this can be reproduced? If so, inho it would be best to revert the both patches until the issue is resolved. Thanks, Ferry >> Thanks, >> Hardik >
On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote: > Hi, > > Op 15-01-2024 om 21:10 schreef Ferry Toth: > > Hi, > > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar: > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: > > > > +Cc: Ferry. > > > > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: > > > > > This patch replaces the usage of netif_stop_queue with > > > > > netif_device_detach > > > > > in the u_ether driver. The netif_device_detach function not > > > > > only stops all > > > > > tx queues by calling netif_tx_stop_all_queues but also marks > > > > > the device as > > > > > removed by clearing the __LINK_STATE_PRESENT bit. > > > > > > > > > > This change helps notify user space about the disconnection > > > > > of the device > > > > > more effectively, compared to netif_stop_queue, which only > > > > > stops a single > > > > > transmit queue. > > > > > > > > This change effectively broke my USB ether setup. > > > > > > > > git bisect start > > > > # status: waiting for both good and bad commits > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag > > > > 'tty-6.7-rc1' of > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 > > > > # status: waiting for bad commit, 1 good commit known > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag > > > > 'usb-6.7-rc1' of > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix > > > > wrong data added to platform device > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: > > > > xhci-mtk: add a bandwidth budget table > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: > > > > gadget: uvc: cleanup request when not in correct state" > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: > > > > tps6598x: Refactor tps6598x port registration > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: > > > > u_ether: Replace netif_stop_queue with netif_device_detach > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: > > > > fix stub_dev hub disconnect > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008 > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: > > > > fix a type promotion bug > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] > > > > usb: gadget: u_ether: Replace netif_stop_queue with > > > > netif_device_detach > > > > > > > > Note, revert indeed helps. Should I send a revert? > > > > > > > > I use configfs to setup USB EEM function and it worked till this > > > > commit. > > > > If needed, I can share my scripts, but I believe it's not needed > > > > as here > > > > we see a clear regression. > > > > > > > > -- > > > > With Best Regards, > > > > Andy Shevchenko > > > > > > > > > > > > > > Without this patch, there may be a potential crash in a race > > > condition, as __LINK_STATE_PRESENT is monitored at many places in > > > the Network stack to determine the status of the link. > > > > > > Could you please provide details on how this patch affects your > > > functionality? Are you experiencing connection problems or data > > > transfer interruptions? > > > > In my case on mrfld (Intel Edison Arduino) using configfs with this > > patch no config from host through dhcp is received. Manual setting > > correct ipv4 addr / mask / gw still no connection. > > > > > Instead of reverting this patch, consider trying the upcoming patch > > > (soon to be available in the mainline) to see if it resolves your > > > issue. > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e= > > > > > > > This patch works for me with v6.7.0. > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1 > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other > platforms as well. > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am > able to make a USB EEM function again. > > However, now a hidden issue appears. With mrfld there is an external switch > to easily switch between host and device mode. > > What is not fixed: > > - when in device mode and unplugging/plugging the cable when using `ifconfig > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route > table is updated and the dir `/sys/class/net/usb0` exists and in the dir > `cat carrier*` shows the carrier up and down counts. This is the expected > behavior. > > - when in device mode and switching to host mode `ifconfig usb0` continues > to show "RUNNING", the route table is not modified and the dir > `/sys/class/net/usb0` no longer exists. > > - switching to device mode again, USB EEM works fine, no changes to RUNNING > or the route table happen and the dir `/sys/class/net/usb0` still is non- > existing. > > - unplugging/plugging the cable in device mode after this does not restore > the original situation. > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential > patches on top) and bisected back to this patch in v6.70-rc1. > > It seems `netif_device_detach` does not completely clean up as expected and > `netif_device_attach` does not completely rebuild. > > I am wondering if on other platforms this can be reproduced? If so, inho it > would be best to revert the both patches until the issue is resolved. > > Thanks, > > Ferry I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available. This behavior seems not correct. The gether_cleanup function should remove the interface along with the associated kobject, and this function should only be called during the unloading of the driver or deleting the gadget. Could you please confirm whether any of your custom modifications are removing the net interface kobject? > > > > Thanks, > > > Hardik > >
Hi Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote: >> Hi, >> >> Op 15-01-2024 om 21:10 schreef Ferry Toth: >>> Hi, >>> >>> Op 15-01-2024 om 14:27 schreef Hardik Gajjar: >>>> On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: >>>>> +Cc: Ferry. >>>>> >>>>> On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: >>>>>> This patch replaces the usage of netif_stop_queue with >>>>>> netif_device_detach >>>>>> in the u_ether driver. The netif_device_detach function not >>>>>> only stops all >>>>>> tx queues by calling netif_tx_stop_all_queues but also marks >>>>>> the device as >>>>>> removed by clearing the __LINK_STATE_PRESENT bit. >>>>>> >>>>>> This change helps notify user space about the disconnection >>>>>> of the device >>>>>> more effectively, compared to netif_stop_queue, which only >>>>>> stops a single >>>>>> transmit queue. >>>>> This change effectively broke my USB ether setup. >>>>> >>>>> git bisect start >>>>> # status: waiting for both good and bad commits >>>>> # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag >>>>> 'tty-6.7-rc1' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty >>>>> git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 >>>>> # status: waiting for bad commit, 1 good commit known >>>>> # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag >>>>> 'usb-6.7-rc1' of >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb >>>>> git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b >>>>> # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix >>>>> wrong data added to platform device >>>>> git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d >>>>> # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: >>>>> xhci-mtk: add a bandwidth budget table >>>>> git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 >>>>> # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: >>>>> gadget: uvc: cleanup request when not in correct state" >>>>> git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 >>>>> # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: >>>>> tps6598x: Refactor tps6598x port registration >>>>> git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 >>>>> # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: >>>>> u_ether: Replace netif_stop_queue with netif_device_detach >>>>> git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 >>>>> # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: >>>>> fix stub_dev hub disconnect >>>>> git bisect good 97475763484245916735a1aa9a3310a01d46b008 >>>>> # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: >>>>> fix a type promotion bug >>>>> git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 >>>>> # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] >>>>> usb: gadget: u_ether: Replace netif_stop_queue with >>>>> netif_device_detach >>>>> >>>>> Note, revert indeed helps. Should I send a revert? >>>>> >>>>> I use configfs to setup USB EEM function and it worked till this >>>>> commit. >>>>> If needed, I can share my scripts, but I believe it's not needed >>>>> as here >>>>> we see a clear regression. >>>>> >>>>> -- >>>>> With Best Regards, >>>>> Andy Shevchenko >>>>> >>>>> >>>> Without this patch, there may be a potential crash in a race >>>> condition, as __LINK_STATE_PRESENT is monitored at many places in >>>> the Network stack to determine the status of the link. >>>> >>>> Could you please provide details on how this patch affects your >>>> functionality? Are you experiencing connection problems or data >>>> transfer interruptions? >>> In my case on mrfld (Intel Edison Arduino) using configfs with this >>> patch no config from host through dhcp is received. Manual setting >>> correct ipv4 addr / mask / gw still no connection. >>> >>>> Instead of reverting this patch, consider trying the upcoming patch >>>> (soon to be available in the mainline) to see if it resolves your >>>> issue. >>>> >>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e= >>>> >>> This patch works for me with v6.7.0. >> I need to revisit this. The patch in this topic landed in v6.7.0-rc1 >> (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other >> platforms as well. >> >> The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror >> detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am >> able to make a USB EEM function again. >> >> However, now a hidden issue appears. With mrfld there is an external switch >> to easily switch between host and device mode. >> >> What is not fixed: >> >> - when in device mode and unplugging/plugging the cable when using `ifconfig >> usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to >> "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route >> table is updated and the dir `/sys/class/net/usb0` exists and in the dir >> `cat carrier*` shows the carrier up and down counts. This is the expected >> behavior. >> >> - when in device mode and switching to host mode `ifconfig usb0` continues >> to show "RUNNING", the route table is not modified and the dir >> `/sys/class/net/usb0` no longer exists. >> >> - switching to device mode again, USB EEM works fine, no changes to RUNNING >> or the route table happen and the dir `/sys/class/net/usb0` still is non- >> existing. >> >> - unplugging/plugging the cable in device mode after this does not restore >> the original situation. >> >> This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential >> patches on top) and bisected back to this patch in v6.70-rc1. >> >> It seems `netif_device_detach` does not completely clean up as expected and >> `netif_device_attach` does not completely rebuild. >> >> I am wondering if on other platforms this can be reproduced? If so, inho it >> would be best to revert the both patches until the issue is resolved. >> >> Thanks, >> >> Ferry > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available. > This behavior seems not correct. Exactly. And this didn't happen before the 2 patches. To be precise: /sys/class/net/usb0 is not removed and it is a link, the link target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no longer exists > The gether_cleanup function should remove the interface along with the associated kobject, > and this function should only be called during the unloading of the driver or deleting the gadget. > Could you please confirm whether any of your custom modifications are removing the net interface kobject? As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches, nothing related to gadgets or net. For reference, patches are here: https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch Seems (just guessing) gether_cleanup is being called due to the switch to host mode, but cleanup only partly succeeds. Now I'm finding I can make the net interface disappear with `ip link set dev usb0 down` and the broken link goes away by destroying the gadget in configfs. Is that intentional? Do I need to tear down the gadgets in reverse order as on create when switching to host mode? That would be new. What happens when you trigger a switch to host mode without actually unplugging your cable? >>>> Thanks, >>>> Hardik
On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote: > > > Op 15-01-2024 om 21:10 schreef Ferry Toth: > > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar: > > > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: > > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: > > > > > > > This patch replaces the usage of netif_stop_queue with > > > > > > > netif_device_detach > > > > > > > in the u_ether driver. The netif_device_detach function not > > > > > > > only stops all > > > > > > > tx queues by calling netif_tx_stop_all_queues but also marks > > > > > > > the device as > > > > > > > removed by clearing the __LINK_STATE_PRESENT bit. > > > > > > > > > > > > > > This change helps notify user space about the disconnection > > > > > > > of the device > > > > > > > more effectively, compared to netif_stop_queue, which only > > > > > > > stops a single > > > > > > > transmit queue. > > > > > > This change effectively broke my USB ether setup. > > > > > > > > > > > > git bisect start > > > > > > # status: waiting for both good and bad commits > > > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag > > > > > > 'tty-6.7-rc1' of > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > > > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 > > > > > > # status: waiting for bad commit, 1 good commit known > > > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag > > > > > > 'usb-6.7-rc1' of > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb > > > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b > > > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix > > > > > > wrong data added to platform device > > > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d > > > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: > > > > > > xhci-mtk: add a bandwidth budget table > > > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 > > > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: > > > > > > gadget: uvc: cleanup request when not in correct state" > > > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 > > > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: > > > > > > tps6598x: Refactor tps6598x port registration > > > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 > > > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: > > > > > > u_ether: Replace netif_stop_queue with netif_device_detach > > > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 > > > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: > > > > > > fix stub_dev hub disconnect > > > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008 > > > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: > > > > > > fix a type promotion bug > > > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 > > > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] > > > > > > usb: gadget: u_ether: Replace netif_stop_queue with > > > > > > netif_device_detach > > > > > > > > > > > > Note, revert indeed helps. Should I send a revert? > > > > > > > > > > > > I use configfs to setup USB EEM function and it worked till this > > > > > > commit. > > > > > > If needed, I can share my scripts, but I believe it's not needed > > > > > > as here > > > > > > we see a clear regression. > > > > > > > > > > > Without this patch, there may be a potential crash in a race > > > > > condition, as __LINK_STATE_PRESENT is monitored at many places in > > > > > the Network stack to determine the status of the link. > > > > > > > > > > Could you please provide details on how this patch affects your > > > > > functionality? Are you experiencing connection problems or data > > > > > transfer interruptions? > > > > In my case on mrfld (Intel Edison Arduino) using configfs with this > > > > patch no config from host through dhcp is received. Manual setting > > > > correct ipv4 addr / mask / gw still no connection. > > > > > > > > > Instead of reverting this patch, consider trying the upcoming patch > > > > > (soon to be available in the mainline) to see if it resolves your > > > > > issue. > > > > > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e= > > > > > > > > > This patch works for me with v6.7.0. > > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1 > > > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other > > > platforms as well. > > > > > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror > > > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am > > > able to make a USB EEM function again. > > > > > > However, now a hidden issue appears. With mrfld there is an external switch > > > to easily switch between host and device mode. > > > > > > What is not fixed: > > > > > > - when in device mode and unplugging/plugging the cable when using `ifconfig > > > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to > > > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route > > > table is updated and the dir `/sys/class/net/usb0` exists and in the dir > > > `cat carrier*` shows the carrier up and down counts. This is the expected > > > behavior. > > > > > > - when in device mode and switching to host mode `ifconfig usb0` continues > > > to show "RUNNING", the route table is not modified and the dir > > > `/sys/class/net/usb0` no longer exists. > > > > > > - switching to device mode again, USB EEM works fine, no changes to RUNNING > > > or the route table happen and the dir `/sys/class/net/usb0` still is non- > > > existing. > > > > > > - unplugging/plugging the cable in device mode after this does not restore > > > the original situation. > > > > > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential > > > patches on top) and bisected back to this patch in v6.70-rc1. > > > > > > It seems `netif_device_detach` does not completely clean up as expected and > > > `netif_device_attach` does not completely rebuild. > > > > > > I am wondering if on other platforms this can be reproduced? If so, inho it > > > would be best to revert the both patches until the issue is resolved. > > > > > > Thanks, > > > > > > Ferry > > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available. > > This behavior seems not correct. > > Exactly. And this didn't happen before the 2 patches. > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no > longer exists > > > The gether_cleanup function should remove the interface along with the associated kobject, > > and this function should only be called during the unloading of the driver or deleting the gadget. > > Could you please confirm whether any of your custom modifications are removing the net interface kobject? > > As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches, > nothing related to gadgets or net. > > For reference, patches are here: https://github.com/htot/meta-intel-edison/tree/nanbield/meta-intel-edison-bsp/recipes-kernel/linux/files > > 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch > > 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch > > 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch > > Seems (just guessing) gether_cleanup is being called due to the switch to > host mode, but cleanup only partly succeeds. Now I'm finding I can make the > net interface disappear with `ip link set dev usb0 down` and the broken link > goes away by destroying the gadget in configfs. > > Is that intentional? Do I need to tear down the gadgets in reverse order as > on create when switching to host mode? That would be new. > > What happens when you trigger a switch to host mode without actually > unplugging your cable? Since there is no response, should we actually prepare revert next week?
On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > > On Wed, Apr 03, 2024 at 11:01:58PM +0200, Ferry Toth wrote: > > > > Op 15-01-2024 om 21:10 schreef Ferry Toth: > > > > > Op 15-01-2024 om 14:27 schreef Hardik Gajjar: > > > > > > On Sun, Jan 14, 2024 at 06:59:19PM +0200, Andy Shevchenko wrote: > > > > > > > On Fri, Oct 06, 2023 at 05:56:46PM +0200, Hardik Gajjar wrote: > > > > > > > > This patch replaces the usage of netif_stop_queue with > > > > > > > > netif_device_detach > > > > > > > > in the u_ether driver. The netif_device_detach function not > > > > > > > > only stops all > > > > > > > > tx queues by calling netif_tx_stop_all_queues but also marks > > > > > > > > the device as > > > > > > > > removed by clearing the __LINK_STATE_PRESENT bit. > > > > > > > > > > > > > > > > This change helps notify user space about the disconnection > > > > > > > > of the device > > > > > > > > more effectively, compared to netif_stop_queue, which only > > > > > > > > stops a single > > > > > > > > transmit queue. > > > > > > > This change effectively broke my USB ether setup. > > > > > > > > > > > > > > git bisect start > > > > > > > # status: waiting for both good and bad commits > > > > > > > # good: [1f24458a1071f006e3f7449c08ae0f12af493923] Merge tag > > > > > > > 'tty-6.7-rc1' of > > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty > > > > > > > git bisect good 1f24458a1071f006e3f7449c08ae0f12af493923 > > > > > > > # status: waiting for bad commit, 1 good commit known > > > > > > > # bad: [2c40c1c6adab90ee4660caf03722b3a3ec67767b] Merge tag > > > > > > > 'usb-6.7-rc1' of > > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb > > > > > > > git bisect bad 2c40c1c6adab90ee4660caf03722b3a3ec67767b > > > > > > > # bad: [17d6b82d2d6d467149874b883cdba844844b996d] usb/usbip: fix > > > > > > > wrong data added to platform device > > > > > > > git bisect bad 17d6b82d2d6d467149874b883cdba844844b996d > > > > > > > # good: [ba6b83a910b6d8a9379bda55cbf06cb945473a96] usb: > > > > > > > xhci-mtk: add a bandwidth budget table > > > > > > > git bisect good ba6b83a910b6d8a9379bda55cbf06cb945473a96 > > > > > > > # good: [dddc00f255415b826190cfbaa5d6dbc87cd9ded1] Revert "usb: > > > > > > > gadget: uvc: cleanup request when not in correct state" > > > > > > > git bisect good dddc00f255415b826190cfbaa5d6dbc87cd9ded1 > > > > > > > # bad: [8f999ce60ea3d47886b042ef1f22bb184b6e9c59] USB: typec: > > > > > > > tps6598x: Refactor tps6598x port registration > > > > > > > git bisect bad 8f999ce60ea3d47886b042ef1f22bb184b6e9c59 > > > > > > > # bad: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] usb: gadget: > > > > > > > u_ether: Replace netif_stop_queue with netif_device_detach > > > > > > > git bisect bad f49449fbc21e7e9550a5203902d69c8ae7dfd918 > > > > > > > # good: [97475763484245916735a1aa9a3310a01d46b008] USB: usbip: > > > > > > > fix stub_dev hub disconnect > > > > > > > git bisect good 97475763484245916735a1aa9a3310a01d46b008 > > > > > > > # good: [0f5aa1b01263b8b621bc4f031a1f2983ef8517b7] usb: usbtest: > > > > > > > fix a type promotion bug > > > > > > > git bisect good 0f5aa1b01263b8b621bc4f031a1f2983ef8517b7 > > > > > > > # first bad commit: [f49449fbc21e7e9550a5203902d69c8ae7dfd918] > > > > > > > usb: gadget: u_ether: Replace netif_stop_queue with > > > > > > > netif_device_detach > > > > > > > > > > > > > > Note, revert indeed helps. Should I send a revert? > > > > > > > > > > > > > > I use configfs to setup USB EEM function and it worked till this > > > > > > > commit. > > > > > > > If needed, I can share my scripts, but I believe it's not needed > > > > > > > as here > > > > > > > we see a clear regression. > > > > > > > > > > > > > Without this patch, there may be a potential crash in a race > > > > > > condition, as __LINK_STATE_PRESENT is monitored at many places in > > > > > > the Network stack to determine the status of the link. > > > > > > > > > > > > Could you please provide details on how this patch affects your > > > > > > functionality? Are you experiencing connection problems or data > > > > > > transfer interruptions? > > > > > In my case on mrfld (Intel Edison Arduino) using configfs with this > > > > > patch no config from host through dhcp is received. Manual setting > > > > > correct ipv4 addr / mask / gw still no connection. > > > > > > > > > > > Instead of reverting this patch, consider trying the upcoming patch > > > > > > (soon to be available in the mainline) to see if it resolves your > > > > > > issue. > > > > > > > > > > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_2023122900-2Dcommence-2Dagenda-2Ddb2c-40gregkh_T_-23m36a812d3f1e5d744ee32381f6ae4185940b376de&d=DwICaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=4g6EtvkKKfp8YYHpU196b2HzQxCMgsAhuo8pFng3X4TCWdcOVEUCug2-l2hRfLyV&s=t82wZAFwm2FTSaacgsmSpZWvWEa89jN8GX-okIJrwFw&e= > > > > > > > > > > > This patch works for me with v6.7.0. > > > > I need to revisit this. The patch in this topic landed in v6.7.0-rc1 > > > > (f49449fbc21e) and breaks the gadget mrfld (Intel Edison Arduino) and other > > > > platforms as well. > > > > > > > > The mentioned fix "usb: gadget: u_ether: Re-attach netif device to mirror > > > > detachment*" * has landed in v6.8.0-rc1 (76c945730). What it does fix: I am > > > > able to make a USB EEM function again. > > > > > > > > However, now a hidden issue appears. With mrfld there is an external switch > > > > to easily switch between host and device mode. > > > > > > > > What is not fixed: > > > > > > > > - when in device mode and unplugging/plugging the cable when using `ifconfig > > > > usb0` the line "usb0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>" changes to > > > > "usb0: flags=4099<UP,BROADCAST,MULTICAST>" as is supposed to, the route > > > > table is updated and the dir `/sys/class/net/usb0` exists and in the dir > > > > `cat carrier*` shows the carrier up and down counts. This is the expected > > > > behavior. > > > > > > > > - when in device mode and switching to host mode `ifconfig usb0` continues > > > > to show "RUNNING", the route table is not modified and the dir > > > > `/sys/class/net/usb0` no longer exists. > > > > > > > > - switching to device mode again, USB EEM works fine, no changes to RUNNING > > > > or the route table happen and the dir `/sys/class/net/usb0` still is non- > > > > existing. > > > > > > > > - unplugging/plugging the cable in device mode after this does not restore > > > > the original situation. > > > > > > > > This behavior I tested on v6.9.0-rc2 (with a few unrelated but essential > > > > patches on top) and bisected back to this patch in v6.70-rc1. > > > > > > > > It seems `netif_device_detach` does not completely clean up as expected and > > > > `netif_device_attach` does not completely rebuild. > > > > > > > > I am wondering if on other platforms this can be reproduced? If so, inho it > > > > would be best to revert the both patches until the issue is resolved. > > > > > > > > Thanks, > > > > > > > > Ferry > > > I'm wondering why the /sys/class/net/usb0 directory is being removed when the network interface is still available. > > > This behavior seems not correct. > > > > Exactly. And this didn't happen before the 2 patches. > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no > > longer exists So, it means that the /sys/class/net/usb0 is present, but the symlink is broken. In that case, the dwc3 driver should recreate the device, and the symlink should become active again I have the dwc3 IP base usb controller, Let me check with this patch and share result here. May be we need some fix in dwc3 > > > > > The gether_cleanup function should remove the interface along with the associated kobject, > > > and this function should only be called during the unloading of the driver or deleting the gadget. > > > Could you please confirm whether any of your custom modifications are removing the net interface kobject? > > > > As far as know not. I have one tusb1210 (usb phy) and 2 dwc3 patches, > > nothing related to gadgets or net. > > > > For reference, patches are here: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_htot_meta-2Dintel-2Dedison_tree_nanbield_meta-2Dintel-2Dedison-2Dbsp_recipes-2Dkernel_linux_files&d=DwICAg&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=b4sDyiL5E5Uvzote95pWzkQE_qFYYrvfF766keBAL41H9ZrXG_fF_I7mAnRnI32b&s=1t88BV-wyxvDFZcsmuO0wtanAnpRPP9TSw3ysu6ZUgs&e= > > > > 0001-phy-ti-tusb1210-write-to-scratch-on-power-on.patch > > > > 0001a-usb-dwc3-core-Fix-dwc3_core_soft_reset-before-anythi.patch > > > > 044-REVERTME-usb-dwc3-gadget-skip-endpoints-ep-18-in-out.patch > > > > Seems (just guessing) gether_cleanup is being called due to the switch to > > host mode, but cleanup only partly succeeds. Now I'm finding I can make the > > net interface disappear with `ip link set dev usb0 down` and the broken link > > goes away by destroying the gadget in configfs. > > > > Is that intentional? Do I need to tear down the gadgets in reverse order as > > on create when switching to host mode? That would be new. > > > > What happens when you trigger a switch to host mode without actually > > unplugging your cable? > > Since there is no response, should we actually prepare revert next week? > > -- > With Best Regards, > Andy Shevchenko > >
On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: ... > > > Exactly. And this didn't happen before the 2 patches. > > > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no > > > longer exists > > So, it means that the /sys/class/net/usb0 is present, but the symlink is > broken. In that case, the dwc3 driver should recreate the device, and the > symlink should become active again > I have the dwc3 IP base usb controller, Let me check with this patch and > share result here. May be we need some fix in dwc3 It's quite possible, please test on your side. We are happy to test any fixes if you come up with.
Hi Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > ... > >>>> Exactly. And this didn't happen before the 2 patches. >>>> >>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link >>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no >>>> longer exists >> So, it means that the /sys/class/net/usb0 is present, but the symlink is >> broken. In that case, the dwc3 driver should recreate the device, and the >> symlink should become active again Yes, on first enabling gadget (when device mode is activated): root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ driver net power sound subsystem suspended uevent Then switching to host mode: root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ ls: cannot access '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file or directory Then back to device mode: root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ driver power sound subsystem suspended uevent net is missing. But, network functions: root@yuna:~# ping 10.42.0.1 PING 10.42.0.1 (10.42.0.1): 56 data bytes Mass storage device is created and removed each time as expected. >> I have the dwc3 IP base usb controller, Let me check with this patch and >> share result here. May be we need some fix in dwc3 Would have been nice if someone could test on other controller as well. But another instance of dwc3 is also very welcome. > It's quite possible, please test on your side. > We are happy to test any fixes if you come up with. >
On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: ... > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no > > > > > longer exists > > > So, it means that the /sys/class/net/usb0 is present, but the symlink is > > > broken. In that case, the dwc3 driver should recreate the device, and the > > > symlink should become active again > > Yes, on first enabling gadget (when device mode is activated): > > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > driver net power sound subsystem suspended uevent > > Then switching to host mode: > > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > ls: cannot access > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file > or directory > > Then back to device mode: > > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > driver power sound subsystem suspended uevent > > net is missing. But, network functions: > > root@yuna:~# ping 10.42.0.1 > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > Mass storage device is created and removed each time as expected. So, what's the conclusion? Shall we move towards revert of those two changes? > > > I have the dwc3 IP base usb controller, Let me check with this patch and > > > share result here. May be we need some fix in dwc3 > Would have been nice if someone could test on other controller as well. But > another instance of dwc3 is also very welcome. > > It's quite possible, please test on your side. > > We are happy to test any fixes if you come up with.
On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > ... > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not removed and it is a link, the link > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no > > > > > > longer exists > > > > So, it means that the /sys/class/net/usb0 is present, but the symlink is > > > > broken. In that case, the dwc3 driver should recreate the device, and the > > > > symlink should become active again > > > > Yes, on first enabling gadget (when device mode is activated): > > > > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > driver net power sound subsystem suspended uevent > > > > Then switching to host mode: > > > > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > ls: cannot access > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file > > or directory > > > > Then back to device mode: > > > > root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > driver power sound subsystem suspended uevent > > > > net is missing. But, network functions: > > > > root@yuna:~# ping 10.42.0.1 > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > Mass storage device is created and removed each time as expected. > > So, what's the conclusion? Shall we move towards revert of those two changes? As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce the issue. I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host) Following are the logs: //Host to device console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/ usb0 //device to host console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/ usb0 s a800000.dwc3/gadget/net/usb0 < addr_assign_type duplex phys_port_name addr_len flags phys_switch_id address gro_flush_timeout power broadcast ifalias proto_down carrier ifindex queues carrier_changes iflink speed carrier_down_count link_mode statistics carrier_up_count mtu subsystem dev_id name_assign_type tx_queue_len dev_port netdev_group type device operstate uevent dormant phys_port_id waiting_for_supplier console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 TX bytes:0 console:/sys/bus/platform/devices/a800000.ssusb # I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available. Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0. According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface. I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ? I am curious why the driver was developed without adhering to the kernel's gadget architecture. > > > > > I have the dwc3 IP base usb controller, Let me check with this patch and > > > > share result here. May be we need some fix in dwc3 > > Would have been nice if someone could test on other controller as well. But > > another instance of dwc3 is also very welcome. > > > It's quite possible, please test on your side. > > > We are happy to test any fixes if you come up with. > > -- > With Best Regards, > Andy Shevchenko > >
Hi, Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >> >> ... >> >>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>> >>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a link, the link >>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no >>>>>>> longer exists >>>>> So, it means that the /sys/class/net/usb0 is present, but the symlink is >>>>> broken. In that case, the dwc3 driver should recreate the device, and the >>>>> symlink should become active again >>> >>> Yes, on first enabling gadget (when device mode is activated): >>> >>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>> driver net power sound subsystem suspended uevent >>> >>> Then switching to host mode: >>> >>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>> ls: cannot access >>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No such file >>> or directory >>> >>> Then back to device mode: >>> >>> root@yuna:~# ls /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>> driver power sound subsystem suspended uevent >>> >>> net is missing. But, network functions: >>> >>> root@yuna:~# ping 10.42.0.1 >>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>> >>> Mass storage device is created and removed each time as expected. >> >> So, what's the conclusion? Shall we move towards revert of those two changes? > > > As promised, I have the tested the this patch with the dwc3 gadget. I could not reproduce > the issue. > > I can see the usb0 exist all the time and accessible regardless of the role switching of the USB mode (peripheral <-> host) > > Following are the logs: > //Host to device > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > mode > console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/ > usb0 > > //device to host > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > console:/sys/bus/platform/devices/a800000.ssusb # ls a800000.dwc3/gadget/net/ > usb0 That is weird. When I switch to host mode (using the physical switch), the whole gadget directory is removed (now testing 6.9.0-rc5) Switching back to device mode, that gadget directory is recreated. And gadget/sound as well, but not gadget/net. > s a800000.dwc3/gadget/net/usb0 < > addr_assign_type duplex phys_port_name > addr_len flags phys_switch_id > address gro_flush_timeout power > broadcast ifalias proto_down > carrier ifindex queues > carrier_changes iflink speed > carrier_down_count link_mode statistics > carrier_up_count mtu subsystem > dev_id name_assign_type tx_queue_len > dev_port netdev_group type > device operstate uevent > dormant phys_port_id waiting_for_supplier > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > BROADCAST MULTICAST MTU:1500 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:1000 > RX bytes:0 TX bytes:0 > > console:/sys/bus/platform/devices/a800000.ssusb # > > I strongly advise against reverting the patch solely based on the observed issue of removing the /sys/class/net/usb0 directory while the usb0 interface remains available. There's more to it. I also mentioned that switching the role or unplugging the cable leaves the usb0 connection. I have while in host mode: root@yuna:~# ifconfig -a usb0 usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 inet 10.42.0.221 netmask 255.255.255.0 broadcast 10.42.0.255 inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid 0x20<link> You don't see that because you didn't create a connection at all. > Instead, I recommend enabling FTRACE to trace the functions involved and identify which faulty call is responsible for removing usb0. Switching from device -> host -> device: root@yuna:~# trace-cmd record -p function_graph -l *gether_* plugin 'function_graph' Hit Ctrl^C to stop recording ^CCPU0 data recorded at offset=0x1c8000 188 bytes in size (4096 uncompressed) CPU1 data recorded at offset=0x1c9000 0 bytes in size (0 uncompressed) root@yuna:~# trace-cmd report cpus=2 irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # 2079.480 us | gether_disconnect(); irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + 11.640 us | gether_disconnect(); irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! 116.520 us | gether_connect(); irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # 2057.260 us | gether_disconnect(); irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! 109.000 us | gether_connect(); > According to current kernel architecture of u_ether driver, only gether_cleanup should remove the usb0 interface along with its kobject and sysfs interface. > I suggest sharing the analysis here to understand why this practice is not followed in your use case or driver ? Yes, I'll try to trace where that happens. Nevertheless, the disappearance of the net/usb0 directory seems harmless? But the usb: net device remaining after disconnect or role switch is not good, as the route remains. May be they are 2 separate problems. Could you try to reproduce what happens if you make eem connection and then unplug? > I am curious why the driver was developed without adhering to the kernel's gadget architecture. > >> >>>>> I have the dwc3 IP base usb controller, Let me check with this patch and >>>>> share result here. May be we need some fix in dwc3 >>> Would have been nice if someone could test on other controller as well. But >>> another instance of dwc3 is also very welcome. >>>> It's quite possible, please test on your side. >>>> We are happy to test any fixes if you come up with. >> >> -- >> With Best Regards, >> Andy Shevchenko >> >>
Hi, Op 25-04-2024 om 23:27 schreef Ferry Toth: > Hi, > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>> >>> ... >>> >>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>> >>>>>>>> To be precise: /sys/class/net/usb0 is not removed and it is a >>>>>>>> link, the link >>>>>>>> target >>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 no >>>>>>>> longer exists >>>>>> So, it means that the /sys/class/net/usb0 is present, but the >>>>>> symlink is >>>>>> broken. In that case, the dwc3 driver should recreate the device, >>>>>> and the >>>>>> symlink should become active again >>>> >>>> Yes, on first enabling gadget (when device mode is activated): >>>> >>>> root@yuna:~# ls >>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>> driver net power sound subsystem suspended uevent >>>> >>>> Then switching to host mode: >>>> >>>> root@yuna:~# ls >>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>> ls: cannot access >>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': No >>>> such file >>>> or directory >>>> >>>> Then back to device mode: >>>> >>>> root@yuna:~# ls >>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>> driver power sound subsystem suspended uevent >>>> >>>> net is missing. But, network functions: >>>> >>>> root@yuna:~# ping 10.42.0.1 >>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>> >>>> Mass storage device is created and removed each time as expected. >>> >>> So, what's the conclusion? Shall we move towards revert of those two >>> changes? >> >> >> As promised, I have the tested the this patch with the dwc3 gadget. I >> could not reproduce >> the issue. >> >> I can see the usb0 exist all the time and accessible regardless of the >> role switching of the USB mode (peripheral <-> host) >> >> Following are the logs: >> //Host to device >> >> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > >> mode >> console:/sys/bus/platform/devices/a800000.ssusb # ls >> a800000.dwc3/gadget/net/ >> usb0 >> >> //device to host >> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode >> console:/sys/bus/platform/devices/a800000.ssusb # ls >> a800000.dwc3/gadget/net/ >> usb0 > > That is weird. When I switch to host mode (using the physical switch), > the whole gadget directory is removed (now testing 6.9.0-rc5) > > Switching back to device mode, that gadget directory is recreated. And > gadget/sound as well, but not gadget/net. > >> s >> a800000.dwc3/gadget/net/usb0 < >> addr_assign_type duplex phys_port_name >> addr_len flags phys_switch_id >> address gro_flush_timeout power >> broadcast ifalias proto_down >> carrier ifindex queues >> carrier_changes iflink speed >> carrier_down_count link_mode statistics >> carrier_up_count mtu subsystem >> dev_id name_assign_type tx_queue_len >> dev_port netdev_group type >> device operstate uevent >> dormant phys_port_id waiting_for_supplier >> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >> BROADCAST MULTICAST MTU:1500 Metric:1 >> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >> collisions:0 txqueuelen:1000 >> RX bytes:0 TX bytes:0 >> >> console:/sys/bus/platform/devices/a800000.ssusb # >> >> I strongly advise against reverting the patch solely based on the >> observed issue of removing the /sys/class/net/usb0 directory while the >> usb0 interface remains available. > > There's more to it. I also mentioned that switching the role or > unplugging the cable leaves the usb0 connection. > > I have while in host mode: > root@yuna:~# ifconfig -a usb0 > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > inet 10.42.0.221 netmask 255.255.255.0 broadcast 10.42.0.255 > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid 0x20<link> > > > You don't see that because you didn't create a connection at all. > >> Instead, I recommend enabling FTRACE to trace the functions involved >> and identify which faulty call is responsible for removing usb0. > > Switching from device -> host -> device: > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > plugin 'function_graph' > Hit Ctrl^C to stop recording > ^CCPU0 data recorded at offset=0x1c8000 > 188 bytes in size (4096 uncompressed) > CPU1 data recorded at offset=0x1c9000 > 0 bytes in size (0 uncompressed) > root@yuna:~# trace-cmd report > cpus=2 > irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > 2079.480 us | gether_disconnect(); > irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > 11.640 us | gether_disconnect(); > irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > 116.520 us | gether_connect(); > irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > 2057.260 us | gether_disconnect(); > irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > 109.000 us | gether_connect(); I tried to get a more useful trace: root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter root@yuna:/sys/kernel/tracing# echo function > current_tracer root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter -> switch to host mode then back to device root@yuna:/sys/kernel/tracing# cat trace # tracer: function # # entries-in-buffer/entries-written: 53/53 #P:2 # # _-----=> irqs-off/BH-disabled # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | irq/68-dwc3-523 [000] D..3. 133.990254: reset_config <-__composite_disconnect irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable <-reset_config irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect <-reset_config kworker/1:3-443 [001] ...1. 134.022453: eem_unbind <-purge_configs_funcs -> to device mode kworker/1:3-443 [001] ...1. 148.630773: eem_bind <-usb_add_function irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt <-composite_setup irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect <-eem_set_alt irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect <-eem_set_alt irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt <-composite_setup irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect <-eem_set_alt irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect <-eem_set_alt irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete ... I don't know where to look exactly. Any hints? > >> According to current kernel architecture of u_ether driver, only >> gether_cleanup should remove the usb0 interface along with its kobject >> and sysfs interface. >> I suggest sharing the analysis here to understand why this practice is >> not followed in your use case or driver ? > > Yes, I'll try to trace where that happens. > > Nevertheless, the disappearance of the net/usb0 directory seems > harmless? But the usb: net device remaining after disconnect or role > switch is not good, as the route remains. > > May be they are 2 separate problems. Could you try to reproduce what > happens if you make eem connection and then unplug? > >> I am curious why the driver was developed without adhering to the >> kernel's gadget architecture. I don't know what you mean here. Which driver do you mean? >>> >>>>>> I have the dwc3 IP base usb controller, Let me check with this >>>>>> patch and >>>>>> share result here. May be we need some fix in dwc3 >>>> Would have been nice if someone could test on other controller as >>>> well. But >>>> another instance of dwc3 is also very welcome. >>>>> It's quite possible, please test on your side. >>>>> We are happy to test any fixes if you come up with. >>> >>> -- >>> With Best Regards, >>> Andy Shevchenko >>> >>> >
On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > Hi, > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > Hi, > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > > > > > > > ... > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > removed and it is a link, the link > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > no > > > > > > > > > longer exists > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > present, but the symlink is > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > recreate the device, and the > > > > > > > symlink should become active again > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > root@yuna:~# ls > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > Then switching to host mode: > > > > > > > > > > root@yuna:~# ls > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > ls: cannot access > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > No such file > > > > > or directory > > > > > > > > > > Then back to device mode: > > > > > > > > > > root@yuna:~# ls > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > Mass storage device is created and removed each time as expected. > > > > > > > > So, what's the conclusion? Shall we move towards revert of those > > > > two changes? > > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget. > > > I could not reproduce > > > the issue. > > > > > > I can see the usb0 exist all the time and accessible regardless of > > > the role switching of the USB mode (peripheral <-> host) > > > > > > Following are the logs: > > > //Host to device > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > > > > mode > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > a800000.dwc3/gadget/net/ > > > usb0 > > > > > > //device to host > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > a800000.dwc3/gadget/net/ > > > usb0 > > > > That is weird. When I switch to host mode (using the physical switch), > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > Switching back to device mode, that gadget directory is recreated. And > > gadget/sound as well, but not gadget/net. > > > > > s a800000.dwc3/gadget/net/usb0 > > > < > > > addr_assign_type duplex phys_port_name > > > addr_len flags phys_switch_id > > > address gro_flush_timeout power > > > broadcast ifalias proto_down > > > carrier ifindex queues > > > carrier_changes iflink speed > > > carrier_down_count link_mode statistics > > > carrier_up_count mtu subsystem > > > dev_id name_assign_type tx_queue_len > > > dev_port netdev_group type > > > device operstate uevent > > > dormant phys_port_id waiting_for_supplier > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > > collisions:0 txqueuelen:1000 > > > RX bytes:0 TX bytes:0 > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > I strongly advise against reverting the patch solely based on the > > > observed issue of removing the /sys/class/net/usb0 directory while > > > the usb0 interface remains available. > > > > There's more to it. I also mentioned that switching the role or > > unplugging the cable leaves the usb0 connection. > > > > I have while in host mode: > > root@yuna:~# ifconfig -a usb0 > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > > inet 10.42.0.221 netmask 255.255.255.0 broadcast 10.42.0.255 > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid 0x20<link> > > > > > > You don't see that because you didn't create a connection at all. > > > > > Instead, I recommend enabling FTRACE to trace the functions involved > > > and identify which faulty call is responsible for removing usb0. > > > > Switching from device -> host -> device: > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > plugin 'function_graph' > > Hit Ctrl^C to stop recording > > ^CCPU0 data recorded at offset=0x1c8000 > > 188 bytes in size (4096 uncompressed) > > CPU1 data recorded at offset=0x1c9000 > > 0 bytes in size (0 uncompressed) > > root@yuna:~# trace-cmd report > > cpus=2 > > irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > > 2079.480 us | gether_disconnect(); > > irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > > 11.640 us | gether_disconnect(); > > irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > > 116.520 us | gether_connect(); > > irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > > 2057.260 us | gether_disconnect(); > > irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > > 109.000 us | gether_connect(); > > I tried to get a more useful trace: > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > root@yuna:/sys/kernel/tracing# echo function > current_tracer > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter > -> switch to host mode then back to device > root@yuna:/sys/kernel/tracing# cat trace > # tracer: function > # > # entries-in-buffer/entries-written: 53/53 #P:2 > # > # _-----=> irqs-off/BH-disabled > # / _----=> need-resched > # | / _---=> hardirq/softirq > # || / _--=> preempt-depth > # ||| / _-=> migrate-disable > # |||| / delay > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > # | | | ||||| | | > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > <-__composite_disconnect > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > <-reset_config > irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect > <-reset_config > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > <-purge_configs_funcs > > -> to device mode > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > <-usb_add_function > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > <-composite_setup > irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect > <-eem_set_alt > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > <-eem_set_alt > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > <-composite_setup > irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect > <-eem_set_alt > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > <-eem_set_alt > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete > ... > > I don't know where to look exactly. Any hints? do you see anything related to gether_cleanup() after eem_unbind() ? If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry Hardik > > > > > > According to current kernel architecture of u_ether driver, only > > > gether_cleanup should remove the usb0 interface along with its > > > kobject and sysfs interface. > > > I suggest sharing the analysis here to understand why this practice > > > is not followed in your use case or driver ? > > > > Yes, I'll try to trace where that happens. > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > harmless? But the usb: net device remaining after disconnect or role > > switch is not good, as the route remains. > > > > May be they are 2 separate problems. Could you try to reproduce what > > happens if you make eem connection and then unplug? > > > > > I am curious why the driver was developed without adhering to the > > > kernel's gadget architecture. > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > with this patch and > > > > > > > share result here. May be we need some fix in dwc3 > > > > > Would have been nice if someone could test on other > > > > > controller as well. But > > > > > another instance of dwc3 is also very welcome. > > > > > > It's quite possible, please test on your side. > > > > > > We are happy to test any fixes if you come up with. > > > > > > > > -- > > > > With Best Regards, > > > > Andy Shevchenko > > > > > > > > > >
Hi, Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >> Hi, >> >> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>> Hi, >>> >>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>> >>>>> ... >>>>> >>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>> >>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>> removed and it is a link, the link >>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>> no >>>>>>>>>> longer exists >>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>> present, but the symlink is >>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>> recreate the device, and the >>>>>>>> symlink should become active again >>>>>> >>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>> >>>>>> root@yuna:~# ls >>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>> driver net power sound subsystem suspended uevent >>>>>> >>>>>> Then switching to host mode: >>>>>> >>>>>> root@yuna:~# ls >>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>> ls: cannot access >>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>> No such file >>>>>> or directory >>>>>> >>>>>> Then back to device mode: >>>>>> >>>>>> root@yuna:~# ls >>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>> driver power sound subsystem suspended uevent >>>>>> >>>>>> net is missing. But, network functions: >>>>>> >>>>>> root@yuna:~# ping 10.42.0.1 >>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>> >>>>>> Mass storage device is created and removed each time as expected. >>>>> >>>>> So, what's the conclusion? Shall we move towards revert of those >>>>> two changes? >>>> >>>> >>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>> I could not reproduce >>>> the issue. >>>> >>>> I can see the usb0 exist all the time and accessible regardless of >>>> the role switching of the USB mode (peripheral <-> host) >>>> >>>> Following are the logs: >>>> //Host to device >>>> >>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>> mode >>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>> a800000.dwc3/gadget/net/ >>>> usb0 >>>> >>>> //device to host >>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode >>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>> a800000.dwc3/gadget/net/ >>>> usb0 >>> >>> That is weird. When I switch to host mode (using the physical switch), >>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>> >>> Switching back to device mode, that gadget directory is recreated. And >>> gadget/sound as well, but not gadget/net. >>> >>>> s a800000.dwc3/gadget/net/usb0 >>>> < >>>> addr_assign_type duplex phys_port_name >>>> addr_len flags phys_switch_id >>>> address gro_flush_timeout power >>>> broadcast ifalias proto_down >>>> carrier ifindex queues >>>> carrier_changes iflink speed >>>> carrier_down_count link_mode statistics >>>> carrier_up_count mtu subsystem >>>> dev_id name_assign_type tx_queue_len >>>> dev_port netdev_group type >>>> device operstate uevent >>>> dormant phys_port_id waiting_for_supplier >>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>> collisions:0 txqueuelen:1000 >>>> RX bytes:0 TX bytes:0 >>>> >>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>> >>>> I strongly advise against reverting the patch solely based on the >>>> observed issue of removing the /sys/class/net/usb0 directory while >>>> the usb0 interface remains available. >>> >>> There's more to it. I also mentioned that switching the role or >>> unplugging the cable leaves the usb0 connection. >>> >>> I have while in host mode: >>> root@yuna:~# ifconfig -a usb0 >>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>> inet 10.42.0.221 netmask 255.255.255.0 broadcast 10.42.0.255 >>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid 0x20<link> >>> >>> >>> You don't see that because you didn't create a connection at all. >>> >>>> Instead, I recommend enabling FTRACE to trace the functions involved >>>> and identify which faulty call is responsible for removing usb0. >>> >>> Switching from device -> host -> device: >>> >>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>> plugin 'function_graph' >>> Hit Ctrl^C to stop recording >>> ^CCPU0 data recorded at offset=0x1c8000 >>> 188 bytes in size (4096 uncompressed) >>> CPU1 data recorded at offset=0x1c9000 >>> 0 bytes in size (0 uncompressed) >>> root@yuna:~# trace-cmd report >>> cpus=2 >>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # >>> 2079.480 us | gether_disconnect(); >>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + >>> 11.640 us | gether_disconnect(); >>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! >>> 116.520 us | gether_connect(); >>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # >>> 2057.260 us | gether_disconnect(); >>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! >>> 109.000 us | gether_connect(); >> >> I tried to get a more useful trace: >> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >> root@yuna:/sys/kernel/tracing# echo function > current_tracer >> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter >> -> switch to host mode then back to device >> root@yuna:/sys/kernel/tracing# cat trace >> # tracer: function >> # >> # entries-in-buffer/entries-written: 53/53 #P:2 >> # >> # _-----=> irqs-off/BH-disabled >> # / _----=> need-resched >> # | / _---=> hardirq/softirq >> # || / _--=> preempt-depth >> # ||| / _-=> migrate-disable >> # |||| / delay >> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >> # | | | ||||| | | >> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >> <-__composite_disconnect >> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >> <-reset_config >> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect >> <-reset_config >> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >> <-purge_configs_funcs >> >> -> to device mode >> >> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >> <-usb_add_function >> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >> <-composite_setup >> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect >> <-eem_set_alt >> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >> <-eem_set_alt >> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >> <-composite_setup >> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect >> <-eem_set_alt >> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >> <-eem_set_alt >> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap <-rx_complete >> ... >> >> I don't know where to look exactly. Any hints? > > do you see anything related to gether_cleanup() after eem_unbind() ? Nope. It's a pitty that the trace formatting got messed up above. But as you can see I traced gether_* and eem_*. After eem_unbind no traced function is called, until I flip the switch to device mode. The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete and eem_wrap <-eth_start_xmit lines. > If not then, you may try to enable tracing of TCP/IP stack and network side to check who deleting the sysfs entry Yes, that's a vast amount of functions to trace. And I don't see how that would be related to the patch we're discussing here. I was hoping for a little more targeted hint. You may recall the whole issue did not occur before this patch got applied. > Hardik > > >> >>> >>>> According to current kernel architecture of u_ether driver, only >>>> gether_cleanup should remove the usb0 interface along with its >>>> kobject and sysfs interface. >>>> I suggest sharing the analysis here to understand why this practice >>>> is not followed in your use case or driver ? >>> >>> Yes, I'll try to trace where that happens. >>> >>> Nevertheless, the disappearance of the net/usb0 directory seems >>> harmless? But the usb: net device remaining after disconnect or role >>> switch is not good, as the route remains. >>> >>> May be they are 2 separate problems. Could you try to reproduce what >>> happens if you make eem connection and then unplug? >>> >>>> I am curious why the driver was developed without adhering to the >>>> kernel's gadget architecture. >> >> I don't know what you mean here. Which driver do you mean? >> >>>>> >>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>> with this patch and >>>>>>>> share result here. May be we need some fix in dwc3 >>>>>> Would have been nice if someone could test on other >>>>>> controller as well. But >>>>>> another instance of dwc3 is also very welcome. >>>>>>> It's quite possible, please test on your side. >>>>>>> We are happy to test any fixes if you come up with. >>>>> >>>>> -- >>>>> With Best Regards, >>>>> Andy Shevchenko >>>>> >>>>> >>>
Hi, Op 30-04-2024 om 21:40 schreef Ferry Toth: > Hi, > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>> Hi, >>> >>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>> Hi, >>>> >>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>> >>>>>> ... >>>>>> >>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>> >>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>> target >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>> no >>>>>>>>>>> longer exists >>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>> present, but the symlink is >>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>> recreate the device, and the >>>>>>>>> symlink should become active again >>>>>>> >>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>> >>>>>>> root@yuna:~# ls >>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>> driver net power sound subsystem suspended uevent >>>>>>> >>>>>>> Then switching to host mode: >>>>>>> >>>>>>> root@yuna:~# ls >>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>> ls: cannot access >>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>> No such file >>>>>>> or directory >>>>>>> >>>>>>> Then back to device mode: >>>>>>> >>>>>>> root@yuna:~# ls >>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>> driver power sound subsystem suspended uevent >>>>>>> >>>>>>> net is missing. But, network functions: >>>>>>> >>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>> >>>>>>> Mass storage device is created and removed each time as expected. >>>>>> >>>>>> So, what's the conclusion? Shall we move towards revert of those >>>>>> two changes? >>>>> >>>>> >>>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>>> I could not reproduce >>>>> the issue. >>>>> >>>>> I can see the usb0 exist all the time and accessible regardless of >>>>> the role switching of the USB mode (peripheral <-> host) >>>>> >>>>> Following are the logs: >>>>> //Host to device >>>>> >>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>>> mode >>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>> a800000.dwc3/gadget/net/ >>>>> usb0 >>>>> >>>>> //device to host >>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode >>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>> a800000.dwc3/gadget/net/ >>>>> usb0 >>>> >>>> That is weird. When I switch to host mode (using the physical switch), >>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>> >>>> Switching back to device mode, that gadget directory is recreated. And >>>> gadget/sound as well, but not gadget/net. >>>> >>>>> s a800000.dwc3/gadget/net/usb0 >>>>> < >>>>> addr_assign_type duplex phys_port_name >>>>> addr_len flags phys_switch_id >>>>> address gro_flush_timeout power >>>>> broadcast ifalias proto_down >>>>> carrier ifindex queues >>>>> carrier_changes iflink speed >>>>> carrier_down_count link_mode statistics >>>>> carrier_up_count mtu subsystem >>>>> dev_id name_assign_type tx_queue_len >>>>> dev_port netdev_group type >>>>> device operstate uevent >>>>> dormant phys_port_id waiting_for_supplier >>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>> collisions:0 txqueuelen:1000 >>>>> RX bytes:0 TX bytes:0 >>>>> >>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>> >>>>> I strongly advise against reverting the patch solely based on the >>>>> observed issue of removing the /sys/class/net/usb0 directory while >>>>> the usb0 interface remains available. >>>> >>>> There's more to it. I also mentioned that switching the role or >>>> unplugging the cable leaves the usb0 connection. >>>> >>>> I have while in host mode: >>>> root@yuna:~# ifconfig -a usb0 >>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>> 10.42.0.255 >>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 scopeid >>>> 0x20<link> >>>> >>>> >>>> You don't see that because you didn't create a connection at all. >>>> >>>>> Instead, I recommend enabling FTRACE to trace the functions involved >>>>> and identify which faulty call is responsible for removing usb0. >>>> >>>> Switching from device -> host -> device: >>>> >>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>> plugin 'function_graph' >>>> Hit Ctrl^C to stop recording >>>> ^CCPU0 data recorded at offset=0x1c8000 >>>> 188 bytes in size (4096 uncompressed) >>>> CPU1 data recorded at offset=0x1c9000 >>>> 0 bytes in size (0 uncompressed) >>>> root@yuna:~# trace-cmd report >>>> cpus=2 >>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # >>>> 2079.480 us | gether_disconnect(); >>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + >>>> 11.640 us | gether_disconnect(); >>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! >>>> 116.520 us | gether_connect(); >>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # >>>> 2057.260 us | gether_disconnect(); >>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! >>>> 109.000 us | gether_connect(); >>> >>> I tried to get a more useful trace: >>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter >>> -> switch to host mode then back to device >>> root@yuna:/sys/kernel/tracing# cat trace >>> # tracer: function >>> # >>> # entries-in-buffer/entries-written: 53/53 #P:2 >>> # >>> # _-----=> irqs-off/BH-disabled >>> # / _----=> need-resched >>> # | / _---=> hardirq/softirq >>> # || / _--=> preempt-depth >>> # ||| / _-=> migrate-disable >>> # |||| / delay >>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>> # | | | ||||| | | >>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>> <-__composite_disconnect >>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>> <-reset_config >>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect >>> <-reset_config >>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>> <-purge_configs_funcs >>> >>> -> to device mode >>> >>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>> <-usb_add_function >>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>> <-composite_setup >>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect >>> <-eem_set_alt >>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >>> <-eem_set_alt >>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>> <-composite_setup >>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect >>> <-eem_set_alt >>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >>> <-eem_set_alt >>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>> <-rx_complete >>> ... >>> >>> I don't know where to look exactly. Any hints? >> >> do you see anything related to gether_cleanup() after eem_unbind() ? > > Nope. It's a pitty that the trace formatting got messed up above. But as > you can see I traced gether_* and eem_*. After eem_unbind no traced > function is called, until I flip the switch to device mode. > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete > and eem_wrap <-eth_start_xmit lines. > >> If not then, you may try to enable tracing of TCP/IP stack and network >> side to check who deleting the sysfs entry > > Yes, that's a vast amount of functions to trace. And I don't see how > that would be related to the patch we're discussing here. I was hoping > for a little more targeted hint. Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', 'reset_config' and 'device_remove_file' leads me to: TIMESTAMP FUNCTION | | 49.952477: eem_wrap <-eth_start_xmit 55.072455: eem_wrap <-eth_start_xmit 55.072621: eem_unwrap <-rx_complete 59.011540: configfs_composite_reset <-usb_gadget_udc_reset 59.011545: composite_reset <-configfs_composite_reset 59.011548: reset_config <-__composite_disconnect 59.013565: eem_disable <-reset_config 59.013567: gether_disconnect <-reset_config 59.049560: device_remove_file <-device_remove 59.051185: configfs_composite_disconnect <-usb_gadget_disco 59.051189: composite_disconnect <-configfs_composite_discon 59.051195: configfs_composite_unbind <-gadget_unbind_driver 59.052519: eem_unbind <-purge_configs_funcs 59.052529: composite_dev_cleanup <-configfs_composite_unbin 59.052537: device_remove_file <-composite_dev_cleanup device_remove_file gets called twice, once by device_remove after gether_disconnect (that the one). The 2nd time by composite_dev_cleanup (removing the gadget) > You may recall the whole issue did not occur before this patch got applied. > >> Hardik >> >> >>> >>>> >>>>> According to current kernel architecture of u_ether driver, only >>>>> gether_cleanup should remove the usb0 interface along with its >>>>> kobject and sysfs interface. >>>>> I suggest sharing the analysis here to understand why this practice >>>>> is not followed in your use case or driver ? >>>> >>>> Yes, I'll try to trace where that happens. >>>> >>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>> harmless? But the usb: net device remaining after disconnect or role >>>> switch is not good, as the route remains. >>>> >>>> May be they are 2 separate problems. Could you try to reproduce what >>>> happens if you make eem connection and then unplug? >>>> >>>>> I am curious why the driver was developed without adhering to the >>>>> kernel's gadget architecture. >>> >>> I don't know what you mean here. Which driver do you mean? >>> >>>>>> >>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>> with this patch and >>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>> Would have been nice if someone could test on other >>>>>>> controller as well. But >>>>>>> another instance of dwc3 is also very welcome. >>>>>>>> It's quite possible, please test on your side. >>>>>>>> We are happy to test any fixes if you come up with. >>>>>> >>>>>> -- >>>>>> With Best Regards, >>>>>> Andy Shevchenko >>>>>> >>>>>> >>>> >
On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > Hi, > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > Hi, > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > Hi, > > > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > Hi, > > > > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > no > > > > > > > > > > > > longer exists > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > present, but the symlink is > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > recreate the device, and the > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > ls: cannot access > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > No such file > > > > > > > > or directory > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > Mass storage device is created and removed each time as expected. > > > > > > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those > > > > > > > two changes? > > > > > > > > > > > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget. > > > > > > I could not reproduce > > > > > > the issue. > > > > > > > > > > > > I can see the usb0 exist all the time and accessible regardless of > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > Following are the logs: > > > > > > //Host to device > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > > > > > > > mode > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > a800000.dwc3/gadget/net/ > > > > > > usb0 > > > > > > > > > > > > //device to host > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > a800000.dwc3/gadget/net/ > > > > > > usb0 > > > > > > > > > > That is weird. When I switch to host mode (using the physical switch), > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > Switching back to device mode, that gadget directory is recreated. And > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > < > > > > > > addr_assign_type duplex phys_port_name > > > > > > addr_len flags phys_switch_id > > > > > > address gro_flush_timeout power > > > > > > broadcast ifalias proto_down > > > > > > carrier ifindex queues > > > > > > carrier_changes iflink speed > > > > > > carrier_down_count link_mode statistics > > > > > > carrier_up_count mtu subsystem > > > > > > dev_id name_assign_type tx_queue_len > > > > > > dev_port netdev_group type > > > > > > device operstate uevent > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > > > > > collisions:0 txqueuelen:1000 > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > I strongly advise against reverting the patch solely based on the > > > > > > observed issue of removing the /sys/class/net/usb0 directory while > > > > > > the usb0 interface remains available. > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > I have while in host mode: > > > > > root@yuna:~# ifconfig -a usb0 > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > 10.42.0.255 > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > plugin 'function_graph' > > > > > Hit Ctrl^C to stop recording > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > 188 bytes in size (4096 uncompressed) > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > 0 bytes in size (0 uncompressed) > > > > > root@yuna:~# trace-cmd report > > > > > cpus=2 > > > > > irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > > > > > 2079.480 us | gether_disconnect(); > > > > > irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > > > > > 11.640 us | gether_disconnect(); > > > > > irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > > > > > 116.520 us | gether_connect(); > > > > > irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > > > > > 2057.260 us | gether_disconnect(); > > > > > irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > > > > > 109.000 us | gether_connect(); > > > > > > > > I tried to get a more useful trace: > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter > > > > -> switch to host mode then back to device > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > # tracer: function > > > > # > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > # > > > > # _-----=> irqs-off/BH-disabled > > > > # / _----=> need-resched > > > > # | / _---=> hardirq/softirq > > > > # || / _--=> preempt-depth > > > > # ||| / _-=> migrate-disable > > > > # |||| / delay > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > # | | | ||||| | | > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > <-__composite_disconnect > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > <-reset_config > > > > irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect > > > > <-reset_config > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > <-purge_configs_funcs > > > > > > > > -> to device mode > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > <-usb_add_function > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > <-composite_setup > > > > irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect > > > > <-eem_set_alt > > > > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > > > > <-eem_set_alt > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > <-composite_setup > > > > irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect > > > > <-eem_set_alt > > > > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > > > > <-eem_set_alt > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > <-rx_complete > > > > ... > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > do you see anything related to gether_cleanup() after eem_unbind() ? > > > > Nope. It's a pitty that the trace formatting got messed up above. But as > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > function is called, until I flip the switch to device mode. > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete > > and eem_wrap <-eth_start_xmit lines. > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > network side to check who deleting the sysfs entry > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > that would be related to the patch we're discussing here. I was hoping > > for a little more targeted hint. > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', > 'reset_config' and 'device_remove_file' leads me to: > > TIMESTAMP FUNCTION > | | > 49.952477: eem_wrap <-eth_start_xmit > 55.072455: eem_wrap <-eth_start_xmit > 55.072621: eem_unwrap <-rx_complete > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > 59.011545: composite_reset <-configfs_composite_reset > 59.011548: reset_config <-__composite_disconnect > 59.013565: eem_disable <-reset_config > 59.013567: gether_disconnect <-reset_config > 59.049560: device_remove_file <-device_remove > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > 59.051189: composite_disconnect <-configfs_composite_discon > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > 59.052519: eem_unbind <-purge_configs_funcs > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > 59.052537: device_remove_file <-composite_dev_cleanup > > device_remove_file gets called twice, once by device_remove after > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup > (removing the gadget) I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget. Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget? I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support. Additionally, please check if you have any customized DWC patches that may be causing this problem. > > > You may recall the whole issue did not occur before this patch got applied. > > > > > Hardik > > > > > > > > > > > > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > kobject and sysfs interface. > > > > > > I suggest sharing the analysis here to understand why this practice > > > > > > is not followed in your use case or driver ? > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > harmless? But the usb: net device remaining after disconnect or role > > > > > switch is not good, as the route remains. > > > > > > > > > > May be they are 2 separate problems. Could you try to reproduce what > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > I am curious why the driver was developed without adhering to the > > > > > > kernel's gadget architecture. > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > with this patch and > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > Would have been nice if someone could test on other > > > > > > > > controller as well. But > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > We are happy to test any fixes if you come up with. > > > > > > > > > > > > > > -- > > > > > > > With Best Regards, > > > > > > > Andy Shevchenko > > > > > > > > > > > > > > > > > > > > >
On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote: > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: ... > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > > no > > > > > > > > > > > > > longer exists > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > > present, but the symlink is > > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > > recreate the device, and the > > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > ls: cannot access > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > > No such file > > > > > > > > > or directory > > > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > > > Mass storage device is created and removed each time as expected. > > > > > > > > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those > > > > > > > > two changes? > > > > > > > > > > > > > > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget. > > > > > > > I could not reproduce > > > > > > > the issue. > > > > > > > > > > > > > > I can see the usb0 exist all the time and accessible regardless of > > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > > > Following are the logs: > > > > > > > //Host to device > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > > > > > > > > mode > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > usb0 > > > > > > > > > > > > > > //device to host > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > usb0 > > > > > > > > > > > > That is weird. When I switch to host mode (using the physical switch), > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > > > Switching back to device mode, that gadget directory is recreated. And > > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > > < > > > > > > > addr_assign_type duplex phys_port_name > > > > > > > addr_len flags phys_switch_id > > > > > > > address gro_flush_timeout power > > > > > > > broadcast ifalias proto_down > > > > > > > carrier ifindex queues > > > > > > > carrier_changes iflink speed > > > > > > > carrier_down_count link_mode statistics > > > > > > > carrier_up_count mtu subsystem > > > > > > > dev_id name_assign_type tx_queue_len > > > > > > > dev_port netdev_group type > > > > > > > device operstate uevent > > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > > > > > > collisions:0 txqueuelen:1000 > > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > > > I strongly advise against reverting the patch solely based on the > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while > > > > > > > the usb0 interface remains available. > > > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > > > I have while in host mode: > > > > > > root@yuna:~# ifconfig -a usb0 > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > > 10.42.0.255 > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved > > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > > plugin 'function_graph' > > > > > > Hit Ctrl^C to stop recording > > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > > 188 bytes in size (4096 uncompressed) > > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > > 0 bytes in size (0 uncompressed) > > > > > > root@yuna:~# trace-cmd report > > > > > > cpus=2 > > > > > > irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > > > > > > 2079.480 us | gether_disconnect(); > > > > > > irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > > > > > > 11.640 us | gether_disconnect(); > > > > > > irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > > > > > > 116.520 us | gether_connect(); > > > > > > irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > > > > > > 2057.260 us | gether_disconnect(); > > > > > > irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > > > > > > 109.000 us | gether_connect(); > > > > > > > > > > I tried to get a more useful trace: > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter > > > > > -> switch to host mode then back to device > > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > > # tracer: function > > > > > # > > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > > # > > > > > # _-----=> irqs-off/BH-disabled > > > > > # / _----=> need-resched > > > > > # | / _---=> hardirq/softirq > > > > > # || / _--=> preempt-depth > > > > > # ||| / _-=> migrate-disable > > > > > # |||| / delay > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > > # | | | ||||| | | > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > > <-__composite_disconnect > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > > <-reset_config > > > > > irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect > > > > > <-reset_config > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > > <-purge_configs_funcs > > > > > > > > > > -> to device mode > > > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > > <-usb_add_function > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > > <-composite_setup > > > > > irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > > <-composite_setup > > > > > irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > > > > > <-eem_set_alt > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > > <-rx_complete > > > > > ... > > > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > > > do you see anything related to gether_cleanup() after eem_unbind() ? > > > > > > Nope. It's a pitty that the trace formatting got messed up above. But as > > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > > function is called, until I flip the switch to device mode. > > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete > > > and eem_wrap <-eth_start_xmit lines. > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > > network side to check who deleting the sysfs entry > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > > that would be related to the patch we're discussing here. I was hoping > > > for a little more targeted hint. > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', > > 'reset_config' and 'device_remove_file' leads me to: > > > > TIMESTAMP FUNCTION > > | | > > 49.952477: eem_wrap <-eth_start_xmit > > 55.072455: eem_wrap <-eth_start_xmit > > 55.072621: eem_unwrap <-rx_complete > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > > 59.011545: composite_reset <-configfs_composite_reset > > 59.011548: reset_config <-__composite_disconnect > > 59.013565: eem_disable <-reset_config > > 59.013567: gether_disconnect <-reset_config > > 59.049560: device_remove_file <-device_remove > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > > 59.051189: composite_disconnect <-configfs_composite_discon > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > > 59.052519: eem_unbind <-purge_configs_funcs > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > > 59.052537: device_remove_file <-composite_dev_cleanup > > > > device_remove_file gets called twice, once by device_remove after > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup > > (removing the gadget) > > I believe that the device_remove_file function is only removing > suspend-specific attributes, not the complete gadget. Typically, when you > perform the role switch, the Gadget start/stop function in your UDC driver is > called. These functions should not delete the gadget > > To investigate further, could you please enable the DWC3 functions in ftrace > and check who is removing the gadget? I can also enable this on my system > and compare the logs with yours, but I will be in PI planning for 1.5 weeks > and may not be able to provide immediate support. Since we are almost at -rc7, I propose to revert and try again next cycle. > Additionally, please check if you have any customized DWC patches that may be causing this problem. > > > > You may recall the whole issue did not occur before this patch got applied. > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > > kobject and sysfs interface. > > > > > > > I suggest sharing the analysis here to understand why this practice > > > > > > > is not followed in your use case or driver ? > > > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > > harmless? But the usb: net device remaining after disconnect or role > > > > > > switch is not good, as the route remains. > > > > > > > > > > > > May be they are 2 separate problems. Could you try to reproduce what > > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > > > I am curious why the driver was developed without adhering to the > > > > > > > kernel's gadget architecture. > > > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > > with this patch and > > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > > Would have been nice if someone could test on other > > > > > > > > > controller as well. But > > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > > We are happy to test any fixes if you come up with.
On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote: > On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote: > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > > > > > > > > > On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > ... > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > > > no > > > > > > > > > > > > > > longer exists > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > > > present, but the symlink is > > > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > > > recreate the device, and the > > > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > ls: cannot access > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > > > No such file > > > > > > > > > > or directory > > > > > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > > > > > Mass storage device is created and removed each time as expected. > > > > > > > > > > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those > > > > > > > > > two changes? > > > > > > > > > > > > > > > > > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget. > > > > > > > > I could not reproduce > > > > > > > > the issue. > > > > > > > > > > > > > > > > I can see the usb0 exist all the time and accessible regardless of > > > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > > > > > Following are the logs: > > > > > > > > //Host to device > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > > > > > > > > > mode > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > usb0 > > > > > > > > > > > > > > > > //device to host > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > usb0 > > > > > > > > > > > > > > That is weird. When I switch to host mode (using the physical switch), > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > > > > > Switching back to device mode, that gadget directory is recreated. And > > > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > > > < > > > > > > > > addr_assign_type duplex phys_port_name > > > > > > > > addr_len flags phys_switch_id > > > > > > > > address gro_flush_timeout power > > > > > > > > broadcast ifalias proto_down > > > > > > > > carrier ifindex queues > > > > > > > > carrier_changes iflink speed > > > > > > > > carrier_down_count link_mode statistics > > > > > > > > carrier_up_count mtu subsystem > > > > > > > > dev_id name_assign_type tx_queue_len > > > > > > > > dev_port netdev_group type > > > > > > > > device operstate uevent > > > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > > > > > > > collisions:0 txqueuelen:1000 > > > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > > > > > I strongly advise against reverting the patch solely based on the > > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while > > > > > > > > the usb0 interface remains available. > > > > > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > > > > > I have while in host mode: > > > > > > > root@yuna:~# ifconfig -a usb0 > > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > > > 10.42.0.255 > > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > > > > > Instead, I recommend enabling FTRACE to trace the functions involved > > > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > > > plugin 'function_graph' > > > > > > > Hit Ctrl^C to stop recording > > > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > > > 188 bytes in size (4096 uncompressed) > > > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > > > 0 bytes in size (0 uncompressed) > > > > > > > root@yuna:~# trace-cmd report > > > > > > > cpus=2 > > > > > > > irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > > > > > > > 2079.480 us | gether_disconnect(); > > > > > > > irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > > > > > > > 11.640 us | gether_disconnect(); > > > > > > > irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > > > > > > > 116.520 us | gether_connect(); > > > > > > > irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > > > > > > > 2057.260 us | gether_disconnect(); > > > > > > > irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > > > > > > > 109.000 us | gether_connect(); > > > > > > > > > > > > I tried to get a more useful trace: > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter > > > > > > -> switch to host mode then back to device > > > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > > > # tracer: function > > > > > > # > > > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > > > # > > > > > > # _-----=> irqs-off/BH-disabled > > > > > > # / _----=> need-resched > > > > > > # | / _---=> hardirq/softirq > > > > > > # || / _--=> preempt-depth > > > > > > # ||| / _-=> migrate-disable > > > > > > # |||| / delay > > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > > > # | | | ||||| | | > > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > > > <-__composite_disconnect > > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > > > <-reset_config > > > > > > irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect > > > > > > <-reset_config > > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > > > <-purge_configs_funcs > > > > > > > > > > > > -> to device mode > > > > > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > > > <-usb_add_function > > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > > > <-composite_setup > > > > > > irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect > > > > > > <-eem_set_alt > > > > > > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > > > > > > <-eem_set_alt > > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > > > <-composite_setup > > > > > > irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect > > > > > > <-eem_set_alt > > > > > > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > > > > > > <-eem_set_alt > > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > > > <-rx_complete > > > > > > ... > > > > > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > > > > > do you see anything related to gether_cleanup() after eem_unbind() ? > > > > > > > > Nope. It's a pitty that the trace formatting got messed up above. But as > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > > > function is called, until I flip the switch to device mode. > > > > The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete > > > > and eem_wrap <-eth_start_xmit lines. > > > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > > > network side to check who deleting the sysfs entry > > > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > > > that would be related to the patch we're discussing here. I was hoping > > > > for a little more targeted hint. > > > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', > > > 'reset_config' and 'device_remove_file' leads me to: > > > > > > TIMESTAMP FUNCTION > > > | | > > > 49.952477: eem_wrap <-eth_start_xmit > > > 55.072455: eem_wrap <-eth_start_xmit > > > 55.072621: eem_unwrap <-rx_complete > > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > > > 59.011545: composite_reset <-configfs_composite_reset > > > 59.011548: reset_config <-__composite_disconnect > > > 59.013565: eem_disable <-reset_config > > > 59.013567: gether_disconnect <-reset_config > > > 59.049560: device_remove_file <-device_remove > > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > > > 59.051189: composite_disconnect <-configfs_composite_discon > > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > > > 59.052519: eem_unbind <-purge_configs_funcs > > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > > > 59.052537: device_remove_file <-composite_dev_cleanup > > > > > > device_remove_file gets called twice, once by device_remove after > > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup > > > (removing the gadget) > > > > I believe that the device_remove_file function is only removing > > suspend-specific attributes, not the complete gadget. Typically, when you > > perform the role switch, the Gadget start/stop function in your UDC driver is > > called. These functions should not delete the gadget > > > > To investigate further, could you please enable the DWC3 functions in ftrace > > and check who is removing the gadget? I can also enable this on my system > > and compare the logs with yours, but I will be in PI planning for 1.5 weeks > > and may not be able to provide immediate support. > > Since we are almost at -rc7, I propose to revert and try again next cycle. Why? There is no problem with this patch! I don't know why you are so excited to revert the patch instead of investigating the original issue. Even though I have proved that the problem is caused by usb0 being removed just by role-switching the port by the UDC driver, this behavior is incorrect. There is no behavior in the Linux kernel that keeps the network interface and removes the related sysfs entry. THIS IS WRONG and has been FIXED without reverting any mainline patch. Please don't encourage reverting any mainstream patches to avoid investigating a (probably) faulty driver. This is not how the community should work. Also, I'm a bit wondering: the patch has been in the mainline for some time, and we haven't had any issues except this one, which is due to the wrong behavior of the UDC driver. > > > Additionally, please check if you have any customized DWC patches that may be causing this problem. > > > > > > You may recall the whole issue did not occur before this patch got applied. > > > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > > > kobject and sysfs interface. > > > > > > > > I suggest sharing the analysis here to understand why this practice > > > > > > > > is not followed in your use case or driver ? > > > > > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > > > harmless? But the usb: net device remaining after disconnect or role > > > > > > > switch is not good, as the route remains. > > > > > > > > > > > > > > May be they are 2 separate problems. Could you try to reproduce what > > > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > > > > > I am curious why the driver was developed without adhering to the > > > > > > > > kernel's gadget architecture. > > > > > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > > > with this patch and > > > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > > > Would have been nice if someone could test on other > > > > > > > > > > controller as well. But > > > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > > > We are happy to test any fixes if you come up with. > > -- > With Best Regards, > Andy Shevchenko > >
Op 02-05-2024 om 17:29 schreef Hardik Gajjar: > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >> Hi, >> >> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>> Hi, >>> >>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>> Hi, >>>>> >>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>> Hi, >>>>>> >>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>>>> >>>>>>>> ... >>>>>>>> >>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>> >>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>> no >>>>>>>>>>>>> longer exists >>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>> present, but the symlink is >>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>> recreate the device, and the >>>>>>>>>>> symlink should become active again >>>>>>>>> >>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>> >>>>>>>>> root@yuna:~# ls >>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>> >>>>>>>>> Then switching to host mode: >>>>>>>>> >>>>>>>>> root@yuna:~# ls >>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>> ls: cannot access >>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>> No such file >>>>>>>>> or directory >>>>>>>>> >>>>>>>>> Then back to device mode: >>>>>>>>> >>>>>>>>> root@yuna:~# ls >>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>> >>>>>>>>> net is missing. But, network functions: >>>>>>>>> >>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>> >>>>>>>>> Mass storage device is created and removed each time as expected. >>>>>>>> >>>>>>>> So, what's the conclusion? Shall we move towards revert of those >>>>>>>> two changes? >>>>>>> >>>>>>> >>>>>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>>>>> I could not reproduce >>>>>>> the issue. >>>>>>> >>>>>>> I can see the usb0 exist all the time and accessible regardless of >>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>> >>>>>>> Following are the logs: >>>>>>> //Host to device >>>>>>> >>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>>>>> mode >>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>> a800000.dwc3/gadget/net/ >>>>>>> usb0 >>>>>>> >>>>>>> //device to host >>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode >>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>> a800000.dwc3/gadget/net/ >>>>>>> usb0 >>>>>> >>>>>> That is weird. When I switch to host mode (using the physical switch), >>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>> >>>>>> Switching back to device mode, that gadget directory is recreated. And >>>>>> gadget/sound as well, but not gadget/net. >>>>>> >>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>> < >>>>>>> addr_assign_type duplex phys_port_name >>>>>>> addr_len flags phys_switch_id >>>>>>> address gro_flush_timeout power >>>>>>> broadcast ifalias proto_down >>>>>>> carrier ifindex queues >>>>>>> carrier_changes iflink speed >>>>>>> carrier_down_count link_mode statistics >>>>>>> carrier_up_count mtu subsystem >>>>>>> dev_id name_assign_type tx_queue_len >>>>>>> dev_port netdev_group type >>>>>>> device operstate uevent >>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>>>> collisions:0 txqueuelen:1000 >>>>>>> RX bytes:0 TX bytes:0 >>>>>>> >>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>> >>>>>>> I strongly advise against reverting the patch solely based on the >>>>>>> observed issue of removing the /sys/class/net/usb0 directory while >>>>>>> the usb0 interface remains available. >>>>>> >>>>>> There's more to it. I also mentioned that switching the role or >>>>>> unplugging the cable leaves the usb0 connection. >>>>>> >>>>>> I have while in host mode: >>>>>> root@yuna:~# ifconfig -a usb0 >>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>> 10.42.0.255 >>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>> scopeid 0x20<link> >>>>>> >>>>>> >>>>>> You don't see that because you didn't create a connection at all. >>>>>> >>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved >>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>> >>>>>> Switching from device -> host -> device: >>>>>> >>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>> plugin 'function_graph' >>>>>> Hit Ctrl^C to stop recording >>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>> 188 bytes in size (4096 uncompressed) >>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>> 0 bytes in size (0 uncompressed) >>>>>> root@yuna:~# trace-cmd report >>>>>> cpus=2 >>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # >>>>>> 2079.480 us | gether_disconnect(); >>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + >>>>>> 11.640 us | gether_disconnect(); >>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! >>>>>> 116.520 us | gether_connect(); >>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # >>>>>> 2057.260 us | gether_disconnect(); >>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! >>>>>> 109.000 us | gether_connect(); >>>>> >>>>> I tried to get a more useful trace: >>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter >>>>> -> switch to host mode then back to device >>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>> # tracer: function >>>>> # >>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>> # >>>>> # _-----=> irqs-off/BH-disabled >>>>> # / _----=> need-resched >>>>> # | / _---=> hardirq/softirq >>>>> # || / _--=> preempt-depth >>>>> # ||| / _-=> migrate-disable >>>>> # |||| / delay >>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>> # | | | ||||| | | >>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>> <-__composite_disconnect >>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>> <-reset_config >>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect >>>>> <-reset_config >>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>> <-purge_configs_funcs >>>>> >>>>> -> to device mode >>>>> >>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>> <-usb_add_function >>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>> <-composite_setup >>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect >>>>> <-eem_set_alt >>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >>>>> <-eem_set_alt >>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>> <-composite_setup >>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect >>>>> <-eem_set_alt >>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >>>>> <-eem_set_alt >>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>> <-rx_complete >>>>> ... >>>>> >>>>> I don't know where to look exactly. Any hints? >>>> >>>> do you see anything related to gether_cleanup() after eem_unbind() ? >>> >>> Nope. It's a pitty that the trace formatting got messed up above. But as >>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>> function is called, until I flip the switch to device mode. >>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete >>> and eem_wrap <-eth_start_xmit lines. >>> >>>> If not then, you may try to enable tracing of TCP/IP stack and >>>> network side to check who deleting the sysfs entry >>> >>> Yes, that's a vast amount of functions to trace. And I don't see how >>> that would be related to the patch we're discussing here. I was hoping >>> for a little more targeted hint. >> >> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', >> 'reset_config' and 'device_remove_file' leads me to: >> >> TIMESTAMP FUNCTION >> | | >> 49.952477: eem_wrap <-eth_start_xmit >> 55.072455: eem_wrap <-eth_start_xmit >> 55.072621: eem_unwrap <-rx_complete >> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >> 59.011545: composite_reset <-configfs_composite_reset >> 59.011548: reset_config <-__composite_disconnect >> 59.013565: eem_disable <-reset_config >> 59.013567: gether_disconnect <-reset_config >> 59.049560: device_remove_file <-device_remove >> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >> 59.051189: composite_disconnect <-configfs_composite_discon >> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >> 59.052519: eem_unbind <-purge_configs_funcs >> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >> 59.052537: device_remove_file <-composite_dev_cleanup >> >> device_remove_file gets called twice, once by device_remove after >> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup >> (removing the gadget) > > I believe that the device_remove_file function is only removing suspend-specific attributes, not the complete gadget. > Typically, when you perform the role switch, the Gadget start/stop function in your UDC driver is called. These functions should not delete the gadget > > To investigate further, could you please enable the DWC3 functions in ftrace and check who is removing the gadget? > I can also enable this on my system and compare the logs with yours, but I will be in PI planning for 1.5 weeks and may not be able to provide immediate support. Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so trace is 600kb. I cut uninteresting stuff before configfs_composite_reset <-usb_gadget_udc_reset and after __dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to compare. > Additionally, please check if you have any customized DWC patches that may be causing this problem. > >> >>> You may recall the whole issue did not occur before this patch got applied. >>> >>>> Hardik >>>> >>>> >>>>> >>>>>> >>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>> kobject and sysfs interface. >>>>>>> I suggest sharing the analysis here to understand why this practice >>>>>>> is not followed in your use case or driver ? >>>>>> >>>>>> Yes, I'll try to trace where that happens. >>>>>> >>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>> harmless? But the usb: net device remaining after disconnect or role >>>>>> switch is not good, as the route remains. >>>>>> >>>>>> May be they are 2 separate problems. Could you try to reproduce what >>>>>> happens if you make eem connection and then unplug? >>>>>> >>>>>>> I am curious why the driver was developed without adhering to the >>>>>>> kernel's gadget architecture. >>>>> >>>>> I don't know what you mean here. Which driver do you mean? >>>>> >>>>>>>> >>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>> with this patch and >>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>> Would have been nice if someone could test on other >>>>>>>>> controller as well. But >>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>> We are happy to test any fixes if you come up with. >>>>>>>> >>>>>>>> -- >>>>>>>> With Best Regards, >>>>>>>> Andy Shevchenko >>>>>>>> >>>>>>>> >>>>>> >>>
Oops, sorry, wrong file attached . Now correct one. Op 02-05-2024 om 22:13 schreef Ferry Toth: > Op 02-05-2024 om 17:29 schreef Hardik Gajjar: >> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>> Hi, >>> >>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>> Hi, >>>> >>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>> Hi, >>>>>> >>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>> Hi, >>>>>>> >>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>>>>> >>>>>>>>> ... >>>>>>>>> >>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>> >>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>> target >>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>> no >>>>>>>>>>>>>> longer exists >>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>> symlink should become active again >>>>>>>>>> >>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>>> >>>>>>>>>> root@yuna:~# ls >>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>> >>>>>>>>>> Then switching to host mode: >>>>>>>>>> >>>>>>>>>> root@yuna:~# ls >>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>> ls: cannot access >>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>> No such file >>>>>>>>>> or directory >>>>>>>>>> >>>>>>>>>> Then back to device mode: >>>>>>>>>> >>>>>>>>>> root@yuna:~# ls >>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>> >>>>>>>>>> net is missing. But, network functions: >>>>>>>>>> >>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>> >>>>>>>>>> Mass storage device is created and removed each time as expected. >>>>>>>>> >>>>>>>>> So, what's the conclusion? Shall we move towards revert of those >>>>>>>>> two changes? >>>>>>>> >>>>>>>> >>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>>>>>> I could not reproduce >>>>>>>> the issue. >>>>>>>> >>>>>>>> I can see the usb0 exist all the time and accessible regardless of >>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>> >>>>>>>> Following are the logs: >>>>>>>> //Host to device >>>>>>>> >>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>>>>>> mode >>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>> usb0 >>>>>>>> >>>>>>>> //device to host >>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > >>>>>>>> mode >>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>> usb0 >>>>>>> >>>>>>> That is weird. When I switch to host mode (using the physical >>>>>>> switch), >>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>> >>>>>>> Switching back to device mode, that gadget directory is >>>>>>> recreated. And >>>>>>> gadget/sound as well, but not gadget/net. >>>>>>> >>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>> < >>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>> addr_len flags phys_switch_id >>>>>>>> address gro_flush_timeout power >>>>>>>> broadcast ifalias proto_down >>>>>>>> carrier ifindex queues >>>>>>>> carrier_changes iflink speed >>>>>>>> carrier_down_count link_mode statistics >>>>>>>> carrier_up_count mtu subsystem >>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>> dev_port netdev_group type >>>>>>>> device operstate uevent >>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>> >>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>> >>>>>>>> I strongly advise against reverting the patch solely based on the >>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while >>>>>>>> the usb0 interface remains available. >>>>>>> >>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>> >>>>>>> I have while in host mode: >>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>> 10.42.0.255 >>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>> scopeid 0x20<link> >>>>>>> >>>>>>> >>>>>>> You don't see that because you didn't create a connection at all. >>>>>>> >>>>>>>> Instead, I recommend enabling FTRACE to trace the functions >>>>>>>> involved >>>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>>> >>>>>>> Switching from device -> host -> device: >>>>>>> >>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>> plugin 'function_graph' >>>>>>> Hit Ctrl^C to stop recording >>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>> 0 bytes in size (0 uncompressed) >>>>>>> root@yuna:~# trace-cmd report >>>>>>> cpus=2 >>>>>>> irq/68-dwc3-725 [000] 514.575337: >>>>>>> funcgraph_entry: # >>>>>>> 2079.480 us | gether_disconnect(); >>>>>>> irq/68-dwc3-946 [000] 524.263731: >>>>>>> funcgraph_entry: + >>>>>>> 11.640 us | gether_disconnect(); >>>>>>> irq/68-dwc3-946 [000] 524.263743: >>>>>>> funcgraph_entry: ! >>>>>>> 116.520 us | gether_connect(); >>>>>>> irq/68-dwc3-946 [000] 524.268029: >>>>>>> funcgraph_entry: # >>>>>>> 2057.260 us | gether_disconnect(); >>>>>>> irq/68-dwc3-946 [000] 524.270089: >>>>>>> funcgraph_entry: ! >>>>>>> 109.000 us | gether_connect(); >>>>>> >>>>>> I tried to get a more useful trace: >>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> >>>>>> set_ftrace_filter >>>>>> -> switch to host mode then back to device >>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>> # tracer: function >>>>>> # >>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>> # >>>>>> # _-----=> irqs-off/BH-disabled >>>>>> # / _----=> need-resched >>>>>> # | / _---=> hardirq/softirq >>>>>> # || / _--=> preempt-depth >>>>>> # ||| / _-=> migrate-disable >>>>>> # |||| / delay >>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>> # | | | ||||| | | >>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>>> <-__composite_disconnect >>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>> <-reset_config >>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: >>>>>> gether_disconnect >>>>>> <-reset_config >>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>> <-purge_configs_funcs >>>>>> >>>>>> -> to device mode >>>>>> >>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>> <-usb_add_function >>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>> <-composite_setup >>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: >>>>>> gether_disconnect >>>>>> <-eem_set_alt >>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >>>>>> <-eem_set_alt >>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>> <-composite_setup >>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: >>>>>> gether_disconnect >>>>>> <-eem_set_alt >>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >>>>>> <-eem_set_alt >>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>> <-rx_complete >>>>>> ... >>>>>> >>>>>> I don't know where to look exactly. Any hints? >>>>> >>>>> do you see anything related to gether_cleanup() after eem_unbind() ? >>>> >>>> Nope. It's a pitty that the trace formatting got messed up above. >>>> But as >>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>> function is called, until I flip the switch to device mode. >>>> The ... at the end is where I cut uninteresting eem_unwrap >>>> <-rx_complete >>>> and eem_wrap <-eth_start_xmit lines. >>>> >>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>> network side to check who deleting the sysfs entry >>>> >>>> Yes, that's a vast amount of functions to trace. And I don't see how >>>> that would be related to the patch we're discussing here. I was hoping >>>> for a little more targeted hint. >>> >>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', >>> 'usb_fun*', >>> 'reset_config' and 'device_remove_file' leads me to: >>> >>> TIMESTAMP FUNCTION >>> | | >>> 49.952477: eem_wrap <-eth_start_xmit >>> 55.072455: eem_wrap <-eth_start_xmit >>> 55.072621: eem_unwrap <-rx_complete >>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>> 59.011545: composite_reset <-configfs_composite_reset >>> 59.011548: reset_config <-__composite_disconnect >>> 59.013565: eem_disable <-reset_config >>> 59.013567: gether_disconnect <-reset_config >>> 59.049560: device_remove_file <-device_remove >>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>> 59.051189: composite_disconnect <-configfs_composite_discon >>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>> 59.052519: eem_unbind <-purge_configs_funcs >>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>> 59.052537: device_remove_file <-composite_dev_cleanup >>> >>> device_remove_file gets called twice, once by device_remove after >>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup >>> (removing the gadget) >> >> I believe that the device_remove_file function is only removing >> suspend-specific attributes, not the complete gadget. >> Typically, when you perform the role switch, the Gadget start/stop >> function in your UDC driver is called. These functions should not >> delete the gadget >> >> To investigate further, could you please enable the DWC3 functions in >> ftrace and check who is removing the gadget? >> I can also enable this on my system and compare the logs with yours, >> but I will be in PI planning for 1.5 weeks and may not be able to >> provide immediate support. > > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so > trace is 600kb. I cut uninteresting stuff before > configfs_composite_reset <-usb_gadget_udc_reset and after > __dwc3_set_mode, < 300 lines remain. See attached tar.gz for you to > compare. > >> Additionally, please check if you have any customized DWC patches that >> may be causing this problem. >> >>> >>>> You may recall the whole issue did not occur before this patch got >>>> applied. >>>> >>>>> Hardik >>>>> >>>>> >>>>>> >>>>>>> >>>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>> kobject and sysfs interface. >>>>>>>> I suggest sharing the analysis here to understand why this practice >>>>>>>> is not followed in your use case or driver ? >>>>>>> >>>>>>> Yes, I'll try to trace where that happens. >>>>>>> >>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>> harmless? But the usb: net device remaining after disconnect or role >>>>>>> switch is not good, as the route remains. >>>>>>> >>>>>>> May be they are 2 separate problems. Could you try to reproduce what >>>>>>> happens if you make eem connection and then unplug? >>>>>>> >>>>>>>> I am curious why the driver was developed without adhering to the >>>>>>>> kernel's gadget architecture. >>>>>> >>>>>> I don't know what you mean here. Which driver do you mean? >>>>>> >>>>>>>>> >>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>> with this patch and >>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>> controller as well. But >>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>> We are happy to test any fixes if you come up with. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> With Best Regards, >>>>>>>>> Andy Shevchenko >>>>>>>>> >>>>>>>>> >>>>>>> >>>>
Greg, not totally sure, but this might be something where you might need to make a judgement call. See below: On 02.05.24 18:16, Hardik Gajjar wrote: > On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote: >> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote: >>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >> >> ... >> >>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>>> no >>>>>>>>>>>>>>> longer exists >>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>>> symlink should become active again >>>>>>>>>>> >>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>>>> >>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>>> >>>>>>>>>>> Then switching to host mode: >>>>>>>>>>> >>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>> ls: cannot access >>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>>> No such file >>>>>>>>>>> or directory >>>>>>>>>>> >>>>>>>>>>> Then back to device mode: >>>>>>>>>>> >>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>>> >>>>>>>>>>> net is missing. But, network functions: >>>>>>>>>>> >>>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>>> >>>>>>>>>>> Mass storage device is created and removed each time as expected. >>>>>>>>>> >>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those >>>>>>>>>> two changes? >>>>>>>>> >>>>>>>>> >>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>>>>>>> I could not reproduce >>>>>>>>> the issue. >>>>>>>>> >>>>>>>>> I can see the usb0 exist all the time and accessible regardless of >>>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>>> >>>>>>>>> Following are the logs: >>>>>>>>> //Host to device >>>>>>>>> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>>>>>>> mode >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>> usb0 >>>>>>>>> >>>>>>>>> //device to host >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>> usb0 >>>>>>>> >>>>>>>> That is weird. When I switch to host mode (using the physical switch), >>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>>> >>>>>>>> Switching back to device mode, that gadget directory is recreated. And >>>>>>>> gadget/sound as well, but not gadget/net. >>>>>>>> >>>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>>> < >>>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>>> addr_len flags phys_switch_id >>>>>>>>> address gro_flush_timeout power >>>>>>>>> broadcast ifalias proto_down >>>>>>>>> carrier ifindex queues >>>>>>>>> carrier_changes iflink speed >>>>>>>>> carrier_down_count link_mode statistics >>>>>>>>> carrier_up_count mtu subsystem >>>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>>> dev_port netdev_group type >>>>>>>>> device operstate uevent >>>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>>> >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>>> >>>>>>>>> I strongly advise against reverting the patch solely based on the >>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while >>>>>>>>> the usb0 interface remains available. >>>>>>>> >>>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>>> >>>>>>>> I have while in host mode: >>>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>>> 10.42.0.255 >>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>>> scopeid 0x20<link> >>>>>>>> >>>>>>>> >>>>>>>> You don't see that because you didn't create a connection at all. >>>>>>>> >>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved >>>>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>>>> >>>>>>>> Switching from device -> host -> device: >>>>>>>> >>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>>> plugin 'function_graph' >>>>>>>> Hit Ctrl^C to stop recording >>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>>> 0 bytes in size (0 uncompressed) >>>>>>>> root@yuna:~# trace-cmd report >>>>>>>> cpus=2 >>>>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # >>>>>>>> 2079.480 us | gether_disconnect(); >>>>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + >>>>>>>> 11.640 us | gether_disconnect(); >>>>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! >>>>>>>> 116.520 us | gether_connect(); >>>>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # >>>>>>>> 2057.260 us | gether_disconnect(); >>>>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! >>>>>>>> 109.000 us | gether_connect(); >>>>>>> >>>>>>> I tried to get a more useful trace: >>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter >>>>>>> -> switch to host mode then back to device >>>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>>> # tracer: function >>>>>>> # >>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>>> # >>>>>>> # _-----=> irqs-off/BH-disabled >>>>>>> # / _----=> need-resched >>>>>>> # | / _---=> hardirq/softirq >>>>>>> # || / _--=> preempt-depth >>>>>>> # ||| / _-=> migrate-disable >>>>>>> # |||| / delay >>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>>> # | | | ||||| | | >>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>>>> <-__composite_disconnect >>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>>> <-reset_config >>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect >>>>>>> <-reset_config >>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>>> <-purge_configs_funcs >>>>>>> >>>>>>> -> to device mode >>>>>>> >>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>>> <-usb_add_function >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>>> <-composite_setup >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect >>>>>>> <-eem_set_alt >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >>>>>>> <-eem_set_alt >>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>>> <-composite_setup >>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect >>>>>>> <-eem_set_alt >>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >>>>>>> <-eem_set_alt >>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>>> <-rx_complete >>>>>>> ... >>>>>>> >>>>>>> I don't know where to look exactly. Any hints? >>>>>> >>>>>> do you see anything related to gether_cleanup() after eem_unbind() ? >>>>> >>>>> Nope. It's a pitty that the trace formatting got messed up above. But as >>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>>> function is called, until I flip the switch to device mode. >>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete >>>>> and eem_wrap <-eth_start_xmit lines. >>>>> >>>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>>> network side to check who deleting the sysfs entry >>>>> >>>>> Yes, that's a vast amount of functions to trace. And I don't see how >>>>> that would be related to the patch we're discussing here. I was hoping >>>>> for a little more targeted hint. >>>> >>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', >>>> 'reset_config' and 'device_remove_file' leads me to: >>>> >>>> TIMESTAMP FUNCTION >>>> | | >>>> 49.952477: eem_wrap <-eth_start_xmit >>>> 55.072455: eem_wrap <-eth_start_xmit >>>> 55.072621: eem_unwrap <-rx_complete >>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>>> 59.011545: composite_reset <-configfs_composite_reset >>>> 59.011548: reset_config <-__composite_disconnect >>>> 59.013565: eem_disable <-reset_config >>>> 59.013567: gether_disconnect <-reset_config >>>> 59.049560: device_remove_file <-device_remove >>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>>> 59.051189: composite_disconnect <-configfs_composite_discon >>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>>> 59.052519: eem_unbind <-purge_configs_funcs >>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>>> 59.052537: device_remove_file <-composite_dev_cleanup >>>> >>>> device_remove_file gets called twice, once by device_remove after >>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup >>>> (removing the gadget) >>> >>> I believe that the device_remove_file function is only removing >>> suspend-specific attributes, not the complete gadget. Typically, when you >>> perform the role switch, the Gadget start/stop function in your UDC driver is >>> called. These functions should not delete the gadget >>> >>> To investigate further, could you please enable the DWC3 functions in ftrace >>> and check who is removing the gadget? I can also enable this on my system >>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks >>> and may not be able to provide immediate support. >> >> Since we are almost at -rc7, I propose to revert and try again next cycle. > > Why? There is no problem with this patch! > > I don't know why you are so excited to revert the patch instead of > investigating the original issue. Even though I have proved that the > problem is caused by usb0 being removed just by role-switching the > port by the UDC driver, this behavior is incorrect. > > There is no behavior in the Linux kernel that keeps the network > interface and removes the related sysfs entry. THIS IS WRONG and has > been FIXED without reverting any mainline patch. > > Please don't encourage reverting any mainstream patches to avoid > investigating a (probably) faulty driver. This is not how the > community should work. I only skimmed the thread and this is not my area of expertise, so correct me if I'm wrong anywhere here: But from what I see I tend to disagree: the patch mentioned in the subject apparently regressed something. This afaics was reported 16 weeks ago during the 6.7 cycle[1]. Due to how we apply the "no regressions" rule[2] the change causing this ideally should have been reverted before 6.8 was release -- it does not matter that it just exposed an existing problem rooted somewhere else. That revert did not happen. Then even a few more weeks went by and this is still not fixed. Then I'd guess reverting this might be the right course of action to motivate someone to fix the exposed problem. Unless of course we know or fear that a revert causes regressions for users of 6.8.y -- is that the case? [1] https://lore.kernel.org/all/ZaQS5x-XK08Jre6I@smile.fi.intel.com/ [2] https://docs.kernel.org/process/handling-regressions.html Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. > Also, I'm a bit wondering: the patch has been in the mainline for > some time, and we haven't had any issues except this one, which is > due to the wrong behavior of the UDC driver. >> >>> Additionally, please check if you have any customized DWC patches that may be causing this problem. >>> >>>>> You may recall the whole issue did not occur before this patch got applied. >>>>> >>>>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>>> kobject and sysfs interface. >>>>>>>>> I suggest sharing the analysis here to understand why this practice >>>>>>>>> is not followed in your use case or driver ? >>>>>>>> >>>>>>>> Yes, I'll try to trace where that happens. >>>>>>>> >>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>>> harmless? But the usb: net device remaining after disconnect or role >>>>>>>> switch is not good, as the route remains. >>>>>>>> >>>>>>>> May be they are 2 separate problems. Could you try to reproduce what >>>>>>>> happens if you make eem connection and then unplug? >>>>>>>> >>>>>>>>> I am curious why the driver was developed without adhering to the >>>>>>>>> kernel's gadget architecture. >>>>>>> >>>>>>> I don't know what you mean here. Which driver do you mean? >>>>>>> >>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>>> with this patch and >>>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>>> controller as well. But >>>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>>> We are happy to test any fixes if you come up with. >> >> -- >> With Best Regards, >> Andy Shevchenko >> >>
On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote: > Greg, not totally sure, but this might be something where you might need > to make a judgement call. See below: > > On 02.05.24 18:16, Hardik Gajjar wrote: > > On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote: > >> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote: > >>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > >>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: > >>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > >>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > >>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: > >>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > >>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > >>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > >>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > >>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > >>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: > >>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > >>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > >> > > >> ... > >> > >>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not > >>>>>>>>>>>>>>> removed and it is a link, the link > >>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > >>>>>>>>>>>>>>> no > >>>>>>>>>>>>>>> longer exists > >>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is > >>>>>>>>>>>>> present, but the symlink is > >>>>>>>>>>>>> broken. In that case, the dwc3 driver should > >>>>>>>>>>>>> recreate the device, and the > >>>>>>>>>>>>> symlink should become active again > >>>>>>>>>>> > >>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): > >>>>>>>>>>> > >>>>>>>>>>> root@yuna:~# ls > >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > >>>>>>>>>>> driver net power sound subsystem suspended uevent > >>>>>>>>>>> > >>>>>>>>>>> Then switching to host mode: > >>>>>>>>>>> > >>>>>>>>>>> root@yuna:~# ls > >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > >>>>>>>>>>> ls: cannot access > >>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > >>>>>>>>>>> No such file > >>>>>>>>>>> or directory > >>>>>>>>>>> > >>>>>>>>>>> Then back to device mode: > >>>>>>>>>>> > >>>>>>>>>>> root@yuna:~# ls > >>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > >>>>>>>>>>> driver power sound subsystem suspended uevent > >>>>>>>>>>> > >>>>>>>>>>> net is missing. But, network functions: > >>>>>>>>>>> > >>>>>>>>>>> root@yuna:~# ping 10.42.0.1 > >>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes > >>>>>>>>>>> > >>>>>>>>>>> Mass storage device is created and removed each time as expected. > >>>>>>>>>> > >>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those > >>>>>>>>>> two changes? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget. > >>>>>>>>> I could not reproduce > >>>>>>>>> the issue. > >>>>>>>>> > >>>>>>>>> I can see the usb0 exist all the time and accessible regardless of > >>>>>>>>> the role switching of the USB mode (peripheral <-> host) > >>>>>>>>> > >>>>>>>>> Following are the logs: > >>>>>>>>> //Host to device > >>>>>>>>> > >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > >>>>>>>>>> mode > >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls > >>>>>>>>> a800000.dwc3/gadget/net/ > >>>>>>>>> usb0 > >>>>>>>>> > >>>>>>>>> //device to host > >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode > >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls > >>>>>>>>> a800000.dwc3/gadget/net/ > >>>>>>>>> usb0 > >>>>>>>> > >>>>>>>> That is weird. When I switch to host mode (using the physical switch), > >>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) > >>>>>>>> > >>>>>>>> Switching back to device mode, that gadget directory is recreated. And > >>>>>>>> gadget/sound as well, but not gadget/net. > >>>>>>>> > >>>>>>>>> s a800000.dwc3/gadget/net/usb0 > >>>>>>>>> < > >>>>>>>>> addr_assign_type duplex phys_port_name > >>>>>>>>> addr_len flags phys_switch_id > >>>>>>>>> address gro_flush_timeout power > >>>>>>>>> broadcast ifalias proto_down > >>>>>>>>> carrier ifindex queues > >>>>>>>>> carrier_changes iflink speed > >>>>>>>>> carrier_down_count link_mode statistics > >>>>>>>>> carrier_up_count mtu subsystem > >>>>>>>>> dev_id name_assign_type tx_queue_len > >>>>>>>>> dev_port netdev_group type > >>>>>>>>> device operstate uevent > >>>>>>>>> dormant phys_port_id waiting_for_supplier > >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > >>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > >>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 > >>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > >>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > >>>>>>>>> collisions:0 txqueuelen:1000 > >>>>>>>>> RX bytes:0 TX bytes:0 > >>>>>>>>> > >>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # > >>>>>>>>> > >>>>>>>>> I strongly advise against reverting the patch solely based on the > >>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while > >>>>>>>>> the usb0 interface remains available. > >>>>>>>> > >>>>>>>> There's more to it. I also mentioned that switching the role or > >>>>>>>> unplugging the cable leaves the usb0 connection. > >>>>>>>> > >>>>>>>> I have while in host mode: > >>>>>>>> root@yuna:~# ifconfig -a usb0 > >>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > >>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast > >>>>>>>> 10.42.0.255 > >>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > >>>>>>>> scopeid 0x20<link> > >>>>>>>> > >>>>>>>> > >>>>>>>> You don't see that because you didn't create a connection at all. > >>>>>>>> > >>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved > >>>>>>>>> and identify which faulty call is responsible for removing usb0. > >>>>>>>> > >>>>>>>> Switching from device -> host -> device: > >>>>>>>> > >>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* > >>>>>>>> plugin 'function_graph' > >>>>>>>> Hit Ctrl^C to stop recording > >>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 > >>>>>>>> 188 bytes in size (4096 uncompressed) > >>>>>>>> CPU1 data recorded at offset=0x1c9000 > >>>>>>>> 0 bytes in size (0 uncompressed) > >>>>>>>> root@yuna:~# trace-cmd report > >>>>>>>> cpus=2 > >>>>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # > >>>>>>>> 2079.480 us | gether_disconnect(); > >>>>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + > >>>>>>>> 11.640 us | gether_disconnect(); > >>>>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! > >>>>>>>> 116.520 us | gether_connect(); > >>>>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # > >>>>>>>> 2057.260 us | gether_disconnect(); > >>>>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! > >>>>>>>> 109.000 us | gether_connect(); > >>>>>>> > >>>>>>> I tried to get a more useful trace: > >>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > >>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > >>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer > >>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter > >>>>>>> -> switch to host mode then back to device > >>>>>>> root@yuna:/sys/kernel/tracing# cat trace > >>>>>>> # tracer: function > >>>>>>> # > >>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 > >>>>>>> # > >>>>>>> # _-----=> irqs-off/BH-disabled > >>>>>>> # / _----=> need-resched > >>>>>>> # | / _---=> hardirq/softirq > >>>>>>> # || / _--=> preempt-depth > >>>>>>> # ||| / _-=> migrate-disable > >>>>>>> # |||| / delay > >>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > >>>>>>> # | | | ||||| | | > >>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > >>>>>>> <-__composite_disconnect > >>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > >>>>>>> <-reset_config > >>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect > >>>>>>> <-reset_config > >>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > >>>>>>> <-purge_configs_funcs > >>>>>>> > >>>>>>> -> to device mode > >>>>>>> > >>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind > >>>>>>> <-usb_add_function > >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > >>>>>>> <-composite_setup > >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect > >>>>>>> <-eem_set_alt > >>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > >>>>>>> <-eem_set_alt > >>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > >>>>>>> <-composite_setup > >>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect > >>>>>>> <-eem_set_alt > >>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > >>>>>>> <-eem_set_alt > >>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > >>>>>>> <-rx_complete > >>>>>>> ... > >>>>>>> > >>>>>>> I don't know where to look exactly. Any hints? > >>>>>> > >>>>>> do you see anything related to gether_cleanup() after eem_unbind() ? > >>>>> > >>>>> Nope. It's a pitty that the trace formatting got messed up above. But as > >>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced > >>>>> function is called, until I flip the switch to device mode. > >>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete > >>>>> and eem_wrap <-eth_start_xmit lines. > >>>>> > >>>>>> If not then, you may try to enable tracing of TCP/IP stack and > >>>>>> network side to check who deleting the sysfs entry > >>>>> > >>>>> Yes, that's a vast amount of functions to trace. And I don't see how > >>>>> that would be related to the patch we're discussing here. I was hoping > >>>>> for a little more targeted hint. > >>>> > >>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', > >>>> 'reset_config' and 'device_remove_file' leads me to: > >>>> > >>>> TIMESTAMP FUNCTION > >>>> | | > >>>> 49.952477: eem_wrap <-eth_start_xmit > >>>> 55.072455: eem_wrap <-eth_start_xmit > >>>> 55.072621: eem_unwrap <-rx_complete > >>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > >>>> 59.011545: composite_reset <-configfs_composite_reset > >>>> 59.011548: reset_config <-__composite_disconnect > >>>> 59.013565: eem_disable <-reset_config > >>>> 59.013567: gether_disconnect <-reset_config > >>>> 59.049560: device_remove_file <-device_remove > >>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco > >>>> 59.051189: composite_disconnect <-configfs_composite_discon > >>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver > >>>> 59.052519: eem_unbind <-purge_configs_funcs > >>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin > >>>> 59.052537: device_remove_file <-composite_dev_cleanup > >>>> > >>>> device_remove_file gets called twice, once by device_remove after > >>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup > >>>> (removing the gadget) > >>> > >>> I believe that the device_remove_file function is only removing > >>> suspend-specific attributes, not the complete gadget. Typically, when you > >>> perform the role switch, the Gadget start/stop function in your UDC driver is > >>> called. These functions should not delete the gadget > >>> > >>> To investigate further, could you please enable the DWC3 functions in ftrace > >>> and check who is removing the gadget? I can also enable this on my system > >>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks > >>> and may not be able to provide immediate support. > >> > >> Since we are almost at -rc7, I propose to revert and try again next cycle. > > > > Why? There is no problem with this patch! > > > > I don't know why you are so excited to revert the patch instead of > > investigating the original issue. Even though I have proved that the > > problem is caused by usb0 being removed just by role-switching the > > port by the UDC driver, this behavior is incorrect. > > > > There is no behavior in the Linux kernel that keeps the network > > interface and removes the related sysfs entry. THIS IS WRONG and has > > been FIXED without reverting any mainline patch. > > > > Please don't encourage reverting any mainstream patches to avoid > > investigating a (probably) faulty driver. This is not how the > > community should work. > > I only skimmed the thread and this is not my area of expertise, so > correct me if I'm wrong anywhere here: > > But from what I see I tend to disagree: the patch mentioned in the > subject apparently regressed something. This afaics was reported 16 > weeks ago during the 6.7 cycle[1]. Due to how we apply the "no > regressions" rule[2] the change causing this ideally should have been > reverted before 6.8 was release -- it does not matter that it just > exposed an existing problem rooted somewhere else. That revert did not > happen. Then even a few more weeks went by and this is still not fixed. > Then I'd guess reverting this might be the right course of action to > motivate someone to fix the exposed problem. Unless of course we know or > fear that a revert causes regressions for users of 6.8.y -- is that the > case? > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e= > > [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e= > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > -- > Everything you wanna know about Linux kernel regression tracking: > https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e= > If I did something stupid, please tell me, as explained on that page. I understand your perspective. Do you have documentation or guidelines available to ensure that the existing problem is resolved and the correct patch is reinstated? As far as I know, the issue is currently only reproducible by the reporter. I also have a similar setup, but I have not been able to reproduce the issue myself (please refer to our previous discussions). This discrepancy could be due to some OEM modifications. If the problem is only reproducible by the reporter, how do we motivate others to support and upstream the fix? Isn't it demotivating to revert the correct patch and retain the wrong behavior of the driver? Additionally, is it possible to determine how many correct patches have been reverted in the past to address this particular problem? Please correct me if I am wrong ? > > > Also, I'm a bit wondering: the patch has been in the mainline for > > some time, and we haven't had any issues except this one, which is > > due to the wrong behavior of the UDC driver. > >> > >>> Additionally, please check if you have any customized DWC patches that may be causing this problem. > >>> > >>>>> You may recall the whole issue did not occur before this patch got applied. > >>>>> > >>>>>>>>> According to current kernel architecture of u_ether driver, only > >>>>>>>>> gether_cleanup should remove the usb0 interface along with its > >>>>>>>>> kobject and sysfs interface. > >>>>>>>>> I suggest sharing the analysis here to understand why this practice > >>>>>>>>> is not followed in your use case or driver ? > >>>>>>>> > >>>>>>>> Yes, I'll try to trace where that happens. > >>>>>>>> > >>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems > >>>>>>>> harmless? But the usb: net device remaining after disconnect or role > >>>>>>>> switch is not good, as the route remains. > >>>>>>>> > >>>>>>>> May be they are 2 separate problems. Could you try to reproduce what > >>>>>>>> happens if you make eem connection and then unplug? > >>>>>>>> > >>>>>>>>> I am curious why the driver was developed without adhering to the > >>>>>>>>> kernel's gadget architecture. > >>>>>>> > >>>>>>> I don't know what you mean here. Which driver do you mean? > >>>>>>> > >>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check > >>>>>>>>>>>>> with this patch and > >>>>>>>>>>>>> share result here. May be we need some fix in dwc3 > >>>>>>>>>>> Would have been nice if someone could test on other > >>>>>>>>>>> controller as well. But > >>>>>>>>>>> another instance of dwc3 is also very welcome. > >>>>>>>>>>>> It's quite possible, please test on your side. > >>>>>>>>>>>> We are happy to test any fixes if you come up with. > >> > >> -- > >> With Best Regards, > >> Andy Shevchenko > >> > >>
On 03.05.24 11:15, Hardik Gajjar wrote: > On Fri, May 03, 2024 at 09:24:16AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote: >> Greg, not totally sure, but this might be something where you might need >> to make a judgement call. See below: >> >> On 02.05.24 18:16, Hardik Gajjar wrote: >>> On Thu, May 02, 2024 at 06:31:48PM +0300, Andy Shevchenko wrote: >>>> On Thu, May 02, 2024 at 05:29:16PM +0200, Hardik Gajjar wrote: >>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at 08:37:42PM +0300, Andy Shevchenko wrote: >>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>> >> >>>> ... >>>> >>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>>>>> no >>>>>>>>>>>>>>>>> longer exists >>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>>>>> symlink should become active again >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>>>>> >>>>>>>>>>>>> Then switching to host mode: >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>> ls: cannot access >>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>>>>> No such file >>>>>>>>>>>>> or directory >>>>>>>>>>>>> >>>>>>>>>>>>> Then back to device mode: >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>>>>> >>>>>>>>>>>>> net is missing. But, network functions: >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>>>>> >>>>>>>>>>>>> Mass storage device is created and removed each time as expected. >>>>>>>>>>>> >>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those >>>>>>>>>>>> two changes? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>>>>>>>>> I could not reproduce >>>>>>>>>>> the issue. >>>>>>>>>>> >>>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of >>>>>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>>>>> >>>>>>>>>>> Following are the logs: >>>>>>>>>>> //Host to device >>>>>>>>>>> >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>>>>>>>>> mode >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>> usb0 >>>>>>>>>>> >>>>>>>>>>> //device to host >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "host" > mode >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>> usb0 >>>>>>>>>> >>>>>>>>>> That is weird. When I switch to host mode (using the physical switch), >>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>>>>> >>>>>>>>>> Switching back to device mode, that gadget directory is recreated. And >>>>>>>>>> gadget/sound as well, but not gadget/net. >>>>>>>>>> >>>>>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>>>>> < >>>>>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>>>>> addr_len flags phys_switch_id >>>>>>>>>>> address gro_flush_timeout power >>>>>>>>>>> broadcast ifalias proto_down >>>>>>>>>>> carrier ifindex queues >>>>>>>>>>> carrier_changes iflink speed >>>>>>>>>>> carrier_down_count link_mode statistics >>>>>>>>>>> carrier_up_count mtu subsystem >>>>>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>>>>> dev_port netdev_group type >>>>>>>>>>> device operstate uevent >>>>>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>>>>> >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>>>>> >>>>>>>>>>> I strongly advise against reverting the patch solely based on the >>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while >>>>>>>>>>> the usb0 interface remains available. >>>>>>>>>> >>>>>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>>>>> >>>>>>>>>> I have while in host mode: >>>>>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>>>>> 10.42.0.255 >>>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>>>>> scopeid 0x20<link> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You don't see that because you didn't create a connection at all. >>>>>>>>>> >>>>>>>>>>> Instead, I recommend enabling FTRACE to trace the functions involved >>>>>>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>>>>>> >>>>>>>>>> Switching from device -> host -> device: >>>>>>>>>> >>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>>>>> plugin 'function_graph' >>>>>>>>>> Hit Ctrl^C to stop recording >>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>>>>> 0 bytes in size (0 uncompressed) >>>>>>>>>> root@yuna:~# trace-cmd report >>>>>>>>>> cpus=2 >>>>>>>>>> irq/68-dwc3-725 [000] 514.575337: funcgraph_entry: # >>>>>>>>>> 2079.480 us | gether_disconnect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.263731: funcgraph_entry: + >>>>>>>>>> 11.640 us | gether_disconnect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.263743: funcgraph_entry: ! >>>>>>>>>> 116.520 us | gether_connect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.268029: funcgraph_entry: # >>>>>>>>>> 2057.260 us | gether_disconnect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.270089: funcgraph_entry: ! >>>>>>>>>> 109.000 us | gether_connect(); >>>>>>>>> >>>>>>>>> I tried to get a more useful trace: >>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >> set_ftrace_filter >>>>>>>>> -> switch to host mode then back to device >>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>>>>> # tracer: function >>>>>>>>> # >>>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>>>>> # >>>>>>>>> # _-----=> irqs-off/BH-disabled >>>>>>>>> # / _----=> need-resched >>>>>>>>> # | / _---=> hardirq/softirq >>>>>>>>> # || / _--=> preempt-depth >>>>>>>>> # ||| / _-=> migrate-disable >>>>>>>>> # |||| / delay >>>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>>>>> # | | | ||||| | | >>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>>>>>> <-__composite_disconnect >>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>>>>> <-reset_config >>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: gether_disconnect >>>>>>>>> <-reset_config >>>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>>>>> <-purge_configs_funcs >>>>>>>>> >>>>>>>>> -> to device mode >>>>>>>>> >>>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>>>>> <-usb_add_function >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>>>>> <-composite_setup >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: gether_disconnect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>>>>> <-composite_setup >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: gether_disconnect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>>>>> <-rx_complete >>>>>>>>> ... >>>>>>>>> >>>>>>>>> I don't know where to look exactly. Any hints? >>>>>>>> >>>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ? >>>>>>> >>>>>>> Nope. It's a pitty that the trace formatting got messed up above. But as >>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>>>>> function is called, until I flip the switch to device mode. >>>>>>> The ... at the end is where I cut uninteresting eem_unwrap <-rx_complete >>>>>>> and eem_wrap <-eth_start_xmit lines. >>>>>>> >>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>>>>> network side to check who deleting the sysfs entry >>>>>>> >>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how >>>>>>> that would be related to the patch we're discussing here. I was hoping >>>>>>> for a little more targeted hint. >>>>>> >>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', 'usb_fun*', >>>>>> 'reset_config' and 'device_remove_file' leads me to: >>>>>> >>>>>> TIMESTAMP FUNCTION >>>>>> | | >>>>>> 49.952477: eem_wrap <-eth_start_xmit >>>>>> 55.072455: eem_wrap <-eth_start_xmit >>>>>> 55.072621: eem_unwrap <-rx_complete >>>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>>>>> 59.011545: composite_reset <-configfs_composite_reset >>>>>> 59.011548: reset_config <-__composite_disconnect >>>>>> 59.013565: eem_disable <-reset_config >>>>>> 59.013567: gether_disconnect <-reset_config >>>>>> 59.049560: device_remove_file <-device_remove >>>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>>>>> 59.051189: composite_disconnect <-configfs_composite_discon >>>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>>>>> 59.052519: eem_unbind <-purge_configs_funcs >>>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>>>>> 59.052537: device_remove_file <-composite_dev_cleanup >>>>>> >>>>>> device_remove_file gets called twice, once by device_remove after >>>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup >>>>>> (removing the gadget) >>>>> >>>>> I believe that the device_remove_file function is only removing >>>>> suspend-specific attributes, not the complete gadget. Typically, when you >>>>> perform the role switch, the Gadget start/stop function in your UDC driver is >>>>> called. These functions should not delete the gadget >>>>> >>>>> To investigate further, could you please enable the DWC3 functions in ftrace >>>>> and check who is removing the gadget? I can also enable this on my system >>>>> and compare the logs with yours, but I will be in PI planning for 1.5 weeks >>>>> and may not be able to provide immediate support. >>>> >>>> Since we are almost at -rc7, I propose to revert and try again next cycle. >>> >>> Why? There is no problem with this patch! >>> >>> I don't know why you are so excited to revert the patch instead of >>> investigating the original issue. Even though I have proved that the >>> problem is caused by usb0 being removed just by role-switching the >>> port by the UDC driver, this behavior is incorrect. >>> >>> There is no behavior in the Linux kernel that keeps the network >>> interface and removes the related sysfs entry. THIS IS WRONG and has >>> been FIXED without reverting any mainline patch. >>> >>> Please don't encourage reverting any mainstream patches to avoid >>> investigating a (probably) faulty driver. This is not how the >>> community should work. >> >> I only skimmed the thread and this is not my area of expertise, so >> correct me if I'm wrong anywhere here: >> >> But from what I see I tend to disagree: the patch mentioned in the >> subject apparently regressed something. This afaics was reported 16 >> weeks ago during the 6.7 cycle[1]. Due to how we apply the "no >> regressions" rule[2] the change causing this ideally should have been >> reverted before 6.8 was release -- it does not matter that it just >> exposed an existing problem rooted somewhere else. That revert did not >> happen. Then even a few more weeks went by and this is still not fixed. >> Then I'd guess reverting this might be the right course of action to >> motivate someone to fix the exposed problem. Unless of course we know or >> fear that a revert causes regressions for users of 6.8.y -- is that the >> case? >> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_all_ZaQS5x-2DXK08Jre6I-40smile.fi.intel.com_&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=HBokvujOYQdyCxk3eMV6xtwLLNdCxjsnZ4nTzbhbLXM&e= >> >> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.kernel.org_process_handling-2Dregressions.html&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=lOh1a5Vzu2VNEaspoNgEipxRU9UDDPcibRLoOVrGbIU&e= >> >> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) >> -- >> Everything you wanna know about Linux kernel regression tracking: >> https://urldefense.proofpoint.com/v2/url?u=https-3A__linux-2Dregtracking.leemhuis.info_about_-23tldr&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=QAHqX_KD5JuTUUtqftI9GSmckKDKzRyJ5qu0DT0NdpJ7Gxk6vfsvBPrBvJLM0fzb&s=MuVuulVZqYAjIqCVpNc01ve8yD6DFt2wOZEH4p9khTc&e= >> If I did something stupid, please tell me, as explained on that page. > > I understand your perspective. Do you have documentation or > guidelines available to ensure that the existing problem is resolved > and the correct patch is reinstated? The second link I gave (e.g. to a rendered version of Documentation/process/handling-regressions.rst ) has quite a few quotes from Linus which iirc should support this. Apart from this there is only the lore archive and the the git log afaik. > As far as I know, the issue is currently only reproducible by the > reporter. I also have a similar setup, but I have not been able to > reproduce the issue myself (please refer to our previous > discussions). This discrepancy could be due to some OEM > modifications. To the board/machine in question or the kernel used beforehand? If the former: that does not matter much. In the latter: than of course it's not a regression. > If the problem is only reproducible by the reporter, how do we > motivate others to support and upstream the fix? The general idea afaics is: there was someone that for some reason was motivated to bring in the patch that caused the regression (indirectly in this case); with a bit of luck that someone after the revert wants it to be included again and thus is motivated to fix it (or search for the underlying issue and fix that in this case). Of course that does not always work out. OTOH with a bit of luck the other developers will help with that, too. And there are gray areas, but they are not that big. Not sure if this is one, that in the end is up to Greg or Linus to decide. > Isn't it > demotivating to revert the correct patch and retain the wrong > behavior of the driver? Sure, but I'd say it's even more demotivating if things break for users when they update their kernel, as then they won't update the next time and stay on old versions with bugs and vulnerabilities. As that's afaics what Linus wants to prevent -- and why he made the rule and established current practices. > Additionally, is it possible to determine how many correct patches > have been reverted in the past to address this particular problem? git log? > Please correct me if I am wrong ? HTH, Ciao, Thorsten >>> Also, I'm a bit wondering: the patch has been in the mainline for >>> some time, and we haven't had any issues except this one, which is >>> due to the wrong behavior of the UDC driver. >>>> >>>>> Additionally, please check if you have any customized DWC patches that may be causing this problem. >>>>> >>>>>>> You may recall the whole issue did not occur before this patch got applied. >>>>>>> >>>>>>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>>>>> kobject and sysfs interface. >>>>>>>>>>> I suggest sharing the analysis here to understand why this practice >>>>>>>>>>> is not followed in your use case or driver ? >>>>>>>>>> >>>>>>>>>> Yes, I'll try to trace where that happens. >>>>>>>>>> >>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>>>>> harmless? But the usb: net device remaining after disconnect or role >>>>>>>>>> switch is not good, as the route remains. >>>>>>>>>> >>>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what >>>>>>>>>> happens if you make eem connection and then unplug? >>>>>>>>>> >>>>>>>>>>> I am curious why the driver was developed without adhering to the >>>>>>>>>>> kernel's gadget architecture. >>>>>>>>> >>>>>>>>> I don't know what you mean here. Which driver do you mean? >>>>>>>>> >>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>>>>> with this patch and >>>>>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>>>>> controller as well. But >>>>>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>>>>> We are happy to test any fixes if you come up with. >>>> >>>> -- >>>> With Best Regards, >>>> Andy Shevchenko >>>> >>>> > >
On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: > Oops, sorry, wrong file attached . Now correct one. > > Op 02-05-2024 om 22:13 schreef Ferry Toth: > > Op 02-05-2024 om 17:29 schreef Hardik Gajjar: > > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > > > > Hi, > > > > > > > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > > > > Hi, > > > > > > > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > > > > Hi, > > > > > > > > > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > > > > Hi, > > > > > > > > > > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > > > > On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: > > > > > > > > > > > > > On Wed, Apr 10, 2024 at > > > > > > > > > > > > > 08:37:42PM +0300, Andy > > > > > > > > > > > > > Shevchenko wrote: > > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > > > > no > > > > > > > > > > > > > > > longer exists > > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > > > > present, but the symlink is > > > > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > > > > recreate the device, and the > > > > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > ls: cannot access > > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > > > > No such file > > > > > > > > > > > or directory > > > > > > > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > > > > > > > Mass storage device is created and removed each time as expected. > > > > > > > > > > > > > > > > > > > > So, what's the conclusion? Shall we move towards revert of those > > > > > > > > > > two changes? > > > > > > > > > > > > > > > > > > > > > > > > > > > As promised, I have the tested the this patch with the dwc3 gadget. > > > > > > > > > I could not reproduce > > > > > > > > > the issue. > > > > > > > > > > > > > > > > > > I can see the usb0 exist all the time and accessible regardless of > > > > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > > > > > > > Following are the logs: > > > > > > > > > //Host to device > > > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" > > > > > > > > > > mode > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > > usb0 > > > > > > > > > > > > > > > > > > //device to host > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > # echo "host" > mode > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > > usb0 > > > > > > > > > > > > > > > > That is weird. When I switch to host mode (using > > > > > > > > the physical switch), > > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > > > > > > > Switching back to device mode, that gadget > > > > > > > > directory is recreated. And > > > > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > > > > < > > > > > > > > > addr_assign_type duplex phys_port_name > > > > > > > > > addr_len flags phys_switch_id > > > > > > > > > address gro_flush_timeout power > > > > > > > > > broadcast ifalias proto_down > > > > > > > > > carrier ifindex queues > > > > > > > > > carrier_changes iflink speed > > > > > > > > > carrier_down_count link_mode statistics > > > > > > > > > carrier_up_count mtu subsystem > > > > > > > > > dev_id name_assign_type tx_queue_len > > > > > > > > > dev_port netdev_group type > > > > > > > > > device operstate uevent > > > > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 > > > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > > > > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > > > > > > > > > collisions:0 txqueuelen:1000 > > > > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > > > > > > > I strongly advise against reverting the patch solely based on the > > > > > > > > > observed issue of removing the /sys/class/net/usb0 directory while > > > > > > > > > the usb0 interface remains available. > > > > > > > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > > > > > > > I have while in host mode: > > > > > > > > root@yuna:~# ifconfig -a usb0 > > > > > > > > usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 > > > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > > > > 10.42.0.255 > > > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > > > > > > > Instead, I recommend enabling FTRACE to > > > > > > > > > trace the functions involved > > > > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > > > > plugin 'function_graph' > > > > > > > > Hit Ctrl^C to stop recording > > > > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > > > > 188 bytes in size (4096 uncompressed) > > > > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > > > > 0 bytes in size (0 uncompressed) > > > > > > > > root@yuna:~# trace-cmd report > > > > > > > > cpus=2 > > > > > > > > irq/68-dwc3-725 [000] 514.575337: > > > > > > > > funcgraph_entry: # > > > > > > > > 2079.480 us | gether_disconnect(); > > > > > > > > irq/68-dwc3-946 [000] 524.263731: > > > > > > > > funcgraph_entry: + > > > > > > > > 11.640 us | gether_disconnect(); > > > > > > > > irq/68-dwc3-946 [000] 524.263743: > > > > > > > > funcgraph_entry: ! > > > > > > > > 116.520 us | gether_connect(); > > > > > > > > irq/68-dwc3-946 [000] 524.268029: > > > > > > > > funcgraph_entry: # > > > > > > > > 2057.260 us | gether_disconnect(); > > > > > > > > irq/68-dwc3-946 [000] 524.270089: > > > > > > > > funcgraph_entry: ! > > > > > > > > 109.000 us | gether_connect(); > > > > > > > > > > > > > > I tried to get a more useful trace: > > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' > > > > > > > >> set_ftrace_filter > > > > > > > -> switch to host mode then back to device > > > > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > > > > # tracer: function > > > > > > > # > > > > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > > > > # > > > > > > > # _-----=> irqs-off/BH-disabled > > > > > > > # / _----=> need-resched > > > > > > > # | / _---=> hardirq/softirq > > > > > > > # || / _--=> preempt-depth > > > > > > > # ||| / _-=> migrate-disable > > > > > > > # |||| / delay > > > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > > > > # | | | ||||| | | > > > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > > > > <-__composite_disconnect > > > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > > > > <-reset_config > > > > > > > irq/68-dwc3-523 [000] D..3. 133.992276: > > > > > > > gether_disconnect > > > > > > > <-reset_config > > > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > > > > <-purge_configs_funcs > > > > > > > > > > > > > > -> to device mode > > > > > > > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > > > > <-usb_add_function > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > > > > <-composite_setup > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155215: > > > > > > > gether_disconnect > > > > > > > <-eem_set_alt > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect > > > > > > > <-eem_set_alt > > > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > > > > <-composite_setup > > > > > > > irq/68-dwc3-734 [000] D..3. 149.157292: > > > > > > > gether_disconnect > > > > > > > <-eem_set_alt > > > > > > > irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect > > > > > > > <-eem_set_alt > > > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > > > > <-rx_complete > > > > > > > ... > > > > > > > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > > > > > > > do you see anything related to gether_cleanup() after eem_unbind() ? > > > > > > > > > > Nope. It's a pitty that the trace formatting got messed up > > > > > above. But as > > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > > > > function is called, until I flip the switch to device mode. > > > > > The ... at the end is where I cut uninteresting eem_unwrap > > > > > <-rx_complete > > > > > and eem_wrap <-eth_start_xmit lines. > > > > > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > > > > network side to check who deleting the sysfs entry > > > > > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > > > > that would be related to the patch we're discussing here. I was hoping > > > > > for a little more targeted hint. > > > > > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', > > > > 'usb_fun*', > > > > 'reset_config' and 'device_remove_file' leads me to: > > > > > > > > TIMESTAMP FUNCTION > > > > | | > > > > 49.952477: eem_wrap <-eth_start_xmit > > > > 55.072455: eem_wrap <-eth_start_xmit > > > > 55.072621: eem_unwrap <-rx_complete > > > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > > > > 59.011545: composite_reset <-configfs_composite_reset > > > > 59.011548: reset_config <-__composite_disconnect > > > > 59.013565: eem_disable <-reset_config > > > > 59.013567: gether_disconnect <-reset_config > > > > 59.049560: device_remove_file <-device_remove > > > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > > > > 59.051189: composite_disconnect <-configfs_composite_discon > > > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > > > > 59.052519: eem_unbind <-purge_configs_funcs > > > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > > > > 59.052537: device_remove_file <-composite_dev_cleanup > > > > > > > > device_remove_file gets called twice, once by device_remove after > > > > gether_disconnect (that the one). The 2nd time by composite_dev_cleanup > > > > (removing the gadget) > > > > > > I believe that the device_remove_file function is only removing > > > suspend-specific attributes, not the complete gadget. > > > Typically, when you perform the role switch, the Gadget start/stop > > > function in your UDC driver is called. These functions should not > > > delete the gadget > > > > > > To investigate further, could you please enable the DWC3 functions > > > in ftrace and check who is removing the gadget? > > > I can also enable this on my system and compare the logs with yours, > > > but I will be in PI planning for 1.5 weeks and may not be able to > > > provide immediate support. > > > > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so > > trace is 600kb. I cut uninteresting stuff before > > configfs_composite_reset <-usb_gadget_udc_reset and after > > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> > > compare. > > Could you please try with the following patch and see if your issue resolves ? diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c index 3b445bd88498..c2a904475083 100644 --- a/drivers/usb/gadget/function/f_eem.c +++ b/drivers/usb/gadget/function/f_eem.c @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f) struct usb_composite_dev *cdev = c->cdev; struct f_eem *eem = func_to_eem(f); struct usb_string *us; - int status; + int status = 0; struct usb_ep *ep; struct f_eem_opts *eem_opts; @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f) * with list_for_each_entry, so we assume no race condition * with regard to eem_opts->bound access */ + mutex_lock(&eem_opts->lock); + gether_set_gadget(eem_opts->net, cdev->gadget); + if (!eem_opts->bound) { - mutex_lock(&eem_opts->lock); - gether_set_gadget(eem_opts->net, cdev->gadget); status = gether_register_netdev(eem_opts->net); - mutex_unlock(&eem_opts->lock); - if (status) - return status; - eem_opts->bound = true; } + mutex_unlock(&eem_opts->lock); + + if (status) + goto fail; + + eem_opts->bound = true; + us = usb_gstrings_attach(cdev, eem_strings, ARRAY_SIZE(eem_string_defs)); > > > Additionally, please check if you have any customized DWC patches > > > that may be causing this problem. > > > > > > > > > > > > You may recall the whole issue did not occur before this > > > > > patch got applied. > > > > > > > > > > > Hardik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > > > > kobject and sysfs interface. > > > > > > > > > I suggest sharing the analysis here to understand why this practice > > > > > > > > > is not followed in your use case or driver ? > > > > > > > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > > > > harmless? But the usb: net device remaining after disconnect or role > > > > > > > > switch is not good, as the route remains. > > > > > > > > > > > > > > > > May be they are 2 separate problems. Could you try to reproduce what > > > > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > > > > > > > I am curious why the driver was developed without adhering to the > > > > > > > > > kernel's gadget architecture. > > > > > > > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > > > > with this patch and > > > > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > > > > Would have been nice if someone could test on other > > > > > > > > > > > controller as well. But > > > > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > > > > We are happy to test any fixes if you come up with. > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > With Best Regards, > > > > > > > > > > Andy Shevchenko > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Hi, Sorry to have kept you waiting, I was away for a short break. Patch tested below. Op 10-05-2024 om 11:45 schreef Hardik Gajjar: > On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: >> Oops, sorry, wrong file attached . Now correct one. >> >> Op 02-05-2024 om 22:13 schreef Ferry Toth: >>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar: >>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>>>> Hi, >>>>> >>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>>>> Hi, >>>>>> >>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar wrote: >>>>>>>>>>>>>> On Wed, Apr 10, 2024 at >>>>>>>>>>>>>> 08:37:42PM +0300, Andy >>>>>>>>>>>>>> Shevchenko wrote: >>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>>>>>>> >>>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>>>> no >>>>>>>>>>>>>>>> longer exists >>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>>>> symlink should become active again >>>>>>>>>>>> >>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>>>>> >>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>>>> >>>>>>>>>>>> Then switching to host mode: >>>>>>>>>>>> >>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>> ls: cannot access >>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>>>> No such file >>>>>>>>>>>> or directory >>>>>>>>>>>> >>>>>>>>>>>> Then back to device mode: >>>>>>>>>>>> >>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>>>> >>>>>>>>>>>> net is missing. But, network functions: >>>>>>>>>>>> >>>>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>>>> >>>>>>>>>>>> Mass storage device is created and removed each time as expected. >>>>>>>>>>> >>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of those >>>>>>>>>>> two changes? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> As promised, I have the tested the this patch with the dwc3 gadget. >>>>>>>>>> I could not reproduce >>>>>>>>>> the issue. >>>>>>>>>> >>>>>>>>>> I can see the usb0 exist all the time and accessible regardless of >>>>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>>>> >>>>>>>>>> Following are the logs: >>>>>>>>>> //Host to device >>>>>>>>>> >>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo "peripheral" >>>>>>>>>>> mode >>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>> usb0 >>>>>>>>>> >>>>>>>>>> //device to host >>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb >>>>>>>>>> # echo "host" > mode >>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>> usb0 >>>>>>>>> >>>>>>>>> That is weird. When I switch to host mode (using >>>>>>>>> the physical switch), >>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>>>> >>>>>>>>> Switching back to device mode, that gadget >>>>>>>>> directory is recreated. And >>>>>>>>> gadget/sound as well, but not gadget/net. >>>>>>>>> >>>>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>>>> < >>>>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>>>> addr_len flags phys_switch_id >>>>>>>>>> address gro_flush_timeout power >>>>>>>>>> broadcast ifalias proto_down >>>>>>>>>> carrier ifindex queues >>>>>>>>>> carrier_changes iflink speed >>>>>>>>>> carrier_down_count link_mode statistics >>>>>>>>>> carrier_up_count mtu subsystem >>>>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>>>> dev_port netdev_group type >>>>>>>>>> device operstate uevent >>>>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a usb0 >>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 >>>>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>>>> >>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>>>> >>>>>>>>>> I strongly advise against reverting the patch solely based on the >>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory while >>>>>>>>>> the usb0 interface remains available. >>>>>>>>> >>>>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>>>> >>>>>>>>> I have while in host mode: >>>>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> mtu 1500 >>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>>>> 10.42.0.255 >>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>>>> scopeid 0x20<link> >>>>>>>>> >>>>>>>>> >>>>>>>>> You don't see that because you didn't create a connection at all. >>>>>>>>> >>>>>>>>>> Instead, I recommend enabling FTRACE to >>>>>>>>>> trace the functions involved >>>>>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>>>>> >>>>>>>>> Switching from device -> host -> device: >>>>>>>>> >>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>>>> plugin 'function_graph' >>>>>>>>> Hit Ctrl^C to stop recording >>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>>>> 0 bytes in size (0 uncompressed) >>>>>>>>> root@yuna:~# trace-cmd report >>>>>>>>> cpus=2 >>>>>>>>> irq/68-dwc3-725 [000] 514.575337: >>>>>>>>> funcgraph_entry: # >>>>>>>>> 2079.480 us | gether_disconnect(); >>>>>>>>> irq/68-dwc3-946 [000] 524.263731: >>>>>>>>> funcgraph_entry: + >>>>>>>>> 11.640 us | gether_disconnect(); >>>>>>>>> irq/68-dwc3-946 [000] 524.263743: >>>>>>>>> funcgraph_entry: ! >>>>>>>>> 116.520 us | gether_connect(); >>>>>>>>> irq/68-dwc3-946 [000] 524.268029: >>>>>>>>> funcgraph_entry: # >>>>>>>>> 2057.260 us | gether_disconnect(); >>>>>>>>> irq/68-dwc3-946 [000] 524.270089: >>>>>>>>> funcgraph_entry: ! >>>>>>>>> 109.000 us | gether_connect(); >>>>>>>> >>>>>>>> I tried to get a more useful trace: >>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >>>>>>>>>> set_ftrace_filter >>>>>>>> -> switch to host mode then back to device >>>>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>>>> # tracer: function >>>>>>>> # >>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>>>> # >>>>>>>> # _-----=> irqs-off/BH-disabled >>>>>>>> # / _----=> need-resched >>>>>>>> # | / _---=> hardirq/softirq >>>>>>>> # || / _--=> preempt-depth >>>>>>>> # ||| / _-=> migrate-disable >>>>>>>> # |||| / delay >>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>>>> # | | | ||||| | | >>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>>>>> <-__composite_disconnect >>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>>>> <-reset_config >>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: >>>>>>>> gether_disconnect >>>>>>>> <-reset_config >>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>>>> <-purge_configs_funcs >>>>>>>> >>>>>>>> -> to device mode >>>>>>>> >>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>>>> <-usb_add_function >>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>>>> <-composite_setup >>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: >>>>>>>> gether_disconnect >>>>>>>> <-eem_set_alt >>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: gether_connect >>>>>>>> <-eem_set_alt >>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>>>> <-composite_setup >>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: >>>>>>>> gether_disconnect >>>>>>>> <-eem_set_alt >>>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: gether_connect >>>>>>>> <-eem_set_alt >>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>>>> <-rx_complete >>>>>>>> ... >>>>>>>> >>>>>>>> I don't know where to look exactly. Any hints? >>>>>>> >>>>>>> do you see anything related to gether_cleanup() after eem_unbind() ? >>>>>> >>>>>> Nope. It's a pitty that the trace formatting got messed up >>>>>> above. But as >>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>>>> function is called, until I flip the switch to device mode. >>>>>> The ... at the end is where I cut uninteresting eem_unwrap >>>>>> <-rx_complete >>>>>> and eem_wrap <-eth_start_xmit lines. >>>>>> >>>>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>>>> network side to check who deleting the sysfs entry >>>>>> >>>>>> Yes, that's a vast amount of functions to trace. And I don't see how >>>>>> that would be related to the patch we're discussing here. I was hoping >>>>>> for a little more targeted hint. >>>>> >>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', >>>>> 'usb_fun*', >>>>> 'reset_config' and 'device_remove_file' leads me to: >>>>> >>>>> TIMESTAMP FUNCTION >>>>> | | >>>>> 49.952477: eem_wrap <-eth_start_xmit >>>>> 55.072455: eem_wrap <-eth_start_xmit >>>>> 55.072621: eem_unwrap <-rx_complete >>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>>>> 59.011545: composite_reset <-configfs_composite_reset >>>>> 59.011548: reset_config <-__composite_disconnect >>>>> 59.013565: eem_disable <-reset_config >>>>> 59.013567: gether_disconnect <-reset_config >>>>> 59.049560: device_remove_file <-device_remove >>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>>>> 59.051189: composite_disconnect <-configfs_composite_discon >>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>>>> 59.052519: eem_unbind <-purge_configs_funcs >>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>>>> 59.052537: device_remove_file <-composite_dev_cleanup >>>>> >>>>> device_remove_file gets called twice, once by device_remove after >>>>> gether_disconnect (that the one). The 2nd time by composite_dev_cleanup >>>>> (removing the gadget) >>>> >>>> I believe that the device_remove_file function is only removing >>>> suspend-specific attributes, not the complete gadget. >>>> Typically, when you perform the role switch, the Gadget start/stop >>>> function in your UDC driver is called. These functions should not >>>> delete the gadget >>>> >>>> To investigate further, could you please enable the DWC3 functions >>>> in ftrace and check who is removing the gadget? >>>> I can also enable this on my system and compare the logs with yours, >>>> but I will be in PI planning for 1.5 weeks and may not be able to >>>> provide immediate support. >>> >>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so >>> trace is 600kb. I cut uninteresting stuff before >>> configfs_composite_reset <-usb_gadget_udc_reset and after >>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> >>> compare. >>> > Could you please try with the following patch and see if your issue resolves ? > > diff --git a/drivers/usb/gadget/function/f_eem.c b/drivers/usb/gadget/function/f_eem.c > index 3b445bd88498..c2a904475083 100644 > --- a/drivers/usb/gadget/function/f_eem.c > +++ b/drivers/usb/gadget/function/f_eem.c > @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f) > struct usb_composite_dev *cdev = c->cdev; > struct f_eem *eem = func_to_eem(f); > struct usb_string *us; > - int status; > + int status = 0; > struct usb_ep *ep; > > struct f_eem_opts *eem_opts; > @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, struct usb_function *f) > * with list_for_each_entry, so we assume no race condition > * with regard to eem_opts->bound access > */ > + mutex_lock(&eem_opts->lock); > + gether_set_gadget(eem_opts->net, cdev->gadget); > + > if (!eem_opts->bound) { > - mutex_lock(&eem_opts->lock); > - gether_set_gadget(eem_opts->net, cdev->gadget); > status = gether_register_netdev(eem_opts->net); > - mutex_unlock(&eem_opts->lock); > - if (status) > - return status; > - eem_opts->bound = true; > } > > + mutex_unlock(&eem_opts->lock); > + > + if (status) > + goto fail; > + > + eem_opts->bound = true; > + > us = usb_gstrings_attach(cdev, eem_strings, > ARRAY_SIZE(eem_string_defs)); > After switching from host -> device -> host the usb0 device as seen by ifconfig remains "RUNNING" and in the routing table. FYI This is the regression. Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is still missing. After deeper diving into v6.1.0, I found the latter a longer existing bug, not caused by your patch. More over, this doesn't seem to do any harm. The first issue does, as we try to route traffic over a device that no longer exists. >>>> Additionally, please check if you have any customized DWC patches >>>> that may be causing this problem. >>>> >>>>> >>>>>> You may recall the whole issue did not occur before this >>>>>> patch got applied. >>>>>> >>>>>>> Hardik >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>>>> kobject and sysfs interface. >>>>>>>>>> I suggest sharing the analysis here to understand why this practice >>>>>>>>>> is not followed in your use case or driver ? >>>>>>>>> >>>>>>>>> Yes, I'll try to trace where that happens. >>>>>>>>> >>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>>>> harmless? But the usb: net device remaining after disconnect or role >>>>>>>>> switch is not good, as the route remains. >>>>>>>>> >>>>>>>>> May be they are 2 separate problems. Could you try to reproduce what >>>>>>>>> happens if you make eem connection and then unplug? >>>>>>>>> >>>>>>>>>> I am curious why the driver was developed without adhering to the >>>>>>>>>> kernel's gadget architecture. >>>>>>>> >>>>>>>> I don't know what you mean here. Which driver do you mean? >>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>>>> with this patch and >>>>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>>>> controller as well. But >>>>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>>>> We are happy to test any fixes if you come up with. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> With Best Regards, >>>>>>>>>>> Andy Shevchenko >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>> > >
Hi Op 15-05-2024 om 20:38 schreef Ferry Toth: > Hi, > > Sorry to have kept you waiting, I was away for a short break. > > Patch tested below. > > Op 10-05-2024 om 11:45 schreef Hardik Gajjar: >> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: >>> Oops, sorry, wrong file attached . Now correct one. >>> >>> Op 02-05-2024 om 22:13 schreef Ferry Toth: >>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar: >>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>>>>> Hi, >>>>>> >>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>>>>> Hi, >>>>>>> >>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at >>>>>>>>>>>>>>> 08:37:42PM +0300, Andy >>>>>>>>>>>>>>> Shevchenko wrote: >>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>>>>>>>> >>>>>>>>>>>> ... >>>>>>>>>>>> >>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>>>>> target >>>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>>>>> no >>>>>>>>>>>>>>>>> longer exists >>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>>>>> symlink should become active again >>>>>>>>>>>>> >>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>>>>> >>>>>>>>>>>>> Then switching to host mode: >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>> ls: cannot access >>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>>>>> No such file >>>>>>>>>>>>> or directory >>>>>>>>>>>>> >>>>>>>>>>>>> Then back to device mode: >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>>>>> >>>>>>>>>>>>> net is missing. But, network functions: >>>>>>>>>>>>> >>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>>>>> >>>>>>>>>>>>> Mass storage device is created and removed each time as >>>>>>>>>>>>> expected. >>>>>>>>>>>> >>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of >>>>>>>>>>>> those >>>>>>>>>>>> two changes? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 >>>>>>>>>>> gadget. >>>>>>>>>>> I could not reproduce >>>>>>>>>>> the issue. >>>>>>>>>>> >>>>>>>>>>> I can see the usb0 exist all the time and accessible >>>>>>>>>>> regardless of >>>>>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>>>>> >>>>>>>>>>> Following are the logs: >>>>>>>>>>> //Host to device >>>>>>>>>>> >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo >>>>>>>>>>> "peripheral" >>>>>>>>>>>> mode >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>> usb0 >>>>>>>>>>> >>>>>>>>>>> //device to host >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb >>>>>>>>>>> # echo "host" > mode >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>> usb0 >>>>>>>>>> >>>>>>>>>> That is weird. When I switch to host mode (using >>>>>>>>>> the physical switch), >>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>>>>> >>>>>>>>>> Switching back to device mode, that gadget >>>>>>>>>> directory is recreated. And >>>>>>>>>> gadget/sound as well, but not gadget/net. >>>>>>>>>> >>>>>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>>>>> < >>>>>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>>>>> addr_len flags phys_switch_id >>>>>>>>>>> address gro_flush_timeout power >>>>>>>>>>> broadcast ifalias proto_down >>>>>>>>>>> carrier ifindex queues >>>>>>>>>>> carrier_changes iflink speed >>>>>>>>>>> carrier_down_count link_mode statistics >>>>>>>>>>> carrier_up_count mtu subsystem >>>>>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>>>>> dev_port netdev_group type >>>>>>>>>>> device operstate uevent >>>>>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig -a >>>>>>>>>>> usb0 >>>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 >>>>>>>>>>> carrier:0 >>>>>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>>>>> >>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>>>>> >>>>>>>>>>> I strongly advise against reverting the patch solely based on >>>>>>>>>>> the >>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory >>>>>>>>>>> while >>>>>>>>>>> the usb0 interface remains available. >>>>>>>>>> >>>>>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>>>>> >>>>>>>>>> I have while in host mode: >>>>>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> >>>>>>>>>> mtu 1500 >>>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>>>>> 10.42.0.255 >>>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>>>>> scopeid 0x20<link> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> You don't see that because you didn't create a connection at all. >>>>>>>>>> >>>>>>>>>>> Instead, I recommend enabling FTRACE to >>>>>>>>>>> trace the functions involved >>>>>>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>>>>>> >>>>>>>>>> Switching from device -> host -> device: >>>>>>>>>> >>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>>>>> plugin 'function_graph' >>>>>>>>>> Hit Ctrl^C to stop recording >>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>>>>> 0 bytes in size (0 uncompressed) >>>>>>>>>> root@yuna:~# trace-cmd report >>>>>>>>>> cpus=2 >>>>>>>>>> irq/68-dwc3-725 [000] 514.575337: >>>>>>>>>> funcgraph_entry: # >>>>>>>>>> 2079.480 us | gether_disconnect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.263731: >>>>>>>>>> funcgraph_entry: + >>>>>>>>>> 11.640 us | gether_disconnect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.263743: >>>>>>>>>> funcgraph_entry: ! >>>>>>>>>> 116.520 us | gether_connect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.268029: >>>>>>>>>> funcgraph_entry: # >>>>>>>>>> 2057.260 us | gether_disconnect(); >>>>>>>>>> irq/68-dwc3-946 [000] 524.270089: >>>>>>>>>> funcgraph_entry: ! >>>>>>>>>> 109.000 us | gether_connect(); >>>>>>>>> >>>>>>>>> I tried to get a more useful trace: >>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >>>>>>>>>>> set_ftrace_filter >>>>>>>>> -> switch to host mode then back to device >>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>>>>> # tracer: function >>>>>>>>> # >>>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>>>>> # >>>>>>>>> # _-----=> irqs-off/BH-disabled >>>>>>>>> # / _----=> need-resched >>>>>>>>> # | / _---=> hardirq/softirq >>>>>>>>> # || / _--=> preempt-depth >>>>>>>>> # ||| / _-=> migrate-disable >>>>>>>>> # |||| / delay >>>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>>>>> # | | | ||||| | | >>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>>>>>> <-__composite_disconnect >>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>>>>> <-reset_config >>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: >>>>>>>>> gether_disconnect >>>>>>>>> <-reset_config >>>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>>>>> <-purge_configs_funcs >>>>>>>>> >>>>>>>>> -> to device mode >>>>>>>>> >>>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>>>>> <-usb_add_function >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>>>>> <-composite_setup >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: >>>>>>>>> gether_disconnect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: >>>>>>>>> gether_connect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>>>>> <-composite_setup >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: >>>>>>>>> gether_disconnect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: >>>>>>>>> gether_connect >>>>>>>>> <-eem_set_alt >>>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>>>>> <-rx_complete >>>>>>>>> ... >>>>>>>>> >>>>>>>>> I don't know where to look exactly. Any hints? >>>>>>>> >>>>>>>> do you see anything related to gether_cleanup() after >>>>>>>> eem_unbind() ? >>>>>>> >>>>>>> Nope. It's a pitty that the trace formatting got messed up >>>>>>> above. But as >>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>>>>> function is called, until I flip the switch to device mode. >>>>>>> The ... at the end is where I cut uninteresting eem_unwrap >>>>>>> <-rx_complete >>>>>>> and eem_wrap <-eth_start_xmit lines. >>>>>>> >>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>>>>> network side to check who deleting the sysfs entry >>>>>>> >>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how >>>>>>> that would be related to the patch we're discussing here. I was >>>>>>> hoping >>>>>>> for a little more targeted hint. >>>>>> >>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', >>>>>> 'usb_fun*', >>>>>> 'reset_config' and 'device_remove_file' leads me to: >>>>>> >>>>>> TIMESTAMP FUNCTION >>>>>> | | >>>>>> 49.952477: eem_wrap <-eth_start_xmit >>>>>> 55.072455: eem_wrap <-eth_start_xmit >>>>>> 55.072621: eem_unwrap <-rx_complete >>>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>>>>> 59.011545: composite_reset <-configfs_composite_reset >>>>>> 59.011548: reset_config <-__composite_disconnect >>>>>> 59.013565: eem_disable <-reset_config >>>>>> 59.013567: gether_disconnect <-reset_config >>>>>> 59.049560: device_remove_file <-device_remove >>>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>>>>> 59.051189: composite_disconnect <-configfs_composite_discon >>>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>>>>> 59.052519: eem_unbind <-purge_configs_funcs >>>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>>>>> 59.052537: device_remove_file <-composite_dev_cleanup >>>>>> >>>>>> device_remove_file gets called twice, once by device_remove after >>>>>> gether_disconnect (that the one). The 2nd time by >>>>>> composite_dev_cleanup >>>>>> (removing the gadget) >>>>> >>>>> I believe that the device_remove_file function is only removing >>>>> suspend-specific attributes, not the complete gadget. >>>>> Typically, when you perform the role switch, the Gadget start/stop >>>>> function in your UDC driver is called. These functions should not >>>>> delete the gadget >>>>> >>>>> To investigate further, could you please enable the DWC3 functions >>>>> in ftrace and check who is removing the gadget? >>>>> I can also enable this on my system and compare the logs with yours, >>>>> but I will be in PI planning for 1.5 weeks and may not be able to >>>>> provide immediate support. >>>> >>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so >>>> trace is 600kb. I cut uninteresting stuff before >>>> configfs_composite_reset <-usb_gadget_udc_reset and after >>>> __dwc3_set_mode, >>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> >>>> compare. >>>> >> Could you please try with the following patch and see if your issue >> resolves ? >> >> diff --git a/drivers/usb/gadget/function/f_eem.c >> b/drivers/usb/gadget/function/f_eem.c >> index 3b445bd88498..c2a904475083 100644 >> --- a/drivers/usb/gadget/function/f_eem.c >> +++ b/drivers/usb/gadget/function/f_eem.c >> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, >> struct usb_function *f) >> struct usb_composite_dev *cdev = c->cdev; >> struct f_eem *eem = func_to_eem(f); >> struct usb_string *us; >> - int status; >> + int status = 0; >> struct usb_ep *ep; >> struct f_eem_opts *eem_opts; >> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration *c, >> struct usb_function *f) >> * with list_for_each_entry, so we assume no race condition >> * with regard to eem_opts->bound access >> */ >> + mutex_lock(&eem_opts->lock); >> + gether_set_gadget(eem_opts->net, cdev->gadget); >> + >> if (!eem_opts->bound) { >> - mutex_lock(&eem_opts->lock); >> - gether_set_gadget(eem_opts->net, cdev->gadget); >> status = gether_register_netdev(eem_opts->net); >> - mutex_unlock(&eem_opts->lock); >> - if (status) >> - return status; >> - eem_opts->bound = true; >> } >> + mutex_unlock(&eem_opts->lock); >> + >> + if (status) >> + goto fail; >> + >> + eem_opts->bound = true; >> + >> us = usb_gstrings_attach(cdev, eem_strings, >> ARRAY_SIZE(eem_string_defs)); >> > > After switching from host -> device -> host the usb0 device as seen by > ifconfig remains "RUNNING" and in the routing table. > > FYI This is the regression. For us, I don't see any advantages in having this patch and only the downside. Do you have any further ideas / patches that we could test. Otherwise, it might be best to go with the original suggestion and revert until it gets sorted out? > Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is > still missing. > > After deeper diving into v6.1.0, I found the latter a longer existing > bug, not caused by your patch. > > More over, this doesn't seem to do any harm. > > The first issue does, as we try to route traffic over a device that no > longer exists. > >>>>> Additionally, please check if you have any customized DWC patches >>>>> that may be causing this problem. >>>>> >>>>>> >>>>>>> You may recall the whole issue did not occur before this >>>>>>> patch got applied. >>>>>>> >>>>>>>> Hardik >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>>>>> kobject and sysfs interface. >>>>>>>>>>> I suggest sharing the analysis here to understand why this >>>>>>>>>>> practice >>>>>>>>>>> is not followed in your use case or driver ? >>>>>>>>>> >>>>>>>>>> Yes, I'll try to trace where that happens. >>>>>>>>>> >>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>>>>> harmless? But the usb: net device remaining after disconnect >>>>>>>>>> or role >>>>>>>>>> switch is not good, as the route remains. >>>>>>>>>> >>>>>>>>>> May be they are 2 separate problems. Could you try to >>>>>>>>>> reproduce what >>>>>>>>>> happens if you make eem connection and then unplug? >>>>>>>>>> >>>>>>>>>>> I am curious why the driver was developed without adhering to >>>>>>>>>>> the >>>>>>>>>>> kernel's gadget architecture. >>>>>>>>> >>>>>>>>> I don't know what you mean here. Which driver do you mean? >>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>>>>> with this patch and >>>>>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>>>>> controller as well. But >>>>>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>>>>> We are happy to test any fixes if you come up with. >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> With Best Regards, >>>>>>>>>>>> Andy Shevchenko >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>> >> >> >
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting for once, to make this easily accessible to everyone. Seems more that four months after the report[1] there is still no resolution for this regression in sight. This is not how it should be. Greg, could you maybe chime in here and share your opinion? From my "I do not know anything about the details of this problem" point of view it looks like reverting the culprit (f49449fbc21e7e ("usb: gadget: u_ether: Replace netif_stop_queue with netif_device_detach")) might be the best option for now, unless of course that creates a regression as well -- but you are the best to judge here. [1] https://lore.kernel.org/all/ZaQS5x-XK08Jre6I@smile.fi.intel.com/ Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. On 26.05.24 22:52, Ferry Toth wrote: > Op 15-05-2024 om 20:38 schreef Ferry Toth: >> >> Sorry to have kept you waiting, I was away for a short break. >> >> Patch tested below. >> >> Op 10-05-2024 om 11:45 schreef Hardik Gajjar: >>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: >>>> Oops, sorry, wrong file attached . Now correct one. >>>> >>>> Op 02-05-2024 om 22:13 schreef Ferry Toth: >>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar: >>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko >>>>>>>>>>>> wrote: >>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 04:26:37PM +0200, Hardik Gajjar >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at >>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy >>>>>>>>>>>>>>>> Shevchenko wrote: >>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>>>>>>>>> >>>>>>>>>>>>> ... >>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>>>>>> target >>>>>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>>>>>> no >>>>>>>>>>>>>>>>>> longer exists >>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>>>>>> symlink should become active again >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is >>>>>>>>>>>>>> activated): >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then switching to host mode: >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>>> ls: cannot access >>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>>>>>> No such file >>>>>>>>>>>>>> or directory >>>>>>>>>>>>>> >>>>>>>>>>>>>> Then back to device mode: >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>>>>>> >>>>>>>>>>>>>> net is missing. But, network functions: >>>>>>>>>>>>>> >>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>>>>>> >>>>>>>>>>>>>> Mass storage device is created and removed each time as >>>>>>>>>>>>>> expected. >>>>>>>>>>>>> >>>>>>>>>>>>> So, what's the conclusion? Shall we move towards revert of >>>>>>>>>>>>> those >>>>>>>>>>>>> two changes? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> As promised, I have the tested the this patch with the dwc3 >>>>>>>>>>>> gadget. >>>>>>>>>>>> I could not reproduce >>>>>>>>>>>> the issue. >>>>>>>>>>>> >>>>>>>>>>>> I can see the usb0 exist all the time and accessible >>>>>>>>>>>> regardless of >>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>>>>>> >>>>>>>>>>>> Following are the logs: >>>>>>>>>>>> //Host to device >>>>>>>>>>>> >>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # echo >>>>>>>>>>>> "peripheral" >>>>>>>>>>>>> mode >>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>>> usb0 >>>>>>>>>>>> >>>>>>>>>>>> //device to host >>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb >>>>>>>>>>>> # echo "host" > mode >>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>>> usb0 >>>>>>>>>>> >>>>>>>>>>> That is weird. When I switch to host mode (using >>>>>>>>>>> the physical switch), >>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>>>>>> >>>>>>>>>>> Switching back to device mode, that gadget >>>>>>>>>>> directory is recreated. And >>>>>>>>>>> gadget/sound as well, but not gadget/net. >>>>>>>>>>> >>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>>>>>> < >>>>>>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>>>>>> addr_len flags phys_switch_id >>>>>>>>>>>> address gro_flush_timeout power >>>>>>>>>>>> broadcast ifalias proto_down >>>>>>>>>>>> carrier ifindex queues >>>>>>>>>>>> carrier_changes iflink speed >>>>>>>>>>>> carrier_down_count link_mode statistics >>>>>>>>>>>> carrier_up_count mtu subsystem >>>>>>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>>>>>> dev_port netdev_group type >>>>>>>>>>>> device operstate uevent >>>>>>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ifconfig >>>>>>>>>>>> -a usb0 >>>>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 >>>>>>>>>>>> frame:0 >>>>>>>>>>>> TX packets:0 errors:0 dropped:0 overruns:0 >>>>>>>>>>>> carrier:0 >>>>>>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>>>>>> >>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>>>>>> >>>>>>>>>>>> I strongly advise against reverting the patch solely based >>>>>>>>>>>> on the >>>>>>>>>>>> observed issue of removing the /sys/class/net/usb0 directory >>>>>>>>>>>> while >>>>>>>>>>>> the usb0 interface remains available. >>>>>>>>>>> >>>>>>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>>>>>> >>>>>>>>>>> I have while in host mode: >>>>>>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>>>>>> usb0: flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> >>>>>>>>>>> mtu 1500 >>>>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>>>>>> 10.42.0.255 >>>>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>>>>>> scopeid 0x20<link> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> You don't see that because you didn't create a connection at >>>>>>>>>>> all. >>>>>>>>>>> >>>>>>>>>>>> Instead, I recommend enabling FTRACE to >>>>>>>>>>>> trace the functions involved >>>>>>>>>>>> and identify which faulty call is responsible for removing >>>>>>>>>>>> usb0. >>>>>>>>>>> >>>>>>>>>>> Switching from device -> host -> device: >>>>>>>>>>> >>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>>>>>> plugin 'function_graph' >>>>>>>>>>> Hit Ctrl^C to stop recording >>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>>>>>> 0 bytes in size (0 uncompressed) >>>>>>>>>>> root@yuna:~# trace-cmd report >>>>>>>>>>> cpus=2 >>>>>>>>>>> irq/68-dwc3-725 [000] 514.575337: >>>>>>>>>>> funcgraph_entry: # >>>>>>>>>>> 2079.480 us | gether_disconnect(); >>>>>>>>>>> irq/68-dwc3-946 [000] 524.263731: >>>>>>>>>>> funcgraph_entry: + >>>>>>>>>>> 11.640 us | gether_disconnect(); >>>>>>>>>>> irq/68-dwc3-946 [000] 524.263743: >>>>>>>>>>> funcgraph_entry: ! >>>>>>>>>>> 116.520 us | gether_connect(); >>>>>>>>>>> irq/68-dwc3-946 [000] 524.268029: >>>>>>>>>>> funcgraph_entry: # >>>>>>>>>>> 2057.260 us | gether_disconnect(); >>>>>>>>>>> irq/68-dwc3-946 [000] 524.270089: >>>>>>>>>>> funcgraph_entry: ! >>>>>>>>>>> 109.000 us | gether_connect(); >>>>>>>>>> >>>>>>>>>> I tried to get a more useful trace: >>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > >>>>>>>>>> set_ftrace_filter >>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >>>>>>>>>>>> set_ftrace_filter >>>>>>>>>> -> switch to host mode then back to device >>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>>>>>> # tracer: function >>>>>>>>>> # >>>>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>>>>>> # >>>>>>>>>> # _-----=> irqs-off/BH-disabled >>>>>>>>>> # / _----=> need-resched >>>>>>>>>> # | / _---=> hardirq/softirq >>>>>>>>>> # || / _--=> preempt-depth >>>>>>>>>> # ||| / _-=> migrate-disable >>>>>>>>>> # |||| / delay >>>>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>>>>>> # | | | ||||| | | >>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: >>>>>>>>>> reset_config >>>>>>>>>> <-__composite_disconnect >>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>>>>>> <-reset_config >>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: >>>>>>>>>> gether_disconnect >>>>>>>>>> <-reset_config >>>>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>>>>>> <-purge_configs_funcs >>>>>>>>>> >>>>>>>>>> -> to device mode >>>>>>>>>> >>>>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>>>>>> <-usb_add_function >>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>>>>>> <-composite_setup >>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: >>>>>>>>>> gether_disconnect >>>>>>>>>> <-eem_set_alt >>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155220: >>>>>>>>>> gether_connect >>>>>>>>>> <-eem_set_alt >>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>>>>>> <-composite_setup >>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: >>>>>>>>>> gether_disconnect >>>>>>>>>> <-eem_set_alt >>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.159338: >>>>>>>>>> gether_connect >>>>>>>>>> <-eem_set_alt >>>>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>>>>>> <-rx_complete >>>>>>>>>> ... >>>>>>>>>> >>>>>>>>>> I don't know where to look exactly. Any hints? >>>>>>>>> >>>>>>>>> do you see anything related to gether_cleanup() after >>>>>>>>> eem_unbind() ? >>>>>>>> >>>>>>>> Nope. It's a pitty that the trace formatting got messed up >>>>>>>> above. But as >>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>>>>>> function is called, until I flip the switch to device mode. >>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap >>>>>>>> <-rx_complete >>>>>>>> and eem_wrap <-eth_start_xmit lines. >>>>>>>> >>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>>>>>> network side to check who deleting the sysfs entry >>>>>>>> >>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see >>>>>>>> how >>>>>>>> that would be related to the patch we're discussing here. I was >>>>>>>> hoping >>>>>>>> for a little more targeted hint. >>>>>>> >>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', >>>>>>> 'usb_fun*', >>>>>>> 'reset_config' and 'device_remove_file' leads me to: >>>>>>> >>>>>>> TIMESTAMP FUNCTION >>>>>>> | | >>>>>>> 49.952477: eem_wrap <-eth_start_xmit >>>>>>> 55.072455: eem_wrap <-eth_start_xmit >>>>>>> 55.072621: eem_unwrap <-rx_complete >>>>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>>>>>> 59.011545: composite_reset <-configfs_composite_reset >>>>>>> 59.011548: reset_config <-__composite_disconnect >>>>>>> 59.013565: eem_disable <-reset_config >>>>>>> 59.013567: gether_disconnect <-reset_config >>>>>>> 59.049560: device_remove_file <-device_remove >>>>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>>>>>> 59.051189: composite_disconnect <-configfs_composite_discon >>>>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>>>>>> 59.052519: eem_unbind <-purge_configs_funcs >>>>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>>>>>> 59.052537: device_remove_file <-composite_dev_cleanup >>>>>>> >>>>>>> device_remove_file gets called twice, once by device_remove after >>>>>>> gether_disconnect (that the one). The 2nd time by >>>>>>> composite_dev_cleanup >>>>>>> (removing the gadget) >>>>>> >>>>>> I believe that the device_remove_file function is only removing >>>>>> suspend-specific attributes, not the complete gadget. >>>>>> Typically, when you perform the role switch, the Gadget start/stop >>>>>> function in your UDC driver is called. These functions should not >>>>>> delete the gadget >>>>>> >>>>>> To investigate further, could you please enable the DWC3 functions >>>>>> in ftrace and check who is removing the gadget? >>>>>> I can also enable this on my system and compare the logs with yours, >>>>>> but I will be in PI planning for 1.5 weeks and may not be able to >>>>>> provide immediate support. >>>>> >>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so >>>>> trace is 600kb. I cut uninteresting stuff before >>>>> configfs_composite_reset <-usb_gadget_udc_reset and after >>>>> __dwc3_set_mode, >>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> >>>>> compare. >>>>> >>> Could you please try with the following patch and see if your issue >>> resolves ? >>> >>> diff --git a/drivers/usb/gadget/function/f_eem.c >>> b/drivers/usb/gadget/function/f_eem.c >>> index 3b445bd88498..c2a904475083 100644 >>> --- a/drivers/usb/gadget/function/f_eem.c >>> +++ b/drivers/usb/gadget/function/f_eem.c >>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, >>> struct usb_function *f) >>> struct usb_composite_dev *cdev = c->cdev; >>> struct f_eem *eem = func_to_eem(f); >>> struct usb_string *us; >>> - int status; >>> + int status = 0; >>> struct usb_ep *ep; >>> struct f_eem_opts *eem_opts; >>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration >>> *c, struct usb_function *f) >>> * with list_for_each_entry, so we assume no race condition >>> * with regard to eem_opts->bound access >>> */ >>> + mutex_lock(&eem_opts->lock); >>> + gether_set_gadget(eem_opts->net, cdev->gadget); >>> + >>> if (!eem_opts->bound) { >>> - mutex_lock(&eem_opts->lock); >>> - gether_set_gadget(eem_opts->net, cdev->gadget); >>> status = gether_register_netdev(eem_opts->net); >>> - mutex_unlock(&eem_opts->lock); >>> - if (status) >>> - return status; >>> - eem_opts->bound = true; >>> } >>> + mutex_unlock(&eem_opts->lock); >>> + >>> + if (status) >>> + goto fail; >>> + >>> + eem_opts->bound = true; >>> + >>> us = usb_gstrings_attach(cdev, eem_strings, >>> ARRAY_SIZE(eem_string_defs)); >>> >> >> After switching from host -> device -> host the usb0 device as seen by >> ifconfig remains "RUNNING" and in the routing table. >> >> FYI This is the regression. > > For us, I don't see any advantages in having this patch and only the > downside. Do you have any further ideas / patches that we could test. > > Otherwise, it might be best to go with the original suggestion and > revert until it gets sorted out? > >> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is >> still missing. >> >> After deeper diving into v6.1.0, I found the latter a longer existing >> bug, not caused by your patch. >> >> More over, this doesn't seem to do any harm. >> >> The first issue does, as we try to route traffic over a device that no >> longer exists. >> >>>>>> Additionally, please check if you have any customized DWC patches >>>>>> that may be causing this problem. >>>>>> >>>>>>> >>>>>>>> You may recall the whole issue did not occur before this >>>>>>>> patch got applied. >>>>>>>> >>>>>>>>> Hardik >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> According to current kernel architecture of u_ether driver, >>>>>>>>>>>> only >>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>>>>>> kobject and sysfs interface. >>>>>>>>>>>> I suggest sharing the analysis here to understand why this >>>>>>>>>>>> practice >>>>>>>>>>>> is not followed in your use case or driver ? >>>>>>>>>>> >>>>>>>>>>> Yes, I'll try to trace where that happens. >>>>>>>>>>> >>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>>>>>> harmless? But the usb: net device remaining after disconnect >>>>>>>>>>> or role >>>>>>>>>>> switch is not good, as the route remains. >>>>>>>>>>> >>>>>>>>>>> May be they are 2 separate problems. Could you try to >>>>>>>>>>> reproduce what >>>>>>>>>>> happens if you make eem connection and then unplug? >>>>>>>>>>> >>>>>>>>>>>> I am curious why the driver was developed without adhering >>>>>>>>>>>> to the >>>>>>>>>>>> kernel's gadget architecture. >>>>>>>>>> >>>>>>>>>> I don't know what you mean here. Which driver do you mean? >>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>>>>>> with this patch and >>>>>>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>>>>>> controller as well. But >>>>>>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>>>>>> We are happy to test any fixes if you come up with. >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> With Best Regards, >>>>>>>>>>>>> Andy Shevchenko
On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote: > Hi > > Op 15-05-2024 om 20:38 schreef Ferry Toth: > > Hi, > > > > Sorry to have kept you waiting, I was away for a short break. > > > > Patch tested below. > > > > Op 10-05-2024 om 11:45 schreef Hardik Gajjar: > > > On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: > > > > Oops, sorry, wrong file attached . Now correct one. > > > > > > > > Op 02-05-2024 om 22:13 schreef Ferry Toth: > > > > > Op 02-05-2024 om 17:29 schreef Hardik Gajjar: > > > > > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > > > > > > > Hi, > > > > > > > > > > > > > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > > > > > > > Hi, > > > > > > > > > > > > > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > > > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 > > > > > > > > > > > > > > > at 04:26:37PM +0200, > > > > > > > > > > > > > > > Hardik Gajjar wrote: > > > > > > > > > > > > > > > > On Wed, Apr 10, 2024 at > > > > > > > > > > > > > > > > 08:37:42PM +0300, Andy > > > > > > > > > > > > > > > > Shevchenko wrote: > > > > > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > > > > > > > > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > > > > > > > no > > > > > > > > > > > > > > > > > > longer exists > > > > > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > > > > > > > present, but the symlink is > > > > > > > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > > > > > > > recreate the device, and the > > > > > > > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > > > > ls: cannot access > > > > > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > > > > > > > No such file > > > > > > > > > > > > > > or directory > > > > > > > > > > > > > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > > > > > > > > > > > > > Mass storage device is > > > > > > > > > > > > > > created and removed each > > > > > > > > > > > > > > time as expected. > > > > > > > > > > > > > > > > > > > > > > > > > > So, what's the conclusion? > > > > > > > > > > > > > Shall we move towards revert > > > > > > > > > > > > > of those > > > > > > > > > > > > > two changes? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As promised, I have the tested > > > > > > > > > > > > the this patch with the dwc3 > > > > > > > > > > > > gadget. > > > > > > > > > > > > I could not reproduce > > > > > > > > > > > > the issue. > > > > > > > > > > > > > > > > > > > > > > > > I can see the usb0 exist all the > > > > > > > > > > > > time and accessible regardless > > > > > > > > > > > > of > > > > > > > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > > > > > > > > > > > > > Following are the logs: > > > > > > > > > > > > //Host to device > > > > > > > > > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > > > > # echo "peripheral" > > > > > > > > > > > > > mode > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > > > > > usb0 > > > > > > > > > > > > > > > > > > > > > > > > //device to host > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > > > > # echo "host" > mode > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > > > > > usb0 > > > > > > > > > > > > > > > > > > > > > > That is weird. When I switch to host mode (using > > > > > > > > > > > the physical switch), > > > > > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > > > > > > > > > > > > > Switching back to device mode, that gadget > > > > > > > > > > > directory is recreated. And > > > > > > > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > > > > > > > < > > > > > > > > > > > > addr_assign_type duplex phys_port_name > > > > > > > > > > > > addr_len flags phys_switch_id > > > > > > > > > > > > address gro_flush_timeout power > > > > > > > > > > > > broadcast ifalias proto_down > > > > > > > > > > > > carrier ifindex queues > > > > > > > > > > > > carrier_changes iflink speed > > > > > > > > > > > > carrier_down_count link_mode statistics > > > > > > > > > > > > carrier_up_count mtu subsystem > > > > > > > > > > > > dev_id name_assign_type tx_queue_len > > > > > > > > > > > > dev_port netdev_group type > > > > > > > > > > > > device operstate uevent > > > > > > > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > > > > # ifconfig -a usb0 > > > > > > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > > > > > > > TX packets:0 > > > > > > > > > > > > errors:0 dropped:0 overruns:0 > > > > > > > > > > > > carrier:0 > > > > > > > > > > > > collisions:0 txqueuelen:1000 > > > > > > > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > > > > > > > > > > > > > I strongly advise against > > > > > > > > > > > > reverting the patch solely based > > > > > > > > > > > > on the > > > > > > > > > > > > observed issue of removing the > > > > > > > > > > > > /sys/class/net/usb0 directory > > > > > > > > > > > > while > > > > > > > > > > > > the usb0 interface remains available. > > > > > > > > > > > > > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > > > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > > > > > > > > > > > > > I have while in host mode: > > > > > > > > > > > root@yuna:~# ifconfig -a usb0 > > > > > > > > > > > usb0: > > > > > > > > > > > flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> > > > > > > > > > > > mtu 1500 > > > > > > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > > > > > > > 10.42.0.255 > > > > > > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > > > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > > > > > > > > > > > > > Instead, I recommend enabling FTRACE to > > > > > > > > > > > > trace the functions involved > > > > > > > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > > > > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > > > > > > > plugin 'function_graph' > > > > > > > > > > > Hit Ctrl^C to stop recording > > > > > > > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > > > > > > > 188 bytes in size (4096 uncompressed) > > > > > > > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > > > > > > > 0 bytes in size (0 uncompressed) > > > > > > > > > > > root@yuna:~# trace-cmd report > > > > > > > > > > > cpus=2 > > > > > > > > > > > irq/68-dwc3-725 [000] 514.575337: > > > > > > > > > > > funcgraph_entry: # > > > > > > > > > > > 2079.480 us | gether_disconnect(); > > > > > > > > > > > irq/68-dwc3-946 [000] 524.263731: > > > > > > > > > > > funcgraph_entry: + > > > > > > > > > > > 11.640 us | gether_disconnect(); > > > > > > > > > > > irq/68-dwc3-946 [000] 524.263743: > > > > > > > > > > > funcgraph_entry: ! > > > > > > > > > > > 116.520 us | gether_connect(); > > > > > > > > > > > irq/68-dwc3-946 [000] 524.268029: > > > > > > > > > > > funcgraph_entry: # > > > > > > > > > > > 2057.260 us | gether_disconnect(); > > > > > > > > > > > irq/68-dwc3-946 [000] 524.270089: > > > > > > > > > > > funcgraph_entry: ! > > > > > > > > > > > 109.000 us | gether_connect(); > > > > > > > > > > > > > > > > > > > > I tried to get a more useful trace: > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' > > > > > > > > > > > > set_ftrace_filter > > > > > > > > > > -> switch to host mode then back to device > > > > > > > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > > > > > > > # tracer: function > > > > > > > > > > # > > > > > > > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > > > > > > > # > > > > > > > > > > # _-----=> irqs-off/BH-disabled > > > > > > > > > > # / _----=> need-resched > > > > > > > > > > # | / _---=> hardirq/softirq > > > > > > > > > > # || / _--=> preempt-depth > > > > > > > > > > # ||| / _-=> migrate-disable > > > > > > > > > > # |||| / delay > > > > > > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > > > > > > > # | | | ||||| | | > > > > > > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > > > > > > > <-__composite_disconnect > > > > > > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > > > > > > > <-reset_config > > > > > > > > > > irq/68-dwc3-523 [000] D..3. 133.992276: > > > > > > > > > > gether_disconnect > > > > > > > > > > <-reset_config > > > > > > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > > > > > > > <-purge_configs_funcs > > > > > > > > > > > > > > > > > > > > -> to device mode > > > > > > > > > > > > > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > > > > > > > <-usb_add_function > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > > > > > > > <-composite_setup > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155215: > > > > > > > > > > gether_disconnect > > > > > > > > > > <-eem_set_alt > > > > > > > > > > irq/68-dwc3-734 [000] > > > > > > > > > > D..3. 149.155220: gether_connect > > > > > > > > > > <-eem_set_alt > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > > > > > > > <-composite_setup > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.157292: > > > > > > > > > > gether_disconnect > > > > > > > > > > <-eem_set_alt > > > > > > > > > > irq/68-dwc3-734 [000] > > > > > > > > > > D..3. 149.159338: gether_connect > > > > > > > > > > <-eem_set_alt > > > > > > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > > > > > > > <-rx_complete > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > > > > > > > > > > > > > do you see anything related to > > > > > > > > > gether_cleanup() after eem_unbind() ? > > > > > > > > > > > > > > > > Nope. It's a pitty that the trace formatting got messed up > > > > > > > > above. But as > > > > > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > > > > > > > function is called, until I flip the switch to device mode. > > > > > > > > The ... at the end is where I cut uninteresting eem_unwrap > > > > > > > > <-rx_complete > > > > > > > > and eem_wrap <-eth_start_xmit lines. > > > > > > > > > > > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > > > > > > > network side to check who deleting the sysfs entry > > > > > > > > > > > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > > > > > > > that would be related to the patch we're > > > > > > > > discussing here. I was hoping > > > > > > > > for a little more targeted hint. > > > > > > > > > > > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', > > > > > > > 'usb_fun*', > > > > > > > 'reset_config' and 'device_remove_file' leads me to: > > > > > > > > > > > > > > TIMESTAMP FUNCTION > > > > > > > | | > > > > > > > 49.952477: eem_wrap <-eth_start_xmit > > > > > > > 55.072455: eem_wrap <-eth_start_xmit > > > > > > > 55.072621: eem_unwrap <-rx_complete > > > > > > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > > > > > > > 59.011545: composite_reset <-configfs_composite_reset > > > > > > > 59.011548: reset_config <-__composite_disconnect > > > > > > > 59.013565: eem_disable <-reset_config > > > > > > > 59.013567: gether_disconnect <-reset_config > > > > > > > 59.049560: device_remove_file <-device_remove > > > > > > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > > > > > > > 59.051189: composite_disconnect <-configfs_composite_discon > > > > > > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > > > > > > > 59.052519: eem_unbind <-purge_configs_funcs > > > > > > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > > > > > > > 59.052537: device_remove_file <-composite_dev_cleanup > > > > > > > > > > > > > > device_remove_file gets called twice, once by device_remove after > > > > > > > gether_disconnect (that the one). The 2nd time by > > > > > > > composite_dev_cleanup > > > > > > > (removing the gadget) > > > > > > > > > > > > I believe that the device_remove_file function is only removing > > > > > > suspend-specific attributes, not the complete gadget. > > > > > > Typically, when you perform the role switch, the Gadget start/stop > > > > > > function in your UDC driver is called. These functions should not > > > > > > delete the gadget > > > > > > > > > > > > To investigate further, could you please enable the DWC3 functions > > > > > > in ftrace and check who is removing the gadget? > > > > > > I can also enable this on my system and compare the logs with yours, > > > > > > but I will be in PI planning for 1.5 weeks and may not be able to > > > > > > provide immediate support. > > > > > > > > > > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so > > > > > trace is 600kb. I cut uninteresting stuff before > > > > > configfs_composite_reset <-usb_gadget_udc_reset and after > > > > > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> > > > > > compare. > > > > > > > > Could you please try with the following patch and see if your issue > > > resolves ? > > > > > > diff --git a/drivers/usb/gadget/function/f_eem.c > > > b/drivers/usb/gadget/function/f_eem.c > > > index 3b445bd88498..c2a904475083 100644 > > > --- a/drivers/usb/gadget/function/f_eem.c > > > +++ b/drivers/usb/gadget/function/f_eem.c > > > @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, > > > struct usb_function *f) > > > struct usb_composite_dev *cdev = c->cdev; > > > struct f_eem *eem = func_to_eem(f); > > > struct usb_string *us; > > > - int status; > > > + int status = 0; > > > struct usb_ep *ep; > > > struct f_eem_opts *eem_opts; > > > @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration > > > *c, struct usb_function *f) > > > * with list_for_each_entry, so we assume no race condition > > > * with regard to eem_opts->bound access > > > */ > > > + mutex_lock(&eem_opts->lock); > > > + gether_set_gadget(eem_opts->net, cdev->gadget); > > > + > > > if (!eem_opts->bound) { > > > - mutex_lock(&eem_opts->lock); > > > - gether_set_gadget(eem_opts->net, cdev->gadget); > > > status = gether_register_netdev(eem_opts->net); > > > - mutex_unlock(&eem_opts->lock); > > > - if (status) > > > - return status; > > > - eem_opts->bound = true; > > > } > > > + mutex_unlock(&eem_opts->lock); > > > + > > > + if (status) > > > + goto fail; > > > + > > > + eem_opts->bound = true; > > > + > > > us = usb_gstrings_attach(cdev, eem_strings, > > > ARRAY_SIZE(eem_string_defs)); > > > > > > > After switching from host -> device -> host the usb0 device as seen by > > ifconfig remains "RUNNING" and in the routing table. > > > > FYI This is the regression. > > For us, I don't see any advantages in having this patch and only the > downside. Do you have any further ideas / patches that we could test. > > Otherwise, it might be best to go with the original suggestion and revert > until it gets sorted out? Hello, Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ? However, We are very much intrested to fix your problem and upstream the v2. Therefore could you please share the following details to reproduce the issue - How did you created the gadget ? Could you please share the exact commands to create the eem gadget ? - Arch of your test bench (arm, x86) - How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ? Hardik > > > Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is > > still missing. > > > > After deeper diving into v6.1.0, I found the latter a longer existing > > bug, not caused by your patch. > > > > More over, this doesn't seem to do any harm. > > > > The first issue does, as we try to route traffic over a device that no > > longer exists. > > > > > > > > Additionally, please check if you have any customized DWC patches > > > > > > that may be causing this problem. > > > > > > > > > > > > > > > > > > > > > You may recall the whole issue did not occur before this > > > > > > > > patch got applied. > > > > > > > > > > > > > > > > > Hardik > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > > > > > > > kobject and sysfs interface. > > > > > > > > > > > > I suggest sharing the analysis > > > > > > > > > > > > here to understand why this > > > > > > > > > > > > practice > > > > > > > > > > > > is not followed in your use case or driver ? > > > > > > > > > > > > > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > > > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > > > > > > > harmless? But the usb: net device > > > > > > > > > > > remaining after disconnect or role > > > > > > > > > > > switch is not good, as the route remains. > > > > > > > > > > > > > > > > > > > > > > May be they are 2 separate problems. > > > > > > > > > > > Could you try to reproduce what > > > > > > > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > > > > > > > > > > > > > I am curious why the driver was > > > > > > > > > > > > developed without adhering to > > > > > > > > > > > > the > > > > > > > > > > > > kernel's gadget architecture. > > > > > > > > > > > > > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > > > > > > > with this patch and > > > > > > > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > > > > > > > Would have been nice if someone could test on other > > > > > > > > > > > > > > controller as well. But > > > > > > > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > > > > > > > We are happy to test any fixes if you come up with. > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > With Best Regards, > > > > > > > > > > > > > Andy Shevchenko > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
On Tue, May 28, 2024 at 09:18:51AM +0200, Hardik Gajjar wrote: > On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote: > > Op 15-05-2024 om 20:38 schreef Ferry Toth: > > > Op 10-05-2024 om 11:45 schreef Hardik Gajjar: > > > > On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: > > > > > Op 02-05-2024 om 22:13 schreef Ferry Toth: > > > > > > Op 02-05-2024 om 17:29 schreef Hardik Gajjar: > > > > > > > On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: > > > > > > > > Op 30-04-2024 om 21:40 schreef Ferry Toth: > > > > > > > > > Op 30-04-2024 om 17:32 schreef Hardik Gajjar: > > > > > > > > > > On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: > > > > > > > > > > > Op 25-04-2024 om 23:27 schreef Ferry Toth: > > > > > > > > > > > > Op 17-04-2024 om 17:13 schreef Hardik Gajjar: > > > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > > > Op 11-04-2024 om 18:39 schreef Andy Shevchenko: > > > > > > > > > > > > > > > > On Thu, Apr 11, 2024 > > > > > > > > > > > > > > > > at 04:26:37PM +0200, > > > > > > > > > > > > > > > > Hardik Gajjar wrote: > > > > > > > > > > > > > > > > > On Wed, Apr 10, 2024 at > > > > > > > > > > > > > > > > > 08:37:42PM +0300, Andy > > > > > > > > > > > > > > > > > Shevchenko wrote: > > > > > > > > > > > > > > > > > > On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: > > > > > > > > > > > > > > > > > > > Op 05-04-2024 om 13:38 schreef Hardik Gajjar: > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > Exactly. And this didn't happen before the 2 patches. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To be precise: /sys/class/net/usb0 is not > > > > > > > > > > > > > > > > > > > removed and it is a link, the link > > > > > > > > > > > > > > > > > > > target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 > > > > > > > > > > > > > > > > > > > no > > > > > > > > > > > > > > > > > > > longer exists > > > > > > > > > > > > > > > > > So, it means that the /sys/class/net/usb0 is > > > > > > > > > > > > > > > > > present, but the symlink is > > > > > > > > > > > > > > > > > broken. In that case, the dwc3 driver should > > > > > > > > > > > > > > > > > recreate the device, and the > > > > > > > > > > > > > > > > > symlink should become active again > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yes, on first enabling gadget (when device mode is activated): > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > > > > > driver net power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Then switching to host mode: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > > > > > ls: cannot access > > > > > > > > > > > > > > > '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': > > > > > > > > > > > > > > > No such file > > > > > > > > > > > > > > > or directory > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Then back to device mode: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ls > > > > > > > > > > > > > > > /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ > > > > > > > > > > > > > > > driver power sound subsystem suspended uevent > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > net is missing. But, network functions: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# ping 10.42.0.1 > > > > > > > > > > > > > > > PING 10.42.0.1 (10.42.0.1): 56 data bytes > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Mass storage device is > > > > > > > > > > > > > > > created and removed each > > > > > > > > > > > > > > > time as expected. > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, what's the conclusion? > > > > > > > > > > > > > > Shall we move towards revert > > > > > > > > > > > > > > of those > > > > > > > > > > > > > > two changes? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > As promised, I have the tested > > > > > > > > > > > > > the this patch with the dwc3 > > > > > > > > > > > > > gadget. > > > > > > > > > > > > > I could not reproduce > > > > > > > > > > > > > the issue. > > > > > > > > > > > > > > > > > > > > > > > > > > I can see the usb0 exist all the > > > > > > > > > > > > > time and accessible regardless > > > > > > > > > > > > > of > > > > > > > > > > > > > the role switching of the USB mode (peripheral <-> host) > > > > > > > > > > > > > > > > > > > > > > > > > > Following are the logs: > > > > > > > > > > > > > //Host to device > > > > > > > > > > > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > > > > > # echo "peripheral" > > > > > > > > > > > > > > mode > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > > > > > > usb0 > > > > > > > > > > > > > > > > > > > > > > > > > > //device to host > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > > > > > # echo "host" > mode > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # ls > > > > > > > > > > > > > a800000.dwc3/gadget/net/ > > > > > > > > > > > > > usb0 > > > > > > > > > > > > > > > > > > > > > > > > That is weird. When I switch to host mode (using > > > > > > > > > > > > the physical switch), > > > > > > > > > > > > the whole gadget directory is removed (now testing 6.9.0-rc5) > > > > > > > > > > > > > > > > > > > > > > > > Switching back to device mode, that gadget > > > > > > > > > > > > directory is recreated. And > > > > > > > > > > > > gadget/sound as well, but not gadget/net. > > > > > > > > > > > > > > > > > > > > > > > > > s a800000.dwc3/gadget/net/usb0 > > > > > > > > > > > > > < > > > > > > > > > > > > > addr_assign_type duplex phys_port_name > > > > > > > > > > > > > addr_len flags phys_switch_id > > > > > > > > > > > > > address gro_flush_timeout power > > > > > > > > > > > > > broadcast ifalias proto_down > > > > > > > > > > > > > carrier ifindex queues > > > > > > > > > > > > > carrier_changes iflink speed > > > > > > > > > > > > > carrier_down_count link_mode statistics > > > > > > > > > > > > > carrier_up_count mtu subsystem > > > > > > > > > > > > > dev_id name_assign_type tx_queue_len > > > > > > > > > > > > > dev_port netdev_group type > > > > > > > > > > > > > device operstate uevent > > > > > > > > > > > > > dormant phys_port_id waiting_for_supplier > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb > > > > > > > > > > > > > # ifconfig -a usb0 > > > > > > > > > > > > > usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a > > > > > > > > > > > > > BROADCAST MULTICAST MTU:1500 Metric:1 > > > > > > > > > > > > > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > > > > > > > > > > > > > TX packets:0 > > > > > > > > > > > > > errors:0 dropped:0 overruns:0 > > > > > > > > > > > > > carrier:0 > > > > > > > > > > > > > collisions:0 txqueuelen:1000 > > > > > > > > > > > > > RX bytes:0 TX bytes:0 > > > > > > > > > > > > > > > > > > > > > > > > > > console:/sys/bus/platform/devices/a800000.ssusb # > > > > > > > > > > > > > > > > > > > > > > > > > > I strongly advise against > > > > > > > > > > > > > reverting the patch solely based > > > > > > > > > > > > > on the > > > > > > > > > > > > > observed issue of removing the > > > > > > > > > > > > > /sys/class/net/usb0 directory > > > > > > > > > > > > > while > > > > > > > > > > > > > the usb0 interface remains available. > > > > > > > > > > > > > > > > > > > > > > > > There's more to it. I also mentioned that switching the role or > > > > > > > > > > > > unplugging the cable leaves the usb0 connection. > > > > > > > > > > > > > > > > > > > > > > > > I have while in host mode: > > > > > > > > > > > > root@yuna:~# ifconfig -a usb0 > > > > > > > > > > > > usb0: > > > > > > > > > > > > flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> > > > > > > > > > > > > mtu 1500 > > > > > > > > > > > > inet 10.42.0.221 netmask 255.255.255.0 broadcast > > > > > > > > > > > > 10.42.0.255 > > > > > > > > > > > > inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 > > > > > > > > > > > > scopeid 0x20<link> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > You don't see that because you didn't create a connection at all. > > > > > > > > > > > > > > > > > > > > > > > > > Instead, I recommend enabling FTRACE to > > > > > > > > > > > > > trace the functions involved > > > > > > > > > > > > > and identify which faulty call is responsible for removing usb0. > > > > > > > > > > > > > > > > > > > > > > > > Switching from device -> host -> device: > > > > > > > > > > > > > > > > > > > > > > > > root@yuna:~# trace-cmd record -p function_graph -l *gether_* > > > > > > > > > > > > plugin 'function_graph' > > > > > > > > > > > > Hit Ctrl^C to stop recording > > > > > > > > > > > > ^CCPU0 data recorded at offset=0x1c8000 > > > > > > > > > > > > 188 bytes in size (4096 uncompressed) > > > > > > > > > > > > CPU1 data recorded at offset=0x1c9000 > > > > > > > > > > > > 0 bytes in size (0 uncompressed) > > > > > > > > > > > > root@yuna:~# trace-cmd report > > > > > > > > > > > > cpus=2 > > > > > > > > > > > > irq/68-dwc3-725 [000] 514.575337: > > > > > > > > > > > > funcgraph_entry: # > > > > > > > > > > > > 2079.480 us | gether_disconnect(); > > > > > > > > > > > > irq/68-dwc3-946 [000] 524.263731: > > > > > > > > > > > > funcgraph_entry: + > > > > > > > > > > > > 11.640 us | gether_disconnect(); > > > > > > > > > > > > irq/68-dwc3-946 [000] 524.263743: > > > > > > > > > > > > funcgraph_entry: ! > > > > > > > > > > > > 116.520 us | gether_connect(); > > > > > > > > > > > > irq/68-dwc3-946 [000] 524.268029: > > > > > > > > > > > > funcgraph_entry: # > > > > > > > > > > > > 2057.260 us | gether_disconnect(); > > > > > > > > > > > > irq/68-dwc3-946 [000] 524.270089: > > > > > > > > > > > > funcgraph_entry: ! > > > > > > > > > > > > 109.000 us | gether_connect(); > > > > > > > > > > > > > > > > > > > > > > I tried to get a more useful trace: > > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter > > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter > > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo function > current_tracer > > > > > > > > > > > root@yuna:/sys/kernel/tracing# echo 'reset_config' > > > > > > > > > > > > > set_ftrace_filter > > > > > > > > > > > -> switch to host mode then back to device > > > > > > > > > > > root@yuna:/sys/kernel/tracing# cat trace > > > > > > > > > > > # tracer: function > > > > > > > > > > > # > > > > > > > > > > > # entries-in-buffer/entries-written: 53/53 #P:2 > > > > > > > > > > > # > > > > > > > > > > > # _-----=> irqs-off/BH-disabled > > > > > > > > > > > # / _----=> need-resched > > > > > > > > > > > # | / _---=> hardirq/softirq > > > > > > > > > > > # || / _--=> preempt-depth > > > > > > > > > > > # ||| / _-=> migrate-disable > > > > > > > > > > > # |||| / delay > > > > > > > > > > > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION > > > > > > > > > > > # | | | ||||| | | > > > > > > > > > > > irq/68-dwc3-523 [000] D..3. 133.990254: reset_config > > > > > > > > > > > <-__composite_disconnect > > > > > > > > > > > irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable > > > > > > > > > > > <-reset_config > > > > > > > > > > > irq/68-dwc3-523 [000] D..3. 133.992276: > > > > > > > > > > > gether_disconnect > > > > > > > > > > > <-reset_config > > > > > > > > > > > kworker/1:3-443 [001] ...1. 134.022453: eem_unbind > > > > > > > > > > > <-purge_configs_funcs > > > > > > > > > > > > > > > > > > > > > > -> to device mode > > > > > > > > > > > > > > > > > > > > > > kworker/1:3-443 [001] ...1. 148.630773: eem_bind > > > > > > > > > > > <-usb_add_function > > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt > > > > > > > > > > > <-composite_setup > > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.155215: > > > > > > > > > > > gether_disconnect > > > > > > > > > > > <-eem_set_alt > > > > > > > > > > > irq/68-dwc3-734 [000] > > > > > > > > > > > D..3. 149.155220: gether_connect > > > > > > > > > > > <-eem_set_alt > > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt > > > > > > > > > > > <-composite_setup > > > > > > > > > > > irq/68-dwc3-734 [000] D..3. 149.157292: > > > > > > > > > > > gether_disconnect > > > > > > > > > > > <-eem_set_alt > > > > > > > > > > > irq/68-dwc3-734 [000] > > > > > > > > > > > D..3. 149.159338: gether_connect > > > > > > > > > > > <-eem_set_alt > > > > > > > > > > > irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap > > > > > > > > > > > <-rx_complete > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > > > > > I don't know where to look exactly. Any hints? > > > > > > > > > > > > > > > > > > > > do you see anything related to > > > > > > > > > > gether_cleanup() after eem_unbind() ? > > > > > > > > > > > > > > > > > > Nope. It's a pitty that the trace formatting got messed up > > > > > > > > > above. But as > > > > > > > > > you can see I traced gether_* and eem_*. After eem_unbind no traced > > > > > > > > > function is called, until I flip the switch to device mode. > > > > > > > > > The ... at the end is where I cut uninteresting eem_unwrap > > > > > > > > > <-rx_complete > > > > > > > > > and eem_wrap <-eth_start_xmit lines. > > > > > > > > > > > > > > > > > > > If not then, you may try to enable tracing of TCP/IP stack and > > > > > > > > > > network side to check who deleting the sysfs entry > > > > > > > > > > > > > > > > > > Yes, that's a vast amount of functions to trace. And I don't see how > > > > > > > > > that would be related to the patch we're > > > > > > > > > discussing here. I was hoping > > > > > > > > > for a little more targeted hint. > > > > > > > > > > > > > > > > Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', > > > > > > > > 'usb_fun*', > > > > > > > > 'reset_config' and 'device_remove_file' leads me to: > > > > > > > > > > > > > > > > TIMESTAMP FUNCTION > > > > > > > > | | > > > > > > > > 49.952477: eem_wrap <-eth_start_xmit > > > > > > > > 55.072455: eem_wrap <-eth_start_xmit > > > > > > > > 55.072621: eem_unwrap <-rx_complete > > > > > > > > 59.011540: configfs_composite_reset <-usb_gadget_udc_reset > > > > > > > > 59.011545: composite_reset <-configfs_composite_reset > > > > > > > > 59.011548: reset_config <-__composite_disconnect > > > > > > > > 59.013565: eem_disable <-reset_config > > > > > > > > 59.013567: gether_disconnect <-reset_config > > > > > > > > 59.049560: device_remove_file <-device_remove > > > > > > > > 59.051185: configfs_composite_disconnect <-usb_gadget_disco > > > > > > > > 59.051189: composite_disconnect <-configfs_composite_discon > > > > > > > > 59.051195: configfs_composite_unbind <-gadget_unbind_driver > > > > > > > > 59.052519: eem_unbind <-purge_configs_funcs > > > > > > > > 59.052529: composite_dev_cleanup <-configfs_composite_unbin > > > > > > > > 59.052537: device_remove_file <-composite_dev_cleanup > > > > > > > > > > > > > > > > device_remove_file gets called twice, once by device_remove after > > > > > > > > gether_disconnect (that the one). The 2nd time by > > > > > > > > composite_dev_cleanup > > > > > > > > (removing the gadget) > > > > > > > > > > > > > > I believe that the device_remove_file function is only removing > > > > > > > suspend-specific attributes, not the complete gadget. > > > > > > > Typically, when you perform the role switch, the Gadget start/stop > > > > > > > function in your UDC driver is called. These functions should not > > > > > > > delete the gadget > > > > > > > > > > > > > > To investigate further, could you please enable the DWC3 functions > > > > > > > in ftrace and check who is removing the gadget? > > > > > > > I can also enable this on my system and compare the logs with yours, > > > > > > > but I will be in PI planning for 1.5 weeks and may not be able to > > > > > > > provide immediate support. > > > > > > > > > > > > Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so > > > > > > trace is 600kb. I cut uninteresting stuff before > > > > > > configfs_composite_reset <-usb_gadget_udc_reset and after > > > > > > __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> > > > > > > compare. > > > > > > > > > > Could you please try with the following patch and see if your issue > > > > resolves ? > > > > > > > > diff --git a/drivers/usb/gadget/function/f_eem.c > > > > b/drivers/usb/gadget/function/f_eem.c > > > > index 3b445bd88498..c2a904475083 100644 > > > > --- a/drivers/usb/gadget/function/f_eem.c > > > > +++ b/drivers/usb/gadget/function/f_eem.c > > > > @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, > > > > struct usb_function *f) > > > > struct usb_composite_dev *cdev = c->cdev; > > > > struct f_eem *eem = func_to_eem(f); > > > > struct usb_string *us; > > > > - int status; > > > > + int status = 0; > > > > struct usb_ep *ep; > > > > struct f_eem_opts *eem_opts; > > > > @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration > > > > *c, struct usb_function *f) > > > > * with list_for_each_entry, so we assume no race condition > > > > * with regard to eem_opts->bound access > > > > */ > > > > + mutex_lock(&eem_opts->lock); > > > > + gether_set_gadget(eem_opts->net, cdev->gadget); > > > > + > > > > if (!eem_opts->bound) { > > > > - mutex_lock(&eem_opts->lock); > > > > - gether_set_gadget(eem_opts->net, cdev->gadget); > > > > status = gether_register_netdev(eem_opts->net); > > > > - mutex_unlock(&eem_opts->lock); > > > > - if (status) > > > > - return status; > > > > - eem_opts->bound = true; > > > > } > > > > + mutex_unlock(&eem_opts->lock); > > > > + > > > > + if (status) > > > > + goto fail; > > > > + > > > > + eem_opts->bound = true; > > > > + > > > > us = usb_gstrings_attach(cdev, eem_strings, > > > > ARRAY_SIZE(eem_string_defs)); > > > > > > > > > > After switching from host -> device -> host the usb0 device as seen by > > > ifconfig remains "RUNNING" and in the routing table. > > > > > > FYI This is the regression. > > > > For us, I don't see any advantages in having this patch and only the > > downside. Do you have any further ideas / patches that we could test. > > > > Otherwise, it might be best to go with the original suggestion and revert > > until it gets sorted out? > > Hello, > > Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ? Ferry or I can do it, but I will be able to do that next week. > However, We are very much intrested to fix your problem and upstream the v2. > Therefore could you please share the following details to reproduce the issue > - How did you created the gadget ? Could you please share the exact commands to create the eem gadget ? Ferry will answer to this one. > - Arch of your test bench (arm, x86) x86, the SoC is Intel Tangier on the Intel Merrifield (platform). > - How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ? Yes, it's PCI connected device (which you may have already deducted from the thread a lot of times). It's part of the SoC (also can be easily deducted). The IDs are available in the driver: https://elixir.bootlin.com/linux/latest/source/drivers/usb/dwc3/dwc3-pci.c#L24 > > > Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is > > > still missing. > > > > > > After deeper diving into v6.1.0, I found the latter a longer existing > > > bug, not caused by your patch. > > > > > > More over, this doesn't seem to do any harm. > > > > > > The first issue does, as we try to route traffic over a device that no > > > longer exists. > > > > > > > > > > Additionally, please check if you have any customized DWC patches > > > > > > > that may be causing this problem. > > > > > > > > > > > > > > > > You may recall the whole issue did not occur before this > > > > > > > > > patch got applied. > > > > > > > > > > > > > According to current kernel architecture of u_ether driver, only > > > > > > > > > > > > > gether_cleanup should remove the usb0 interface along with its > > > > > > > > > > > > > kobject and sysfs interface. > > > > > > > > > > > > > I suggest sharing the analysis > > > > > > > > > > > > > here to understand why this > > > > > > > > > > > > > practice > > > > > > > > > > > > > is not followed in your use case or driver ? > > > > > > > > > > > > > > > > > > > > > > > > Yes, I'll try to trace where that happens. > > > > > > > > > > > > > > > > > > > > > > > > Nevertheless, the disappearance of the net/usb0 directory seems > > > > > > > > > > > > harmless? But the usb: net device > > > > > > > > > > > > remaining after disconnect or role > > > > > > > > > > > > switch is not good, as the route remains. > > > > > > > > > > > > > > > > > > > > > > > > May be they are 2 separate problems. > > > > > > > > > > > > Could you try to reproduce what > > > > > > > > > > > > happens if you make eem connection and then unplug? > > > > > > > > > > > > > > > > > > > > > > > > > I am curious why the driver was > > > > > > > > > > > > > developed without adhering to > > > > > > > > > > > > > the > > > > > > > > > > > > > kernel's gadget architecture. > > > > > > > > > > > > > > > > > > > > > > I don't know what you mean here. Which driver do you mean? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have the dwc3 IP base usb controller, Let me check > > > > > > > > > > > > > > > > > with this patch and > > > > > > > > > > > > > > > > > share result here. May be we need some fix in dwc3 > > > > > > > > > > > > > > > Would have been nice if someone could test on other > > > > > > > > > > > > > > > controller as well. But > > > > > > > > > > > > > > > another instance of dwc3 is also very welcome. > > > > > > > > > > > > > > > > It's quite possible, please test on your side. > > > > > > > > > > > > > > > > We are happy to test any fixes if you come up with.
Hi, Op 28-05-2024 om 09:18 schreef Hardik Gajjar: > On Sun, May 26, 2024 at 10:52:53PM +0200, Ferry Toth wrote: >> Hi >> >> Op 15-05-2024 om 20:38 schreef Ferry Toth: >>> Hi, >>> >>> Sorry to have kept you waiting, I was away for a short break. >>> >>> Patch tested below. >>> >>> Op 10-05-2024 om 11:45 schreef Hardik Gajjar: >>>> On Thu, May 02, 2024 at 10:32:16PM +0200, Ferry Toth wrote: >>>>> Oops, sorry, wrong file attached . Now correct one. >>>>> >>>>> Op 02-05-2024 om 22:13 schreef Ferry Toth: >>>>>> Op 02-05-2024 om 17:29 schreef Hardik Gajjar: >>>>>>> On Tue, Apr 30, 2024 at 11:12:17PM +0200, Ferry Toth wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Op 30-04-2024 om 21:40 schreef Ferry Toth: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Op 30-04-2024 om 17:32 schreef Hardik Gajjar: >>>>>>>>>> On Sun, Apr 28, 2024 at 11:07:36PM +0200, Ferry Toth wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> Op 25-04-2024 om 23:27 schreef Ferry Toth: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> Op 17-04-2024 om 17:13 schreef Hardik Gajjar: >>>>>>>>>>>>> On Tue, Apr 16, 2024 at 04:48:32PM +0300, Andy Shevchenko wrote: >>>>>>>>>>>>>> On Thu, Apr 11, 2024 at 10:52:36PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>>> Op 11-04-2024 om 18:39 schreef Andy Shevchenko: >>>>>>>>>>>>>>>> On Thu, Apr 11, 2024 >>>>>>>>>>>>>>>> at 04:26:37PM +0200, >>>>>>>>>>>>>>>> Hardik Gajjar wrote: >>>>>>>>>>>>>>>>> On Wed, Apr 10, 2024 at >>>>>>>>>>>>>>>>> 08:37:42PM +0300, Andy >>>>>>>>>>>>>>>>> Shevchenko wrote: >>>>>>>>>>>>>>>>>> On Sun, Apr 07, 2024 at 10:51:51PM +0200, Ferry Toth wrote: >>>>>>>>>>>>>>>>>>> Op 05-04-2024 om 13:38 schreef Hardik Gajjar: >>>>>>>>>>>>>> >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Exactly. And this didn't happen before the 2 patches. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> To be precise: /sys/class/net/usb0 is not >>>>>>>>>>>>>>>>>>> removed and it is a link, the link >>>>>>>>>>>>>>>>>>> target /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net/usb0 >>>>>>>>>>>>>>>>>>> no >>>>>>>>>>>>>>>>>>> longer exists >>>>>>>>>>>>>>>>> So, it means that the /sys/class/net/usb0 is >>>>>>>>>>>>>>>>> present, but the symlink is >>>>>>>>>>>>>>>>> broken. In that case, the dwc3 driver should >>>>>>>>>>>>>>>>> recreate the device, and the >>>>>>>>>>>>>>>>> symlink should become active again >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes, on first enabling gadget (when device mode is activated): >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>>>> driver net power sound subsystem suspended uevent >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Then switching to host mode: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>>>> ls: cannot access >>>>>>>>>>>>>>> '/sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/': >>>>>>>>>>>>>>> No such file >>>>>>>>>>>>>>> or directory >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Then back to device mode: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@yuna:~# ls >>>>>>>>>>>>>>> /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/ >>>>>>>>>>>>>>> driver power sound subsystem suspended uevent >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> net is missing. But, network functions: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> root@yuna:~# ping 10.42.0.1 >>>>>>>>>>>>>>> PING 10.42.0.1 (10.42.0.1): 56 data bytes >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Mass storage device is >>>>>>>>>>>>>>> created and removed each >>>>>>>>>>>>>>> time as expected. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So, what's the conclusion? >>>>>>>>>>>>>> Shall we move towards revert >>>>>>>>>>>>>> of those >>>>>>>>>>>>>> two changes? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> As promised, I have the tested >>>>>>>>>>>>> the this patch with the dwc3 >>>>>>>>>>>>> gadget. >>>>>>>>>>>>> I could not reproduce >>>>>>>>>>>>> the issue. >>>>>>>>>>>>> >>>>>>>>>>>>> I can see the usb0 exist all the >>>>>>>>>>>>> time and accessible regardless >>>>>>>>>>>>> of >>>>>>>>>>>>> the role switching of the USB mode (peripheral <-> host) >>>>>>>>>>>>> >>>>>>>>>>>>> Following are the logs: >>>>>>>>>>>>> //Host to device >>>>>>>>>>>>> >>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb >>>>>>>>>>>>> # echo "peripheral" >>>>>>>>>>>>>> mode >>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>>>> usb0 >>>>>>>>>>>>> >>>>>>>>>>>>> //device to host >>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb >>>>>>>>>>>>> # echo "host" > mode >>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # ls >>>>>>>>>>>>> a800000.dwc3/gadget/net/ >>>>>>>>>>>>> usb0 >>>>>>>>>>>> >>>>>>>>>>>> That is weird. When I switch to host mode (using >>>>>>>>>>>> the physical switch), >>>>>>>>>>>> the whole gadget directory is removed (now testing 6.9.0-rc5) >>>>>>>>>>>> >>>>>>>>>>>> Switching back to device mode, that gadget >>>>>>>>>>>> directory is recreated. And >>>>>>>>>>>> gadget/sound as well, but not gadget/net. >>>>>>>>>>>> >>>>>>>>>>>>> s a800000.dwc3/gadget/net/usb0 >>>>>>>>>>>>> < >>>>>>>>>>>>> addr_assign_type duplex phys_port_name >>>>>>>>>>>>> addr_len flags phys_switch_id >>>>>>>>>>>>> address gro_flush_timeout power >>>>>>>>>>>>> broadcast ifalias proto_down >>>>>>>>>>>>> carrier ifindex queues >>>>>>>>>>>>> carrier_changes iflink speed >>>>>>>>>>>>> carrier_down_count link_mode statistics >>>>>>>>>>>>> carrier_up_count mtu subsystem >>>>>>>>>>>>> dev_id name_assign_type tx_queue_len >>>>>>>>>>>>> dev_port netdev_group type >>>>>>>>>>>>> device operstate uevent >>>>>>>>>>>>> dormant phys_port_id waiting_for_supplier >>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb >>>>>>>>>>>>> # ifconfig -a usb0 >>>>>>>>>>>>> usb0 Link encap:Ethernet HWaddr 3a:8b:63:97:1a:9a >>>>>>>>>>>>> BROADCAST MULTICAST MTU:1500 Metric:1 >>>>>>>>>>>>> RX packets:0 errors:0 dropped:0 overruns:0 frame:0 >>>>>>>>>>>>> TX packets:0 >>>>>>>>>>>>> errors:0 dropped:0 overruns:0 >>>>>>>>>>>>> carrier:0 >>>>>>>>>>>>> collisions:0 txqueuelen:1000 >>>>>>>>>>>>> RX bytes:0 TX bytes:0 >>>>>>>>>>>>> >>>>>>>>>>>>> console:/sys/bus/platform/devices/a800000.ssusb # >>>>>>>>>>>>> >>>>>>>>>>>>> I strongly advise against >>>>>>>>>>>>> reverting the patch solely based >>>>>>>>>>>>> on the >>>>>>>>>>>>> observed issue of removing the >>>>>>>>>>>>> /sys/class/net/usb0 directory >>>>>>>>>>>>> while >>>>>>>>>>>>> the usb0 interface remains available. >>>>>>>>>>>> >>>>>>>>>>>> There's more to it. I also mentioned that switching the role or >>>>>>>>>>>> unplugging the cable leaves the usb0 connection. >>>>>>>>>>>> >>>>>>>>>>>> I have while in host mode: >>>>>>>>>>>> root@yuna:~# ifconfig -a usb0 >>>>>>>>>>>> usb0: >>>>>>>>>>>> flags=-28605<UP,BROADCAST,RUNNING,MULTICAST,DYNAMIC> >>>>>>>>>>>> mtu 1500 >>>>>>>>>>>> inet 10.42.0.221 netmask 255.255.255.0 broadcast >>>>>>>>>>>> 10.42.0.255 >>>>>>>>>>>> inet6 fe80::a8bb:ccff:fedd:eef1 prefixlen 64 >>>>>>>>>>>> scopeid 0x20<link> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> You don't see that because you didn't create a connection at all. >>>>>>>>>>>> >>>>>>>>>>>>> Instead, I recommend enabling FTRACE to >>>>>>>>>>>>> trace the functions involved >>>>>>>>>>>>> and identify which faulty call is responsible for removing usb0. >>>>>>>>>>>> >>>>>>>>>>>> Switching from device -> host -> device: >>>>>>>>>>>> >>>>>>>>>>>> root@yuna:~# trace-cmd record -p function_graph -l *gether_* >>>>>>>>>>>> plugin 'function_graph' >>>>>>>>>>>> Hit Ctrl^C to stop recording >>>>>>>>>>>> ^CCPU0 data recorded at offset=0x1c8000 >>>>>>>>>>>> 188 bytes in size (4096 uncompressed) >>>>>>>>>>>> CPU1 data recorded at offset=0x1c9000 >>>>>>>>>>>> 0 bytes in size (0 uncompressed) >>>>>>>>>>>> root@yuna:~# trace-cmd report >>>>>>>>>>>> cpus=2 >>>>>>>>>>>> irq/68-dwc3-725 [000] 514.575337: >>>>>>>>>>>> funcgraph_entry: # >>>>>>>>>>>> 2079.480 us | gether_disconnect(); >>>>>>>>>>>> irq/68-dwc3-946 [000] 524.263731: >>>>>>>>>>>> funcgraph_entry: + >>>>>>>>>>>> 11.640 us | gether_disconnect(); >>>>>>>>>>>> irq/68-dwc3-946 [000] 524.263743: >>>>>>>>>>>> funcgraph_entry: ! >>>>>>>>>>>> 116.520 us | gether_connect(); >>>>>>>>>>>> irq/68-dwc3-946 [000] 524.268029: >>>>>>>>>>>> funcgraph_entry: # >>>>>>>>>>>> 2057.260 us | gether_disconnect(); >>>>>>>>>>>> irq/68-dwc3-946 [000] 524.270089: >>>>>>>>>>>> funcgraph_entry: ! >>>>>>>>>>>> 109.000 us | gether_connect(); >>>>>>>>>>> >>>>>>>>>>> I tried to get a more useful trace: >>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'gether_*' > set_ftrace_filter >>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'eem_*' >> set_ftrace_filter >>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo function > current_tracer >>>>>>>>>>> root@yuna:/sys/kernel/tracing# echo 'reset_config' >>>>>>>>>>>>> set_ftrace_filter >>>>>>>>>>> -> switch to host mode then back to device >>>>>>>>>>> root@yuna:/sys/kernel/tracing# cat trace >>>>>>>>>>> # tracer: function >>>>>>>>>>> # >>>>>>>>>>> # entries-in-buffer/entries-written: 53/53 #P:2 >>>>>>>>>>> # >>>>>>>>>>> # _-----=> irqs-off/BH-disabled >>>>>>>>>>> # / _----=> need-resched >>>>>>>>>>> # | / _---=> hardirq/softirq >>>>>>>>>>> # || / _--=> preempt-depth >>>>>>>>>>> # ||| / _-=> migrate-disable >>>>>>>>>>> # |||| / delay >>>>>>>>>>> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION >>>>>>>>>>> # | | | ||||| | | >>>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.990254: reset_config >>>>>>>>>>> <-__composite_disconnect >>>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992274: eem_disable >>>>>>>>>>> <-reset_config >>>>>>>>>>> irq/68-dwc3-523 [000] D..3. 133.992276: >>>>>>>>>>> gether_disconnect >>>>>>>>>>> <-reset_config >>>>>>>>>>> kworker/1:3-443 [001] ...1. 134.022453: eem_unbind >>>>>>>>>>> <-purge_configs_funcs >>>>>>>>>>> >>>>>>>>>>> -> to device mode >>>>>>>>>>> >>>>>>>>>>> kworker/1:3-443 [001] ...1. 148.630773: eem_bind >>>>>>>>>>> <-usb_add_function >>>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155209: eem_set_alt >>>>>>>>>>> <-composite_setup >>>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.155215: >>>>>>>>>>> gether_disconnect >>>>>>>>>>> <-eem_set_alt >>>>>>>>>>> irq/68-dwc3-734 [000] >>>>>>>>>>> D..3. 149.155220: gether_connect >>>>>>>>>>> <-eem_set_alt >>>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157287: eem_set_alt >>>>>>>>>>> <-composite_setup >>>>>>>>>>> irq/68-dwc3-734 [000] D..3. 149.157292: >>>>>>>>>>> gether_disconnect >>>>>>>>>>> <-eem_set_alt >>>>>>>>>>> irq/68-dwc3-734 [000] >>>>>>>>>>> D..3. 149.159338: gether_connect >>>>>>>>>>> <-eem_set_alt >>>>>>>>>>> irq/68-dwc3-734 [000] D..2. 149.239625: eem_unwrap >>>>>>>>>>> <-rx_complete >>>>>>>>>>> ... >>>>>>>>>>> >>>>>>>>>>> I don't know where to look exactly. Any hints? >>>>>>>>>> >>>>>>>>>> do you see anything related to >>>>>>>>>> gether_cleanup() after eem_unbind() ? >>>>>>>>> >>>>>>>>> Nope. It's a pitty that the trace formatting got messed up >>>>>>>>> above. But as >>>>>>>>> you can see I traced gether_* and eem_*. After eem_unbind no traced >>>>>>>>> function is called, until I flip the switch to device mode. >>>>>>>>> The ... at the end is where I cut uninteresting eem_unwrap >>>>>>>>> <-rx_complete >>>>>>>>> and eem_wrap <-eth_start_xmit lines. >>>>>>>>> >>>>>>>>>> If not then, you may try to enable tracing of TCP/IP stack and >>>>>>>>>> network side to check who deleting the sysfs entry >>>>>>>>> >>>>>>>>> Yes, that's a vast amount of functions to trace. And I don't see how >>>>>>>>> that would be related to the patch we're >>>>>>>>> discussing here. I was hoping >>>>>>>>> for a little more targeted hint. >>>>>>>> >>>>>>>> Now filtering 'gether_*', 'eem_*', '*configfs_*', 'composite_*', >>>>>>>> 'usb_fun*', >>>>>>>> 'reset_config' and 'device_remove_file' leads me to: >>>>>>>> >>>>>>>> TIMESTAMP FUNCTION >>>>>>>> | | >>>>>>>> 49.952477: eem_wrap <-eth_start_xmit >>>>>>>> 55.072455: eem_wrap <-eth_start_xmit >>>>>>>> 55.072621: eem_unwrap <-rx_complete >>>>>>>> 59.011540: configfs_composite_reset <-usb_gadget_udc_reset >>>>>>>> 59.011545: composite_reset <-configfs_composite_reset >>>>>>>> 59.011548: reset_config <-__composite_disconnect >>>>>>>> 59.013565: eem_disable <-reset_config >>>>>>>> 59.013567: gether_disconnect <-reset_config >>>>>>>> 59.049560: device_remove_file <-device_remove >>>>>>>> 59.051185: configfs_composite_disconnect <-usb_gadget_disco >>>>>>>> 59.051189: composite_disconnect <-configfs_composite_discon >>>>>>>> 59.051195: configfs_composite_unbind <-gadget_unbind_driver >>>>>>>> 59.052519: eem_unbind <-purge_configs_funcs >>>>>>>> 59.052529: composite_dev_cleanup <-configfs_composite_unbin >>>>>>>> 59.052537: device_remove_file <-composite_dev_cleanup >>>>>>>> >>>>>>>> device_remove_file gets called twice, once by device_remove after >>>>>>>> gether_disconnect (that the one). The 2nd time by >>>>>>>> composite_dev_cleanup >>>>>>>> (removing the gadget) >>>>>>> >>>>>>> I believe that the device_remove_file function is only removing >>>>>>> suspend-specific attributes, not the complete gadget. >>>>>>> Typically, when you perform the role switch, the Gadget start/stop >>>>>>> function in your UDC driver is called. These functions should not >>>>>>> delete the gadget >>>>>>> >>>>>>> To investigate further, could you please enable the DWC3 functions >>>>>>> in ftrace and check who is removing the gadget? >>>>>>> I can also enable this on my system and compare the logs with yours, >>>>>>> but I will be in PI planning for 1.5 weeks and may not be able to >>>>>>> provide immediate support. >>>>>> >>>>>> Yes, but of course adding dwc3_* (and usb_*) also traces host mode, so >>>>>> trace is 600kb. I cut uninteresting stuff before >>>>>> configfs_composite_reset <-usb_gadget_udc_reset and after >>>>>> __dwc3_set_mode, <https://urldefense.proofpoint.com/v2/url?u=http-3A__300linesremain.Seeattachedtar.gzforyouto&d=DwIDaQ&c=euGZstcaTDllvimEN8b7jXrwqOf-v5A_CdpgnVfiiMM&r=SAhjP5GOmrADp1v_EE5jWoSuMlYCIt9gKduw-DCBPLs&m=zdiBhk-2V5AXxu707QAhbCgWR4qNVRARBmxN17nVB69gVOm-QPqrJeKpo4_trszw&s=ixagWKgLs6wQDJfwh4vIDQNDiy8GZnK9KnUELIfiJz0&e=> >>>>>> compare. >>>>>> >>>> Could you please try with the following patch and see if your issue >>>> resolves ? >>>> >>>> diff --git a/drivers/usb/gadget/function/f_eem.c >>>> b/drivers/usb/gadget/function/f_eem.c >>>> index 3b445bd88498..c2a904475083 100644 >>>> --- a/drivers/usb/gadget/function/f_eem.c >>>> +++ b/drivers/usb/gadget/function/f_eem.c >>>> @@ -247,7 +247,7 @@ static int eem_bind(struct usb_configuration *c, >>>> struct usb_function *f) >>>> struct usb_composite_dev *cdev = c->cdev; >>>> struct f_eem *eem = func_to_eem(f); >>>> struct usb_string *us; >>>> - int status; >>>> + int status = 0; >>>> struct usb_ep *ep; >>>> struct f_eem_opts *eem_opts; >>>> @@ -260,16 +260,20 @@ static int eem_bind(struct usb_configuration >>>> *c, struct usb_function *f) >>>> * with list_for_each_entry, so we assume no race condition >>>> * with regard to eem_opts->bound access >>>> */ >>>> + mutex_lock(&eem_opts->lock); >>>> + gether_set_gadget(eem_opts->net, cdev->gadget); >>>> + >>>> if (!eem_opts->bound) { >>>> - mutex_lock(&eem_opts->lock); >>>> - gether_set_gadget(eem_opts->net, cdev->gadget); >>>> status = gether_register_netdev(eem_opts->net); >>>> - mutex_unlock(&eem_opts->lock); >>>> - if (status) >>>> - return status; >>>> - eem_opts->bound = true; >>>> } >>>> + mutex_unlock(&eem_opts->lock); >>>> + >>>> + if (status) >>>> + goto fail; >>>> + >>>> + eem_opts->bound = true; >>>> + >>>> us = usb_gstrings_attach(cdev, eem_strings, >>>> ARRAY_SIZE(eem_string_defs)); >>>> >>> >>> After switching from host -> device -> host the usb0 device as seen by >>> ifconfig remains "RUNNING" and in the routing table. >>> >>> FYI This is the regression. >> >> For us, I don't see any advantages in having this patch and only the >> downside. Do you have any further ideas / patches that we could test. >> >> Otherwise, it might be best to go with the original suggestion and revert >> until it gets sorted out? > > Hello, > > Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ? > > However, We are very much intrested to fix your problem and upstream the v2. > Therefore could you please share the following details to reproduce the issue Instruction to establish the eem connection on the device (connman) and the host (networkmanager) here: https://edison-fw.github.io/meta-intel-edison/4.2-networking.html#gadget > - How did you created the gadget ? Could you please share the exact commands to create the eem gadget ? Sure, you can find the u-dev rules and the script here on GH: https://github.com/edison-fw/meta-intel-edison/tree/kirkstone/meta-intel-edison-bsp/recipes-support/gadget/files > - Arch of your test bench (arm, x86) This is x86_64 although we can just as easily build i686 if needed. > - How the DWC3 controller connected to your system ? via PCIe slot ? Could you please share the part number of the PCIe card ? Like Andy mentioned it is in the SOC. It's described in ACPI (in U-Boot together with other SOC level devices) here: https://github.com/u-boot/u-boot/blob/ea722aa5eb33740ae77e8816aeb72b385e621cd0/arch/x86/include/asm/arch-tangier/acpi/southcluster.asl#L333 > Hardik > >> >>> Also /sys/devices/pci0000:00/0000:00:11.0/dwc3.0.auto/gadget.0/net is >>> still missing. >>> >>> After deeper diving into v6.1.0, I found the latter a longer existing >>> bug, not caused by your patch. >>> >>> More over, this doesn't seem to do any harm. >>> >>> The first issue does, as we try to route traffic over a device that no >>> longer exists. >>> >>>>>>> Additionally, please check if you have any customized DWC patches >>>>>>> that may be causing this problem. >>>>>>> >>>>>>>> >>>>>>>>> You may recall the whole issue did not occur before this >>>>>>>>> patch got applied. >>>>>>>>> >>>>>>>>>> Hardik >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> According to current kernel architecture of u_ether driver, only >>>>>>>>>>>>> gether_cleanup should remove the usb0 interface along with its >>>>>>>>>>>>> kobject and sysfs interface. >>>>>>>>>>>>> I suggest sharing the analysis >>>>>>>>>>>>> here to understand why this >>>>>>>>>>>>> practice >>>>>>>>>>>>> is not followed in your use case or driver ? >>>>>>>>>>>> >>>>>>>>>>>> Yes, I'll try to trace where that happens. >>>>>>>>>>>> >>>>>>>>>>>> Nevertheless, the disappearance of the net/usb0 directory seems >>>>>>>>>>>> harmless? But the usb: net device >>>>>>>>>>>> remaining after disconnect or role >>>>>>>>>>>> switch is not good, as the route remains. >>>>>>>>>>>> >>>>>>>>>>>> May be they are 2 separate problems. >>>>>>>>>>>> Could you try to reproduce what >>>>>>>>>>>> happens if you make eem connection and then unplug? >>>>>>>>>>>> >>>>>>>>>>>>> I am curious why the driver was >>>>>>>>>>>>> developed without adhering to >>>>>>>>>>>>> the >>>>>>>>>>>>> kernel's gadget architecture. >>>>>>>>>>> >>>>>>>>>>> I don't know what you mean here. Which driver do you mean? >>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have the dwc3 IP base usb controller, Let me check >>>>>>>>>>>>>>>>> with this patch and >>>>>>>>>>>>>>>>> share result here. May be we need some fix in dwc3 >>>>>>>>>>>>>>> Would have been nice if someone could test on other >>>>>>>>>>>>>>> controller as well. But >>>>>>>>>>>>>>> another instance of dwc3 is also very welcome. >>>>>>>>>>>>>>>> It's quite possible, please test on your side. >>>>>>>>>>>>>>>> We are happy to test any fixes if you come up with. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> With Best Regards, >>>>>>>>>>>>>> Andy Shevchenko >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>> >>>> >>>
On 28.05.24 09:18, Hardik Gajjar wrote: > > Okay, lets revert the patch till we fix your problem. Would you be able to send the revert ? or shall I ? From what I can see it seems nobody submitted the revert since then. Unless I'm mistaken could you thus please take care of that? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page.
diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c index 4bb0553da658..9d1c40c152d8 100644 --- a/drivers/usb/gadget/function/u_ether.c +++ b/drivers/usb/gadget/function/u_ether.c @@ -1200,7 +1200,7 @@ void gether_disconnect(struct gether *link) DBG(dev, "%s\n", __func__); - netif_stop_queue(dev->net); + netif_device_detach(dev->net); netif_carrier_off(dev->net); /* disable endpoints, forcing (synchronous) completion
This patch replaces the usage of netif_stop_queue with netif_device_detach in the u_ether driver. The netif_device_detach function not only stops all tx queues by calling netif_tx_stop_all_queues but also marks the device as removed by clearing the __LINK_STATE_PRESENT bit. This change helps notify user space about the disconnection of the device more effectively, compared to netif_stop_queue, which only stops a single transmit queue. Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com> --- Changes since version 1: - Correct Singed-off user name and e-mail Changes since version 2: - Move change history below signed-off-by Changes since version 3: - move netif_device_detach from eth_stop to gether_disconnect. --- drivers/usb/gadget/function/u_ether.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)