Message ID | 20240415113436.3261042-1-vschneid@redhat.com (mailing list archive)
---|---
Series | tcp/dccp: Un-pin tw_timer
On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> Hi,
>
> This is v5 of the series where the tw_timer is un-pinned to get rid of
> interference in isolated-CPU setups.
>
> The first patch is a new one stemming from Jakub's bug report. It's there
> mainly to make reviewing a bit easier, but as it changes behaviour it
> should be squashed with the second one.
>
> Revisions
> =========
>
> v4 -> v5
> ++++++++
>
> o Rebased against latest Linus' tree
> o Converted tw_timer into a delayed work following Jakub's bug report on v4
>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org

What was the issue again?

Please explain precisely why it was fundamentally tied to the use of
timers (and why it was not possible to fix the issue without adding
work queues and more dependencies to the TCP stack).
On 15/04/24 14:35, Eric Dumazet wrote:
> On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> Hi,
>>
>> This is v5 of the series where the tw_timer is un-pinned to get rid of
>> interference in isolated-CPU setups.
>>
>> The first patch is a new one stemming from Jakub's bug report. It's there
>> mainly to make reviewing a bit easier, but as it changes behaviour it
>> should be squashed with the second one.
>>
>> Revisions
>> =========
>>
>> v4 -> v5
>> ++++++++
>>
>> o Rebased against latest Linus' tree
>> o Converted tw_timer into a delayed work following Jakub's bug report on v4
>>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org
>
> What was the issue again?
>
> Please explain precisely why it was fundamentally tied to the use of
> timers (and why it was not possible to fix the issue without adding
> work queues and more dependencies to the TCP stack).

In v4 I added the use of the ehash lock to serialize arming the timewait
timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()).

Unfortunately, holding a lock both in a timer callback and in the context
in which the timer is destroyed is invalid. AIUI the issue is as follows:

  CPUx                               CPUy
  spin_lock(foo);
                                     <timer fires>
                                     call_timer_fn()
                                       spin_lock(foo) // blocks
  timer_shutdown_sync()
    __timer_delete_sync()
      __try_to_del_timer_sync() // loops for as long as the timer is running
  <deadlock>

In our case, we had in v4:

  inet_twsk_deschedule_put()
    spin_lock(ehash_lock);
                                     tw_timer_handler()
                                       inet_twsk_kill()
                                         spin_lock(ehash_lock); // blocks
    __inet_twsk_kill();
    timer_shutdown_sync(&tw->tw_timer);

The fix here is to move the timer deletion to a non-timer context.
Workqueues fit the bill, and as tw_timer_handler() would just queue a
work item, I converted it to a delayed_work.
On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 15/04/24 14:35, Eric Dumazet wrote:
> > On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> This is v5 of the series where the tw_timer is un-pinned to get rid of
> >> interference in isolated-CPU setups.
> >>
> >> The first patch is a new one stemming from Jakub's bug report. It's there
> >> mainly to make reviewing a bit easier, but as it changes behaviour it
> >> should be squashed with the second one.
> >>
> >> Revisions
> >> =========
> >>
> >> v4 -> v5
> >> ++++++++
> >>
> >> o Rebased against latest Linus' tree
> >> o Converted tw_timer into a delayed work following Jakub's bug report on v4
> >>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org
> >
> > What was the issue again?
> >
> > Please explain precisely why it was fundamentally tied to the use of
> > timers (and why it was not possible to fix the issue without adding
> > work queues and more dependencies to the TCP stack).
>
> In v4 I added the use of the ehash lock to serialize arming the timewait
> timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()).
>
> Unfortunately, holding a lock both in a timer callback and in the context
> in which the timer is destroyed is invalid. AIUI the issue is as follows:
>
>   CPUx                               CPUy
>   spin_lock(foo);
>                                      <timer fires>
>                                      call_timer_fn()
>                                        spin_lock(foo) // blocks
>   timer_shutdown_sync()
>     __timer_delete_sync()
>       __try_to_del_timer_sync() // loops for as long as the timer is running
>   <deadlock>
>
> In our case, we had in v4:
>
>   inet_twsk_deschedule_put()
>     spin_lock(ehash_lock);
>                                      tw_timer_handler()
>                                        inet_twsk_kill()
>                                          spin_lock(ehash_lock); // blocks
>     __inet_twsk_kill();
>     timer_shutdown_sync(&tw->tw_timer);
>
> The fix here is to move the timer deletion to a non-timer context.
> Workqueues fit the bill, and as tw_timer_handler() would just queue a
> work item, I converted it to a delayed_work.

I do not like this delayed work approach.

Adding more dependencies to the TCP stack is not very nice from a
maintainer's point of view.

Why couldn't you call timer_shutdown_sync() before grabbing the lock?
Apologies for the delayed reply, I was away for most of last week.

On 16/04/24 17:01, Eric Dumazet wrote:
> On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> On 15/04/24 14:35, Eric Dumazet wrote:
>> > On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
>> >> v4 -> v5
>> >> ++++++++
>> >>
>> >> o Rebased against latest Linus' tree
>> >> o Converted tw_timer into a delayed work following Jakub's bug report on v4
>> >>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org
>> >
>> > What was the issue again?
>> >
>> > Please explain precisely why it was fundamentally tied to the use of
>> > timers (and why it was not possible to fix the issue without adding
>> > work queues and more dependencies to the TCP stack).
>>
>> In v4 I added the use of the ehash lock to serialize arming the timewait
>> timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()).
>>
>> Unfortunately, holding a lock both in a timer callback and in the context
>> in which the timer is destroyed is invalid. AIUI the issue is as follows:
>>
>>   CPUx                               CPUy
>>   spin_lock(foo);
>>                                      <timer fires>
>>                                      call_timer_fn()
>>                                        spin_lock(foo) // blocks
>>   timer_shutdown_sync()
>>     __timer_delete_sync()
>>       __try_to_del_timer_sync() // loops for as long as the timer is running
>>   <deadlock>
>>
>> In our case, we had in v4:
>>
>>   inet_twsk_deschedule_put()
>>     spin_lock(ehash_lock);
>>                                      tw_timer_handler()
>>                                        inet_twsk_kill()
>>                                          spin_lock(ehash_lock); // blocks
>>     __inet_twsk_kill();
>>     timer_shutdown_sync(&tw->tw_timer);
>>
>> The fix here is to move the timer deletion to a non-timer context.
>> Workqueues fit the bill, and as tw_timer_handler() would just queue a
>> work item, I converted it to a delayed_work.
>
> I do not like this delayed work approach.
>
> Adding more dependencies to the TCP stack is not very nice from a
> maintainer's point of view.
>
> Why couldn't you call timer_shutdown_sync() before grabbing the lock?
We need the timer_shutdown_sync() and mod_timer() of tw->tw_timer to be
serialized in some way. If they aren't, we have the following race:

  tcp_time_wait()
    inet_twsk_hashdance()
                                     inet_twsk_deschedule_put()
                                       // Returns 0 because not pending,
                                       // but prevents future arming
                                       timer_shutdown_sync()
    inet_twsk_schedule()
      // Returns 0 as if the timer had been successfully armed
      mod_timer()

This means inet_twsk_deschedule_put() doesn't end up calling
inet_twsk_kill() (because the timer wasn't pending when it got shut
down), but inet_twsk_schedule() doesn't arm it either, despite the
hashdance() having updated the refcounts.

If we leave the deschedule as a del_timer_sync(), the timer ends up
armed in inet_twsk_schedule(), but that means waiting for the timer to
fire to clean up the resources despite having called
inet_twsk_deschedule_put().
Hi,

On 22/04/24 16:31, Valentin Schneider wrote:
> Apologies for the delayed reply, I was away for most of last week.
>
> On 16/04/24 17:01, Eric Dumazet wrote:
>> On Mon, Apr 15, 2024 at 4:33 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>>
>>> On 15/04/24 14:35, Eric Dumazet wrote:
>>> > On Mon, Apr 15, 2024 at 1:34 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>> >> v4 -> v5
>>> >> ++++++++
>>> >>
>>> >> o Rebased against latest Linus' tree
>>> >> o Converted tw_timer into a delayed work following Jakub's bug report on v4
>>> >>   http://lore.kernel.org/r/20240411100536.224fa1e7@kernel.org
>>> >
>>> > What was the issue again?
>>> >
>>> > Please explain precisely why it was fundamentally tied to the use of
>>> > timers (and why it was not possible to fix the issue without adding
>>> > work queues and more dependencies to the TCP stack).
>>>
>>> In v4 I added the use of the ehash lock to serialize arming the timewait
>>> timer vs destroying it (inet_twsk_schedule() vs inet_twsk_deschedule_put()).
>>>
>>> Unfortunately, holding a lock both in a timer callback and in the context
>>> in which the timer is destroyed is invalid. AIUI the issue is as follows:
>>>
>>>   CPUx                               CPUy
>>>   spin_lock(foo);
>>>                                      <timer fires>
>>>                                      call_timer_fn()
>>>                                        spin_lock(foo) // blocks
>>>   timer_shutdown_sync()
>>>     __timer_delete_sync()
>>>       __try_to_del_timer_sync() // loops for as long as the timer is running
>>>   <deadlock>
>>>
>>> In our case, we had in v4:
>>>
>>>   inet_twsk_deschedule_put()
>>>     spin_lock(ehash_lock);
>>>                                      tw_timer_handler()
>>>                                        inet_twsk_kill()
>>>                                          spin_lock(ehash_lock); // blocks
>>>     __inet_twsk_kill();
>>>     timer_shutdown_sync(&tw->tw_timer);
>>>
>>> The fix here is to move the timer deletion to a non-timer context.
>>> Workqueues fit the bill, and as tw_timer_handler() would just queue a
>>> work item, I converted it to a delayed_work.

Does this explanation make sense? This is the reasoning that drove me to
involve workqueues. I'm open to suggestions on alternative approaches.