Message ID | 20230705181256.3539027-1-vschneid@redhat.com |
---|---|
Series | context_tracking,x86: Defer some IPIs until a user->kernel transition |
> On Jul 5, 2023, at 11:12 AM, Valentin Schneider <vschneid@redhat.com> wrote:
>
> Deferral approach
> =================
>
> Storing each and every callback, like a secondary call_single_queue, turned out
> to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in
> userspace for as long as possible - no signal of any form would be sent when
> deferring an IPI. This means that any form of queuing for deferred callbacks
> would end up as a convoluted memory leak.
>
> Deferred IPIs must thus be coalesced, which this series achieves by assigning
> IPIs a "type" and having a mapping of IPI type to callback, leveraged upon
> kernel entry.

I have some experience with a similar optimization. Overall, it can make
sense and, as you show, it can reduce the number of interrupts.

The main problem with such an approach might be in cases where a process
frequently enters and exits the kernel between deferred IPIs, or even worse -
the IPI is sent while the remote CPU is inside the kernel. In such cases, you
pay the extra cost of synchronization and cache traffic, and might not even
get the benefit of reducing the number of IPIs.

In a sense, it's a more extreme case of the overhead that x86’s lazy-TLB
mechanism introduces while tracking whether a process is running or not. But
lazy-TLB changes is_lazy much less frequently than context tracking does,
which means that deferring the IPIs as done in this patch set has a greater
potential to hurt performance than lazy-TLB.

tl;dr - it would be beneficial to show some performance numbers for both a
“good” case where a process spends most of its time in userspace, and a “bad”
one where a process enters and exits the kernel very frequently. Reducing the
number of IPIs is good but I don’t think it is a goal on its own.

[ BTW: I did not go over the patches in detail. Obviously, there are
various delicate points that need to be checked, such as avoiding the
deferral of IPIs if page tables are freed. ]
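For readers skimming the archive, a minimal sketch of the coalescing scheme described in the quoted cover letter could look roughly like the following. The type names, `defer_ipi()` and `flush_deferred_ipis()` are made up for illustration and are not the series' actual API:

```c
#include <linux/percpu.h>
#include <linux/bitops.h>

/* Example IPI "types"; the real series defines its own set. */
enum deferred_ipi_type {
	DEFER_IPI_TLB_FLUSH,
	DEFER_IPI_SYNC_CORE,
	NR_DEFER_IPI_TYPES,
};

/* One fixed callback per type: duplicate requests collapse into a single bit. */
static void (*deferred_ipi_cb[NR_DEFER_IPI_TYPES])(void);

/* Per-CPU mask of IPI types whose callback must run on the next kernel entry. */
static DEFINE_PER_CPU(unsigned long, deferred_ipi_mask);

/* Sender side: record the work instead of interrupting a nohz_full CPU. */
static void defer_ipi(int cpu, enum deferred_ipi_type type)
{
	set_bit(type, per_cpu_ptr(&deferred_ipi_mask, cpu));
}

/* Target side: run the coalesced callbacks on the user->kernel transition. */
static void flush_deferred_ipis(void)
{
	unsigned long *mask = this_cpu_ptr(&deferred_ipi_mask);
	unsigned int type;

	for_each_set_bit(type, mask, NR_DEFER_IPI_TYPES) {
		clear_bit(type, mask);
		deferred_ipi_cb[type]();
	}
}
```

The point of coalescing is that the per-CPU state stays fixed-size: re-requesting an already-pending type only re-sets a bit, so nothing accumulates while the CPU sits in userspace, which is what rules out queuing individual callbacks as in call_single_queue.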
On Wed, 5 Jul 2023 19:12:42 +0100
Valentin Schneider <vschneid@redhat.com> wrote:

> o Patches 1-5 have been submitted previously and are included for the sake of
>   testing

I should have commented on the previous set, but I did my review on this set ;-)

Anyway, I'm all for the patches. Care to send a new version covering my input?

Thanks,

-- Steve
On 05/07/23 18:48, Nadav Amit wrote:
>> On Jul 5, 2023, at 11:12 AM, Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> Deferral approach
>> =================
>>
>> Storing each and every callback, like a secondary call_single_queue, turned out
>> to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in
>> userspace for as long as possible - no signal of any form would be sent when
>> deferring an IPI. This means that any form of queuing for deferred callbacks
>> would end up as a convoluted memory leak.
>>
>> Deferred IPIs must thus be coalesced, which this series achieves by assigning
>> IPIs a "type" and having a mapping of IPI type to callback, leveraged upon
>> kernel entry.
>
> I have some experience with a similar optimization. Overall, it can make
> sense and, as you show, it can reduce the number of interrupts.
>
> The main problem with such an approach might be in cases where a process
> frequently enters and exits the kernel between deferred IPIs, or even worse -
> the IPI is sent while the remote CPU is inside the kernel. In such cases, you
> pay the extra cost of synchronization and cache traffic, and might not even
> get the benefit of reducing the number of IPIs.
>
> In a sense, it's a more extreme case of the overhead that x86’s lazy-TLB
> mechanism introduces while tracking whether a process is running or not. But
> lazy-TLB changes is_lazy much less frequently than context tracking does,
> which means that deferring the IPIs as done in this patch set has a greater
> potential to hurt performance than lazy-TLB.
>
> tl;dr - it would be beneficial to show some performance numbers for both a
> “good” case where a process spends most of its time in userspace, and a “bad”
> one where a process enters and exits the kernel very frequently. Reducing the
> number of IPIs is good but I don’t think it is a goal on its own.
>

There already is a significant overhead incurred on kernel entry for nohz_full
CPUs due to all the context_tracking faff; now I *am* making it worse with
that extra atomic, but I get the feeling it's not going to stay :D

nohz_full CPUs that do context transitions very frequently are unfortunately
in the realm of "you shouldn't do that". Due to what's out there I have to
care about *occasional* transitions, but some folks consider even that to be
broken usage, so I don't believe getting numbers for that is particularly
relevant.

> [ BTW: I did not go over the patches in detail. Obviously, there are
> various delicate points that need to be checked, such as avoiding the
> deferral of IPIs if page tables are freed. ]
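Building on the sketch above, one way to picture the sender-side race Nadav raises (and the entry-time cost Valentin mentions) is the two-step check below. `ct_cpu_in_user()` is an invented helper standing in for "is this CPU's context-tracking state currently user?"; the actual series presumably avoids the two-step dance by folding the work bits into the context-tracking atomic Valentin refers to:

```c
/*
 * Illustration only, not the series' implementation: only defer when the
 * target is in userspace, otherwise fall back to a regular IPI.
 */
static bool try_defer_ipi(int cpu, enum deferred_ipi_type type)
{
	/* Target already in the kernel: deferring buys nothing, send an IPI. */
	if (!ct_cpu_in_user(cpu))
		return false;

	set_bit(type, per_cpu_ptr(&deferred_ipi_mask, cpu));

	/*
	 * The target may have entered the kernel between the check and the
	 * set_bit(), past the point where it flushes deferred work. Re-check
	 * and fall back to an IPI rather than lose the request; the leftover
	 * bit is harmless for idempotent callbacks but wasted work, which is
	 * why a single atomic covering check and publish is preferable.
	 */
	if (!ct_cpu_in_user(cpu))
		return false;

	return true;
}
```

This is also where Nadav's "bad" case bites: a task bouncing in and out of the kernel keeps the sender paying for the remote state reads and cache traffic while rarely actually saving an interrupt.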
On 05/07/23 15:03, Steven Rostedt wrote:
> On Wed, 5 Jul 2023 19:12:42 +0100
> Valentin Schneider <vschneid@redhat.com> wrote:
>
>> o Patches 1-5 have been submitted previously and are included for the sake of
>>   testing
>
> I should have commented on the previous set, but I did my review on this set ;-)
>

Thanks for having a look!

> Anyway, I'm all for the patches. Care to send a new version covering my input?
>

Sure thing, I'll send a v2 of these patches soonish.

> Thanks,
>
> -- Steve