Message ID | 20180828172258.3185-1-hannes@cmpxchg.org |
---|---|
Series | psi: pressure stall information for CPU, memory, and IO v4 |
On Tue, Aug 28, 2018 at 01:22:49PM -0400, Johannes Weiner wrote:
> This version 4 of the PSI series incorporates feedback from Peter and
> fixes two races in the lockless aggregator that Suren found in his
> testing and which caused the sample calculation to sometimes underflow
> and record bogusly large samples; details at the bottom of this email.

Peter, do the changes from v3 look sane to you?

If there aren't any further objections, I was hoping we could get this
lined up for 4.20.
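Why an underflow shows up as a "bogusly large" sample rather than a negative one can be sketched as follows, assuming for illustration that a sample is the difference between two snapshots of a cumulative unsigned 64-bit counter. The function names are hypothetical and this is not the aggregator's code; the series fixed the races themselves, and the clamped variant is only one generic defensive pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Naive sample: difference between two snapshots of a cumulative
 * unsigned counter. If a race lets prev run ahead of curr, the
 * subtraction wraps modulo 2^64 into a huge value. */
uint64_t naive_delta(uint64_t curr, uint64_t prev)
{
	return curr - prev;
}

/* Defensive variant: a backwards step counts as no new stall time. */
uint64_t clamped_delta(uint64_t curr, uint64_t prev)
{
	uint64_t delta = curr >= prev ? curr - prev : 0;

	assert(delta <= curr); /* a sample never exceeds the newer snapshot */
	return delta;
}
```

For example, `naive_delta(100, 150)` wraps to `UINT64_MAX - 49`, while `clamped_delta(100, 150)` returns 0.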
On Thu, Sep 6, 2018 at 5:43 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> Peter, do the changes from v3 look sane to you?
>
> If there aren't any further objections, I was hoping we could get this
> lined up for 4.20.

That would be excellent. I just retested the latest version at
http://git.cmpxchg.org/cgit.cgi/linux-psi.git (Linux 4.18) and the
results are great.

Test setup:
Endless OS
GeminiLake N4200 low end laptop
2GB RAM
swap (and zram swap) disabled

Baseline test: open a handful of large-ish apps and several website
tabs in Google Chrome.

Results: after a couple of minutes, system is excessively thrashing,
mouse cursor can barely be moved, UI is not responding to mouse
clicks, so it's impractical to recover from this situation as an
ordinary user.

Add my simple killer:
https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd

Results: when the thrashing causes the UI to become sluggish, the
killer steps in and kills something (usually a chrome tab), and the
system remains usable. I repeatedly opened more apps and more websites
over a 15 minute period but I wasn't able to get the system to a point
of UI unresponsiveness.

Thanks,
Daniel
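The gist's contents are not reproduced here. As a rough idea of what a PSI-driven trigger can look like, the sketch below parses a line of `/proc/pressure/memory` in the format mainline kernels use (`some|full avg10=… avg60=… avg300=… total=…`) and fires when the 10-second "full" average crosses a threshold. The function names, the choice of the "full" line, and the threshold are all hypothetical assumptions, not Daniel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/pressure/<resource>, e.g.
 * "full avg10=1.23 avg60=0.50 avg300=0.10 total=123456" */
struct psi_line {
	char kind[8];                 /* "some" or "full" */
	double avg10, avg60, avg300;  /* percentage of time stalled */
	unsigned long long total;     /* cumulative stall time, microseconds */
};

/* Parse one line; returns true only if all five fields were read. */
bool psi_parse_line(const char *line, struct psi_line *out)
{
	assert(line != NULL && out != NULL);
	return sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		      out->kind, &out->avg10, &out->avg60, &out->avg300,
		      &out->total) == 5;
}

/* Hypothetical policy: trigger when the 10-second "full" average
 * (all non-idle tasks stalled on memory) reaches threshold_pct. */
bool psi_should_kill(const struct psi_line *l, double threshold_pct)
{
	return strcmp(l->kind, "full") == 0 && l->avg10 >= threshold_pct;
}

/* Read the "full" line of /proc/pressure/memory. Returns false if the
 * file is absent (kernel without PSI) or no such line parses. */
bool psi_read_memory_full(struct psi_line *out)
{
	char buf[256];
	bool found = false;
	FILE *f = fopen("/proc/pressure/memory", "r");

	if (f == NULL)
		return false;
	while (!found && fgets(buf, sizeof(buf), f) != NULL)
		found = psi_parse_line(buf, out) &&
			strcmp(out->kind, "full") == 0;
	fclose(f);
	return found;
}
```

A monitoring loop would poll `psi_read_memory_full()` periodically and act when `psi_should_kill()` returns true; choosing which process to kill is outside this sketch.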
On Wed, Sep 05, 2018 at 05:43:03PM -0400, Johannes Weiner wrote:
> On Tue, Aug 28, 2018 at 01:22:49PM -0400, Johannes Weiner wrote:
> > This version 4 of the PSI series incorporates feedback from Peter and
> > fixes two races in the lockless aggregator that Suren found in his
> > testing and which caused the sample calculation to sometimes underflow
> > and record bogusly large samples; details at the bottom of this email.
>
> Peter, do the changes from v3 look sane to you?

I'll go have a look.
On Wed, Sep 05, 2018 at 05:43:03PM -0400, Johannes Weiner wrote:
> On Tue, Aug 28, 2018 at 01:22:49PM -0400, Johannes Weiner wrote:
> > This version 4 of the PSI series incorporates feedback from Peter and
> > fixes two races in the lockless aggregator that Suren found in his
> > testing and which caused the sample calculation to sometimes underflow
> > and record bogusly large samples; details at the bottom of this email.
>
> Peter, do the changes from v3 look sane to you?
>
> If there aren't any further objections, I was hoping we could get this
> lined up for 4.20.

I suppose it looks ok; there are a few small nits, but nothing big.

I still hate psi_ttwu_dequeue(), but I don't really know what to do
about that.

So yeah, grudgingly acked. Did you want me to pick this up through the
scheduler tree since most of this lives there?
On Fri, Sep 07, 2018 at 01:04:07PM +0200, Peter Zijlstra wrote:
> So yeah, grudgingly acked. Did you want me to pick this up through the
> scheduler tree since most of this lives there?

Thanks for the ack.

As for routing it, I'll leave that decision to you and Andrew. It
touches stuff all over, so it could result in quite a few conflicts
between trees (although I don't expect any of them to be non-trivial).
Thanks for the new patchset! Backported to 4.9 and retested on an ARMv8
8-core system running Android. Signals behave as expected, reacting to
memory pressure, with no jumps in the "total" counters that would
indicate overflow/underflow issues. Nicely done!

Tested-by: Suren Baghdasaryan <surenb@google.com>

On Fri, Sep 7, 2018 at 8:09 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Fri, Sep 07, 2018 at 01:04:07PM +0200, Peter Zijlstra wrote:
>> So yeah, grudgingly acked. Did you want me to pick this up through the
>> scheduler tree since most of this lives there?
>
> Thanks for the ack.
>
> As for routing it, I'll leave that decision to you and Andrew. It
> touches stuff all over, so it could result in quite a few conflicts
> between trees (although I don't expect any of them to be non-trivial).
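How the "total" counters were checked for jumps is not described. One hypothetical way to automate such a check, assuming `total` is cumulative stall time in microseconds and can grow by at most the wall-clock time between two readings (plus some slack for sampling boundaries), is sketched below; the names and the slack parameter are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

enum total_check { TOTAL_OK, TOTAL_WENT_BACKWARDS, TOTAL_JUMPED };

/* Hypothetical check between two readings of a PSI "total=" counter
 * (cumulative stall time in microseconds) taken elapsed_us apart. */
enum total_check psi_check_total(uint64_t prev_us, uint64_t curr_us,
				 uint64_t elapsed_us, uint64_t slack_us)
{
	/* Guard against overflow in elapsed_us + slack_us below. */
	assert(slack_us <= UINT64_MAX - elapsed_us);

	if (curr_us < prev_us)
		return TOTAL_WENT_BACKWARDS;
	/* Stall time cannot outpace wall-clock time, so a larger jump
	 * hints that a wrapped (underflowed) sample was accumulated. */
	if (curr_us - prev_us > elapsed_us + slack_us)
		return TOTAL_JUMPED;
	return TOTAL_OK;
}
```

A counter that suddenly grows by something close to `UINT64_MAX` between two one-second readings would be reported as `TOTAL_JUMPED`.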
Hi Suren

On Fri, Sep 7, 2018 at 11:58 PM, Suren Baghdasaryan <surenb@google.com> wrote:
> Thanks for the new patchset! Backported to 4.9 and retested on an ARMv8
> 8-core system running Android. Signals behave as expected, reacting to
> memory pressure, with no jumps in the "total" counters that would
> indicate overflow/underflow issues. Nicely done!

Can you share your Linux v4.9 psi backport somewhere?

Thanks
Daniel
Will it be part of the 4.9 Google Android backport, or is it for testing only?

I guess that this patch is too big for the LTS tree.

On 09/07/2018 05:58 PM, Suren Baghdasaryan wrote:
> Thanks for the new patchset! Backported to 4.9 and retested on an ARMv8
> 8-core system running Android. Signals behave as expected, reacting to
> memory pressure, with no jumps in the "total" counters that would
> indicate overflow/underflow issues. Nicely done!
>
> Tested-by: Suren Baghdasaryan <surenb@google.com>
>
> On Fri, Sep 7, 2018 at 8:09 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> On Fri, Sep 07, 2018 at 01:04:07PM +0200, Peter Zijlstra wrote:
>>> So yeah, grudgingly acked. Did you want me to pick this up through the
>>> scheduler tree since most of this lives there?
>> Thanks for the ack.
>>
>> As for routing it, I'll leave that decision to you and Andrew. It
>> touches stuff all over, so it could result in quite a few conflicts
>> between trees (although I don't expect any of them to be non-trivial).
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail?
Hi Daniel,

On Sun, Sep 16, 2018 at 10:22 PM, Daniel Drake <drake@endlessm.com> wrote:
> Hi Suren
>
> On Fri, Sep 7, 2018 at 11:58 PM, Suren Baghdasaryan <surenb@google.com> wrote:
>> Thanks for the new patchset! Backported to 4.9 and retested on an ARMv8
>> 8-core system running Android. Signals behave as expected, reacting to
>> memory pressure, with no jumps in the "total" counters that would
>> indicate overflow/underflow issues. Nicely done!
>
> Can you share your Linux v4.9 psi backport somewhere?
>

Absolutely. Let me figure out the best way to share them and make sure
they apply cleanly on official 4.9 (I was using a vendor's tree for
testing). I'll need a day or so to get this done.
In case you need them sooner, there were several "prerequisite"
patches that I had to backport to make PSI backporting
easier/possible. Following is the list as shown by "git log --oneline":

PSI patches:

ef94c067f360 psi: cgroup support
60081a7aeb0b psi: pressure stall information for CPU, memory, and IO
acd2a16497e9 sched: introduce this_rq_lock_irq()
f30268c29309 sched: sched.h: make rq locking and clock functions available in stats.h
a2fd1c94b743 sched: loadavg: make calc_load_n() public
32a74dec4967 sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
8e3991dd1a73 delayacct: track delays from thrashing cache pages
4ae940e7e6ff mm: workingset: tell cache transitions from workingset thrashing
e9ccd63399e0 mm: workingset: don't drop refault information prematurely

Prerequisites:

b5a58c778c54 workqueue: make workqueue available early during boot
ae5f39ee13b5 sched/core: Add wrappers for lockdep_(un)pin_lock()
7276f98a72c1 sched/headers, delayacct: Move the 'struct task_delay_info' definition from <linux/sched.h> to <linux/delayacct.h>
287318d13688 mm: add PageWaiters indicating tasks are waiting for a page bit
edfa64560aaa sched/headers: Remove <linux/sched.h> from <linux/sched/loadavg.h>
f6b6ba853959 sched/headers: Move loadavg related definitions from <linux/sched.h> to <linux/sched/loadavg.h>
395b0a9f7aae sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>

PSI patches needed some adjustments but nothing really major.

> Thanks
> Daniel

Thanks,
Suren.
On Mon, Sep 17, 2018 at 6:29 AM, peter enderborg <peter.enderborg@sony.com> wrote:
> Will it be part of the 4.9 Google Android backport, or is it for testing only?

Currently I'm testing these patches in tandem with the PSI monitor that
I'm developing, and test results look good. If things go well and we
start using PSI for Android, I will try to upstream the backport. If
upstream rejects it, we will have to merge it into the Android common
kernel repo as a last resort. Hope this answers your question.

> I guess that this patch is too big for the LTS tree.
>
> On 09/07/2018 05:58 PM, Suren Baghdasaryan wrote:
>> Thanks for the new patchset! Backported to 4.9 and retested on an ARMv8
>> 8-core system running Android. Signals behave as expected, reacting to
>> memory pressure, with no jumps in the "total" counters that would
>> indicate overflow/underflow issues. Nicely done!
>>
>> Tested-by: Suren Baghdasaryan <surenb@google.com>
>>
>> On Fri, Sep 7, 2018 at 8:09 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>> On Fri, Sep 07, 2018 at 01:04:07PM +0200, Peter Zijlstra wrote:
>>>> So yeah, grudgingly acked. Did you want me to pick this up through the
>>>> scheduler tree since most of this lives there?
>>> Thanks for the ack.
>>>
>>> As for routing it, I'll leave that decision to you and Andrew. It
>>> touches stuff all over, so it could result in quite a few conflicts
>>> between trees (although I don't expect any of them to be non-trivial).

Thanks,
Suren.
I emailed Daniel the 4.9 backport patches. Unfortunately, that seems to
be the easiest way to share them. If anyone else is interested in them,
please email me directly.

Thanks,
Suren.

On Tue, Sep 18, 2018 at 8:53 AM, Suren Baghdasaryan <surenb@google.com> wrote:
> Hi Daniel,
>
> On Sun, Sep 16, 2018 at 10:22 PM, Daniel Drake <drake@endlessm.com> wrote:
>> Hi Suren
>>
>> On Fri, Sep 7, 2018 at 11:58 PM, Suren Baghdasaryan <surenb@google.com> wrote:
>>> Thanks for the new patchset! Backported to 4.9 and retested on an ARMv8
>>> 8-core system running Android. Signals behave as expected, reacting to
>>> memory pressure, with no jumps in the "total" counters that would
>>> indicate overflow/underflow issues. Nicely done!
>>
>> Can you share your Linux v4.9 psi backport somewhere?
>>
>
> Absolutely. Let me figure out the best way to share them and make sure
> they apply cleanly on official 4.9 (I was using a vendor's tree for
> testing). I'll need a day or so to get this done.
> In case you need them sooner, there were several "prerequisite"
> patches that I had to backport to make PSI backporting
> easier/possible. Following is the list as shown by "git log --oneline":
>
> PSI patches:
>
> ef94c067f360 psi: cgroup support
> 60081a7aeb0b psi: pressure stall information for CPU, memory, and IO
> acd2a16497e9 sched: introduce this_rq_lock_irq()
> f30268c29309 sched: sched.h: make rq locking and clock functions available in stats.h
> a2fd1c94b743 sched: loadavg: make calc_load_n() public
> 32a74dec4967 sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD
> 8e3991dd1a73 delayacct: track delays from thrashing cache pages
> 4ae940e7e6ff mm: workingset: tell cache transitions from workingset thrashing
> e9ccd63399e0 mm: workingset: don't drop refault information prematurely
>
> Prerequisites:
>
> b5a58c778c54 workqueue: make workqueue available early during boot
> ae5f39ee13b5 sched/core: Add wrappers for lockdep_(un)pin_lock()
> 7276f98a72c1 sched/headers, delayacct: Move the 'struct task_delay_info' definition from <linux/sched.h> to <linux/delayacct.h>
> 287318d13688 mm: add PageWaiters indicating tasks are waiting for a page bit
> edfa64560aaa sched/headers: Remove <linux/sched.h> from <linux/sched/loadavg.h>
> f6b6ba853959 sched/headers: Move loadavg related definitions from <linux/sched.h> to <linux/sched/loadavg.h>
> 395b0a9f7aae sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h>
>
> PSI patches needed some adjustments but nothing really major.
>
>> Thanks
>> Daniel
>
> Thanks,
> Suren.
On Tue, 28 Aug 2018 13:22:49 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> This version 4 of the PSI series incorporates feedback from Peter and
> fixes two races in the lockless aggregator that Suren found in his
> testing and which caused the sample calculation to sometimes underflow
> and record bogusly large samples; details at the bottom of this email.

We've had very little in the way of review activity for the PSI
patchset. According to the changelog tags, anyway.
On Thu, Oct 18, 2018 at 07:07:10PM -0700, Andrew Morton wrote:
> On Tue, 28 Aug 2018 13:22:49 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> > This version 4 of the PSI series incorporates feedback from Peter and
> > fixes two races in the lockless aggregator that Suren found in his
> > testing and which caused the sample calculation to sometimes underflow
> > and record bogusly large samples; details at the bottom of this email.
>
> We've had very little in the way of review activity for the PSI
> patchset. According to the changelog tags, anyway.

Peter reviewed it quite extensively over all revisions, and acked the
final version. Peter, can we add your acked-by or reviewed-by tag(s)?

The scheduler part accounts for 99% of the complexity in those
patches. The mm bits, while somewhat sprawling, are mostly mechanical.
On Tue, Oct 23, 2018 at 01:29:37PM -0400, Johannes Weiner wrote:
> On Thu, Oct 18, 2018 at 07:07:10PM -0700, Andrew Morton wrote:
> > On Tue, 28 Aug 2018 13:22:49 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > > This version 4 of the PSI series incorporates feedback from Peter and
> > > fixes two races in the lockless aggregator that Suren found in his
> > > testing and which caused the sample calculation to sometimes underflow
> > > and record bogusly large samples; details at the bottom of this email.
> >
> > We've had very little in the way of review activity for the PSI
> > patchset. According to the changelog tags, anyway.
>
> Peter reviewed it quite extensively over all revisions, and acked the
> final version. Peter, can we add your acked-by or reviewed-by tag(s)?

I don't really do reviewed by; but yes, I thought I already did; lemme
find.

> The scheduler part accounts for 99% of the complexity in those
> patches. The mm bits, while somewhat sprawling, are mostly mechanical.

Ah, I now see my mistake;

  https://lkml.kernel.org/r/20180907110407.GQ24106@hirez.programming.kicks-ass.net

I forgot to include an actual tag therein. My bad.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>