[0/2] hung_task: add detect count for hung tasks


Message ID

20241022114736.83285-1-ioworker0@gmail.com (mailing list archive)

Headers

From: Lance Yang <ioworker0@gmail.com>
To: akpm@linux-foundation.org
Cc: cunhuang@tencent.com,
	leonylgao@tencent.com,
	j.granados@samsung.com,
	jsiddle@redhat.com,
	kent.overstreet@linux.dev,
	21cnbao@gmail.com,
	ryan.roberts@arm.com,
	david@redhat.com,
	ziy@nvidia.com,
	libang.li@antgroup.com,
	baolin.wang@linux.alibaba.com,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	Lance Yang <ioworker0@gmail.com>
Subject: [PATCH 0/2] hung_task: add detect count for hung tasks
Date: Tue, 22 Oct 2024 19:47:34 +0800
Message-ID: <20241022114736.83285-1-ioworker0@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Series

hung_task: add detect count for hung tasks

Message

Lance Yang Oct. 22, 2024, 11:47 a.m. UTC

Hi all,

This patchset adds a counter, hung_task_detect_count, to track the number of
times hung tasks are detected. This counter provides a straightforward way
to monitor hung task events without manually checking dmesg logs.

With this counter in place, system issues can be spotted quickly, allowing
admins to step in promptly before system load spikes occur, even if the
hung_task_warnings value has been decreased to 0 well before.

Recently, we encountered a situation where warnings about hung tasks were
buried in dmesg logs during load spikes. Introducing this counter could
have helped us detect such issues earlier and improve our analysis efficiency.


Lance Yang (2):
  hung_task: add detect count for hung tasks
  hung_task: add docs for hung_task_detect_count

 Documentation/admin-guide/sysctl/kernel.rst |  9 +++++++++
 kernel/hung_task.c                          | 18 ++++++++++++++++++
 2 files changed, 27 insertions(+)

Comments

Andrew Morton Oct. 24, 2024, 2:05 a.m. UTC | #1

On Tue, 22 Oct 2024 19:47:34 +0800 Lance Yang <ioworker0@gmail.com> wrote:

> Hi all,
> 
> This patchset adds a counter, hung_task_detect_count, to track the number of
> times hung tasks are detected. This counter provides a straightforward way
> to monitor hung task events without manually checking dmesg logs.
> 
> With this counter in place, system issues can be spotted quickly, allowing
> admins to step in promptly before system load spikes occur, even if the
> hung_task_warnings value has been decreased to 0 well before.
> 
> Recently, we encountered a situation where warnings about hung tasks were
> buried in dmesg logs during load spikes. Introducing this counter could
> have helped us detect such issues earlier and improve our analysis efficiency.
> 

Isn't the answer to this problem "write a better parser"?  I mean,
we're providing userspace with information which is already available.

Lance Yang Oct. 24, 2024, 3:28 a.m. UTC | #2

Hi Andrew,

Thanks a lot for paying attention!

On Thu, Oct 24, 2024 at 10:05 AM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Tue, 22 Oct 2024 19:47:34 +0800 Lance Yang <ioworker0@gmail.com> wrote:
>
> > Hi all,
> >
> > This patchset adds a counter, hung_task_detect_count, to track the number of
> > times hung tasks are detected. This counter provides a straightforward way
> > to monitor hung task events without manually checking dmesg logs.
> >
> > With this counter in place, system issues can be spotted quickly, allowing
> > admins to step in promptly before system load spikes occur, even if the
> > hung_task_warnings value has been decreased to 0 well before.
> >
> > Recently, we encountered a situation where warnings about hung tasks were
> > buried in dmesg logs during load spikes. Introducing this counter could
> > have helped us detect such issues earlier and improve our analysis efficiency.
> >
>
> Isn't the answer to this problem "write a better parser"?  I mean,

Yeah, I certainly agree that having a good parser is important, and I'm
working on that as well ;)

> we're providing userspace with information which is already available.

IMHO, there are two reasons why this counter remains valuable:

1) It allows us to easily detect hung tasks in time before load spikes occur,
using simple and common monitoring tools like Prometheus.

2) It ensures that we remain aware of hung tasks even when the
hung_task_warnings value has already been decreased to 0 well before.
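
As a rough illustration of point 1, a monitoring agent could poll the
counter as a plain integer instead of parsing dmesg. This is only a sketch:
the path /proc/sys/kernel/hung_task_detect_count is an assumption inferred
from the series touching Documentation/admin-guide/sysctl/kernel.rst, and
the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the new sysctl; inferred from the kernel.rst change
# in this series, not stated explicitly in the thread.
COUNTER_PATH = Path("/proc/sys/kernel/hung_task_detect_count")


def parse_counter(text: str) -> int:
    """Parse the sysctl file content (a decimal integer plus a newline)."""
    return int(text.strip())


def read_hung_task_count(path: Path = COUNTER_PATH) -> Optional[int]:
    """Return the current count, or None if the kernel lacks the sysctl."""
    try:
        return parse_counter(path.read_text())
    except FileNotFoundError:
        return None


def new_detections(previous: int, current: int) -> int:
    """Detections since the last poll.

    If the value went backwards, assume the counter restarted from zero
    (for example after a reboot) and report the current value.
    """
    return current - previous if current >= previous else current
```

An exporter would call read_hung_task_count() on each scrape and alert on
a non-zero new_detections() result.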

Thanks again for your time!
Lance


Andrew Morton Oct. 24, 2024, 4:28 a.m. UTC | #3

On Thu, 24 Oct 2024 11:28:01 +0800 Lance Yang <ioworker0@gmail.com> wrote:

> Hi Andrew,
> 
> Thanks a lot for paying attention!
> 
> On Thu, Oct 24, 2024 at 10:05 AM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 22 Oct 2024 19:47:34 +0800 Lance Yang <ioworker0@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > This patchset adds a counter, hung_task_detect_count, to track the number of
> > > times hung tasks are detected. This counter provides a straightforward way
> > > to monitor hung task events without manually checking dmesg logs.
> > >
> > > With this counter in place, system issues can be spotted quickly, allowing
> > > admins to step in promptly before system load spikes occur, even if the
> > > hung_task_warnings value has been decreased to 0 well before.
> > >
> > > Recently, we encountered a situation where warnings about hung tasks were
> > > buried in dmesg logs during load spikes. Introducing this counter could
> > > have helped us detect such issues earlier and improve our analysis efficiency.
> > >
> >
> > Isn't the answer to this problem "write a better parser"?  I mean,
> 
> Yeah, I certainly agree that having a good parser is important, and I'm
> working on that as well ;)
> 
> > we're providing userspace with information which is already available.
> 
> IMHO, there are two reasons why this counter remains valuable:
> 
> 1) It allows us to easily detect hung tasks in time before load spikes occur,
> using simple and common monitoring tools like Prometheus.

But the new sysctl_hung_task_detect_count counter gets incremented a
microsecond before the printk comes out.  I don't understand the
difference.

> 2) It ensures that we remain aware of hung tasks even when the
> hung_task_warnings value has already been decreased to 0 well before.

That makes sense, I guess.  But fleshing this out with a real
operational scenario would help persuade reviewers of the benefit of
this change.

So please describe the utility with full details - sell it to us!

Lance Yang Oct. 24, 2024, 8:48 a.m. UTC | #4

On Thu, Oct 24, 2024 at 12:28 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Thu, 24 Oct 2024 11:28:01 +0800 Lance Yang <ioworker0@gmail.com> wrote:
>
> > Hi Andrew,
> >
> > Thanks a lot for paying attention!
> >
> > On Thu, Oct 24, 2024 at 10:05 AM Andrew Morton
> > <akpm@linux-foundation.org> wrote:
> > >
> > > On Tue, 22 Oct 2024 19:47:34 +0800 Lance Yang <ioworker0@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > This patchset adds a counter, hung_task_detect_count, to track the number of
> > > > times hung tasks are detected. This counter provides a straightforward way
> > > > to monitor hung task events without manually checking dmesg logs.
> > > >
> > > > With this counter in place, system issues can be spotted quickly, allowing
> > > > admins to step in promptly before system load spikes occur, even if the
> > > > hung_task_warnings value has been decreased to 0 well before.
> > > >
> > > > Recently, we encountered a situation where warnings about hung tasks were
> > > > buried in dmesg logs during load spikes. Introducing this counter could
> > > > have helped us detect such issues earlier and improve our analysis efficiency.
> > > >
> > >
> > > Isn't the answer to this problem "write a better parser"?  I mean,
> >
> > Yeah, I certainly agree that having a good parser is important, and I'm
> > working on that as well ;)
> >
> > > we're providing userspace with information which is already available.
> >
> > IMHO, there are two reasons why this counter remains valuable:
> >
> > 1) It allows us to easily detect hung tasks in time before load spikes occur,
> > using simple and common monitoring tools like Prometheus.
>
> But the new sysctl_hung_task_detect_count counter gets incremented a
> microsecond before the printk comes out.  I don't understand the
> difference.
>
> > 2) It ensures that we remain aware of hung tasks even when the
> > hung_task_warnings value has already been decreased to 0 well before.
>
> That makes sense, I guess.  But fleshing this out with a real
> operational scenario would help persuade reviewers of the benefit of
> this change.
>
> So please describe the utility with full details - sell it to us!

Thanks, the suggestion is very helpful!

IMHO, hung tasks are a critical metric. Currently, we detect them by
periodically parsing dmesg. However, this method isn't as user-friendly
as reading a counter.

Sometimes, a short-lived issue with the NIC or hard drive can quickly
drive hung_task_warnings down to zero. Once the warnings stop, we have to
log in to the node directly to confirm that there are no more hung tasks
and that the system has recovered. After all, load alone cannot provide
a clear picture.
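
To make the scenario concrete, here is a toy userspace model (not the
kernel code from this series) of how the warning budget and the proposed
counter diverge. It assumes the documented hung_task_warnings semantics:
each report consumes one unit, 0 silences further reports, and -1 means
unlimited. The class name and the log text are made up for illustration.

```python
class HungTaskModel:
    """Toy model of hung task reporting; not the real kernel logic."""

    def __init__(self, warnings: int = 10):
        self.warnings = warnings   # models hung_task_warnings
        self.detect_count = 0      # models hung_task_detect_count
        self.log = []              # stands in for dmesg

    def detect(self, task: str) -> None:
        # The counter is bumped on every detection, reported or not.
        self.detect_count += 1
        if self.warnings == 0:
            return                 # budget exhausted: nothing is printed
        if self.warnings > 0:
            self.warnings -= 1     # -1 (unlimited) is never decremented
        # Placeholder text, not the kernel's actual message format.
        self.log.append(f"hung task detected: {task}")


model = HungTaskModel(warnings=2)
for i in range(5):
    model.detect(f"task-{i}")

# Only the first two detections reach the log; the counter sees all five.
print(len(model.log), model.detect_count)
```

After a brief NIC or disk hiccup uses up the budget, the log goes quiet
while the counter keeps advancing, which is the gap described above.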

Once this counter is in place, we plan to set hung_task_timeout_secs to a
lower value in high-density deployments to improve stability, even though
this might produce false positives. We can then set a time-based
threshold: if hung tasks persist beyond this duration, we will automatically
migrate containers to other nodes. Based on past experience, this approach
could help avoid many production disruptions.
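
One possible shape for that time-based threshold, sketched under the
assumption that an agent records (timestamp, counter value) pairs at a
fixed polling interval. The function names and the sample values are
hypothetical, and the migration step itself is out of scope here.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (unix timestamp in seconds, counter value)


def hung_duration(samples: List[Sample]) -> float:
    """Seconds covered by the trailing run of polls in which the counter rose.

    Returns 0.0 if the counter did not rise during the most recent interval.
    A counter that goes backwards (reboot) also ends the run.
    """
    streak_start = None
    for (t0, c0), (_t1, c1) in zip(samples, samples[1:]):
        if c1 > c0:
            if streak_start is None:
                streak_start = t0  # first interval of the current run
        else:
            streak_start = None    # a quiet interval resets the run
    if streak_start is None:
        return 0.0
    return samples[-1][0] - streak_start


def should_migrate(samples: List[Sample], threshold_secs: float) -> bool:
    """True once hung tasks have been detected continuously for too long."""
    return hung_duration(samples) >= threshold_secs


# Made-up samples taken every 30 seconds.
samples = [(0.0, 0), (30.0, 0), (60.0, 2), (90.0, 3), (120.0, 5)]
print(hung_duration(samples), should_migrate(samples, 60.0))
```

Requiring a sustained rise rather than a single increment is one way to
tolerate the false positives that a lower hung_task_timeout_secs may cause.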

Moreover, just like other important events such as OOM that already have
counters, having a dedicated counter for hung tasks makes sense ;)

Thanks,
Lance