diff mbox series

[v1,1/5] mm/memory_hotplug: check for fatal signals only in offline_pages()

Message ID 20230627112220.229240-2-david@redhat.com (mailing list archive)
State New
Series mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals

Commit Message

David Hildenbrand June 27, 2023, 11:22 a.m. UTC
Let's check for fatal signals only. That looks cleaner and still keeps
the documented use case for manual user-space triggered memory offlining
working. From Documentation/admin-guide/mm/memory-hotplug.rst:

	% timeout $TIMEOUT offline_block | failure_handling

In fact, we even document there: "the offlining context can be terminated
by sending a fatal signal".

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
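
For illustration, the `offline_block` placeholder in the documented command line could be a small helper like the one below. This is a minimal sketch, not code from the series: the function name mirrors the placeholder, and the state file path is an assumption based on the sysfs layout described in memory-hotplug.rst (`/sys/devices/system/memory/memoryXXX/state`). The helper writes "offline" to the given file and reports a negative errno value on failure, which would be -EINTR when the kernel takes the "signal backoff" path touched by this patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Request offlining of one memory block by writing "offline" to its
 * state file. The caller supplies the path; on a real system this is
 * assumed to look like /sys/devices/system/memory/memoryXXX/state.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static int offline_block(const char *state_path)
{
	static const char req[] = "offline";
	ssize_t n;
	int fd, err = 0;

	fd = open(state_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* This write blocks while the kernel tries to offline the block. */
	n = write(fd, req, strlen(req));
	if (n < 0)
		err = -errno;		/* e.g. -EINTR after a signal backoff */
	else if ((size_t)n != strlen(req))
		err = -EIO;		/* short write, treat as failure */

	close(fd);
	return err;
}
```

Wrapped in a small program and run under `timeout`, the process would receive the signal that `timeout` sends (SIGTERM by default in GNU coreutils). With no handler installed, that signal is fatal and ends the offlining attempt with either check.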

Comments

Michal Hocko June 27, 2023, 12:34 p.m. UTC | #1
On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
> Let's check for fatal signals only. That looks cleaner and still keeps
> the documented use case for manual user-space triggered memory offlining
> working. From Documentation/admin-guide/mm/memory-hotplug.rst:
> 
> 	% timeout $TIMEOUT offline_block | failure_handling
> 
> In fact, we even document there: "the offlining context can be terminated
> by sending a fatal signal".

We should be fixing documentation instead. This could break users who do
have a SIGALRM signal handler installed.

> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8e0fa209d533..0d2151df4ee1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	do {
>  		pfn = start_pfn;
>  		do {
> -			if (signal_pending(current)) {
> +			if (fatal_signal_pending(current)) {
>  				ret = -EINTR;
>  				reason = "signal backoff";
>  				goto failed_removal_isolated;
> -- 
> 2.40.1

David Hildenbrand June 27, 2023, 1:28 p.m. UTC | #2
On 27.06.23 14:34, Michal Hocko wrote:
> On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
>> Let's check for fatal signals only. That looks cleaner and still keeps
>> the documented use case for manual user-space triggered memory offlining
>> working. From Documentation/admin-guide/mm/memory-hotplug.rst:
>>
>> 	% timeout $TIMEOUT offline_block | failure_handling
>>
>> In fact, we even document there: "the offlining context can be terminated
>> by sending a fatal signal".
> 
> We should be fixing documentation instead. This could break users who do
> have a SIGALRM signal handler installed.

You mean because timeout will send a SIGALRM, which is not considered 
fatal in case a signal handler is installed?

At least the "traditional" tools I am aware of don't set a timeout at 
all (crossing fingers that they never end up stuck):
* chmem
* QEMU guest agent
* powerpc-utils

libdaxctl also doesn't seem to implement an easy-to-spot timeout for 
memory offlining, but it also doesn't configure SIGALRM.


Of course, that doesn't mean that there isn't somewhere a program that 
does that; I merely assume that it would be pretty unlikely to find such 
a program.

But no strong opinion: we can also keep it like that, update the doc and 
add a comment why this one here is different than most other signal 
backoff checks.


Thanks!

Michal Hocko June 27, 2023, 2:07 p.m. UTC | #3
On Tue 27-06-23 15:28:29, David Hildenbrand wrote:
> On 27.06.23 14:34, Michal Hocko wrote:
> > On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
> > > Let's check for fatal signals only. That looks cleaner and still keeps
> > > the documented use case for manual user-space triggered memory offlining
> > > working. From Documentation/admin-guide/mm/memory-hotplug.rst:
> > > 
> > > 	% timeout $TIMEOUT offline_block | failure_handling
> > > 
> > > In fact, we even document there: "the offlining context can be terminated
> > > by sending a fatal signal".
> > 
> > We should be fixing documentation instead. This could break users who do
> > have a SIGALRM signal handler installed.
> 
> You mean because timeout will send a SIGALRM, which is not considered fatal
> in case a signal handler is installed?

Correct.

> At least the "traditional" tools I am aware of don't set a timeout at all
> (crossing fingers that they never end up stuck):
> * chmem
> * QEMU guest agent
> * powerpc-utils
> 
> libdaxctl also doesn't seem to implement an easy-to-spot timeout for memory
> offlining, but it also doesn't configure SIGALRM.
> 
> 
> Of course, that doesn't mean that there isn't somewhere a program that does
> that; I merely assume that it would be pretty unlikely to find such a
> program.
> 
> But no strong opinion: we can also keep it like that, update the doc and add
> a comment why this one here is different than most other signal backoff
> checks.

Well, the existing signal handling approach is there for way too long to
be sure. I personally would prefer fatal_signal_pending as that reflects
more what we do elsewhere but here we are. Historical baggage...

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8e0fa209d533..0d2151df4ee1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1879,7 +1879,7 @@  int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	do {
 		pfn = start_pfn;
 		do {
-			if (signal_pending(current)) {
+			if (fatal_signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
 				goto failed_removal_isolated;