Message ID | 20230627112220.229240-2-david@redhat.com (mailing list archive) |
---|---|
State | New |
Series | mm/memory_hotplug: make offline_and_remove_memory() timeout instead of failing on fatal signals |
On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
> Let's check for fatal signals only. That looks cleaner and still keeps
> the documented use case for manual user-space triggered memory offlining
> working. From Documentation/admin-guide/mm/memory-hotplug.rst:
>
>   % timeout $TIMEOUT offline_block | failure_handling
>
> In fact, we even document there: "the offlining context can be terminated
> by sending a fatal signal".

We should be fixing documentation instead. This could break users who do
have a SIGALRM signal handler installed.

> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8e0fa209d533..0d2151df4ee1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	do {
>  		pfn = start_pfn;
>  		do {
> -			if (signal_pending(current)) {
> +			if (fatal_signal_pending(current)) {
>  				ret = -EINTR;
>  				reason = "signal backoff";
>  				goto failed_removal_isolated;
> --
> 2.40.1
On 27.06.23 14:34, Michal Hocko wrote:
> On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
>> Let's check for fatal signals only. That looks cleaner and still keeps
>> the documented use case for manual user-space triggered memory offlining
>> working. From Documentation/admin-guide/mm/memory-hotplug.rst:
>>
>>   % timeout $TIMEOUT offline_block | failure_handling
>>
>> In fact, we even document there: "the offlining context can be terminated
>> by sending a fatal signal".
>
> We should be fixing documentation instead. This could break users who do
> have a SIGALRM signal handler installed.

You mean because timeout will send a SIGALRM, which is not considered fatal
in case a signal handler is installed?

At least the "traditional" tools I am aware of don't set a timeout at all
(crossing fingers that they never end up stuck):
* chmem
* QEMU guest agent
* powerpc-utils

libdaxctl also doesn't seem to implement an easy-to-spot timeout for memory
offlining, but it also doesn't configure SIGALRM.

Of course, that doesn't mean that there isn't somewhere a program that does
that; I merely assume that it would be pretty unlikely to find such a
program.

But no strong opinion: we can also keep it like that, update the doc and add
a comment why this one here is different than most other signal backoff
checks.

Thanks!
On Tue 27-06-23 15:28:29, David Hildenbrand wrote:
> On 27.06.23 14:34, Michal Hocko wrote:
> > On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
> > > Let's check for fatal signals only. That looks cleaner and still keeps
> > > the documented use case for manual user-space triggered memory offlining
> > > working. From Documentation/admin-guide/mm/memory-hotplug.rst:
> > >
> > >   % timeout $TIMEOUT offline_block | failure_handling
> > >
> > > In fact, we even document there: "the offlining context can be terminated
> > > by sending a fatal signal".
> >
> > We should be fixing documentation instead. This could break users who do
> > have a SIGALRM signal handler installed.
>
> You mean because timeout will send a SIGALRM, which is not considered fatal
> in case a signal handler is installed?

Correct.

> At least the "traditional" tools I am aware of don't set a timeout at all
> (crossing fingers that they never end up stuck):
> * chmem
> * QEMU guest agent
> * powerpc-utils
>
> libdaxctl also doesn't seem to implement an easy-to-spot timeout for memory
> offlining, but it also doesn't configure SIGALRM.
>
> Of course, that doesn't mean that there isn't somewhere a program that does
> that; I merely assume that it would be pretty unlikely to find such a
> program.
>
> But no strong opinion: we can also keep it like that, update the doc and add
> a comment why this one here is different than most other signal backoff
> checks.

Well, the existing signal handling approach has been there for way too long
to be sure. I personally would prefer fatal_signal_pending as that reflects
more what we do elsewhere but here we are. Historical baggage...
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8e0fa209d533..0d2151df4ee1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1879,7 +1879,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	do {
 		pfn = start_pfn;
 		do {
-			if (signal_pending(current)) {
+			if (fatal_signal_pending(current)) {
 				ret = -EINTR;
 				reason = "signal backoff";
 				goto failed_removal_isolated;
Let's check for fatal signals only. That looks cleaner and still keeps
the documented use case for manual user-space triggered memory offlining
working. From Documentation/admin-guide/mm/memory-hotplug.rst:

  % timeout $TIMEOUT offline_block | failure_handling

In fact, we even document there: "the offlining context can be terminated
by sending a fatal signal".

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)