mbox series

[v9,0/3] Enhance sysfs handling for memory hotplug in weighted interleave

Message ID 20250417072839.711-1-rakie.kim@sk.com (mailing list archive)
Headers show
Series Enhance sysfs handling for memory hotplug in weighted interleave | expand

Message

Rakie Kim April 17, 2025, 7:28 a.m. UTC
The following patch series enhances the weighted interleave policy in the
memory management subsystem by improving sysfs handling, fixing memory leaks,
and introducing dynamic sysfs updates for memory hotplug support.

Changes from v8:
https://lore.kernel.org/all/20250416113123.629-1-rakie.kim@sk.com/
- Updated lock usage during sysfs entry creation
- Fixed sysfs removal warning triggered during exception handling paths

Changes from v7:
https://lore.kernel.org/all/20250408073243.488-1-rakie.kim@sk.com/
- Refactored error handling paths to remove unnecessary `goto` usage
- Renamed unclear variables and functions for better readability

Changes from v6:
https://lore.kernel.org/all/20250404074623.1179-1-rakie.kim@sk.com/
- Removed redundant error handling in MEM_OFFLINE case

Changes from v5:
https://lore.kernel.org/all/20250402014906.1086-1-rakie.kim@sk.com/
- Added lock protection to sensitive sections

Changes from v4:
https://lore.kernel.org/all/20250401090901.1050-1-rakie.kim@sk.com/
- Added missing `kobject_del()` when removing a kobject
- Extended locking coverage to protect shared state

Changes from v3:
https://lore.kernel.org/all/20250320041749.881-1-rakie.kim@sk.com/
- Added error handling for allocation and cleanup paths
- Replaced static node attribute list with flexible array
- Reorganized four patches into three based on their functional scope

Changes from v2:
https://lore.kernel.org/all/20250312075628.648-1-rakie.kim@sk.com/
- Clarified commit messages
- Refactored to avoid passing the global structure as a parameter

Changes from v1:
https://lore.kernel.org/all/20250307063534.540-1-rakie.kim@sk.com/
- Fixed memory leaks related to `kobject` lifecycle
- Refactored sysfs layout for hotplug readiness
- Introduced initial memory hotplug support for weighted interleave

### Introduction
The weighted interleave policy distributes memory allocations across multiple
NUMA nodes based on their performance weight, thereby optimizing memory
bandwidth utilization. The weight values are configured through sysfs.

Previously, sysfs entries for weighted interleave were managed statically
at initialization. This led to several issues:
- Memory Leaks: Improper `kobject` teardown caused memory leaks
  when initialization failed or when nodes were removed.
- Lack of Dynamic Updates: Sysfs attributes were created only during
  initialization, preventing nodes added at runtime from being recognized.
- Handling of Unusable Nodes: Sysfs entries were generated for all
  possible nodes (`N_POSSIBLE`), including memoryless or unavailable nodes,
  leading to sysfs entries for unusable nodes and potential
  misconfigurations.

### Patch Overview
1. [PATCH 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs
   - Ensures proper cleanup of `kobject` allocations.
   - Adds `kobject_del()` before `kobject_put()` to clean up sysfs state correctly.
   - Prevents memory/resource leaks and improves teardown behavior.
   - Ensures that `sysfs_remove_file()` is not called from the release path
     after `kobject_del()` has cleared sysfs state, to avoid potential
     inconsistencies and warnings in the kernfs subsystem.

2. [PATCH 2/3] mm/mempolicy: Prepare weighted interleave sysfs for memory hotplug
   - Refactors static sysfs layout into a new `sysfs_wi_group` structure.
   - Makes per-node sysfs attributes accessible to external modules.
   - Lays groundwork for future hotplug support by enabling runtime modification.

3. [PATCH 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
   - Dynamically adds/removes sysfs entries when nodes are online/offline.
   - Limits sysfs creation to nodes with memory, avoiding unusable node entries.
   - Hooks into memory hotplug notifier for runtime updates.

These patches have been tested under CXL-based memory configurations,
including hotplug scenarios, to ensure proper behavior and stability.

 mm/mempolicy.c | 240 ++++++++++++++++++++++++++++++-------------------
 1 file changed, 148 insertions(+), 92 deletions(-)


base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444

Comments

Rakie Kim April 17, 2025, 8:10 a.m. UTC | #1
On Thu, 17 Apr 2025 16:28:34 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> These patches have been tested under CXL-based memory configurations,
> including hotplug scenarios, to ensure proper behavior and stability.
> 
>  mm/mempolicy.c | 240 ++++++++++++++++++++++++++++++-------------------
>  1 file changed, 148 insertions(+), 92 deletions(-)
> 
> base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
> -- 

To Andrew

I sincerely apologize for causing repeated inconvenience. The series of
patches version v8 that was merged into -mm, mm-new today needs
additional corrections.
Link: https://lore.kernel.org/all/6800742de6315_130fd2949c@dwillia2-mobl3.amr.corp.intel.com.notmuch/
Therefore, I have updated a new version v9, in which the problems have
been addressed.

I would greatly appreciate it if you could drop the existing v8 and
replace it with the new version v9. Below is the information regarding
the v8 patch series that needs to be dropped.
Once again, I truly apologize for the inconvenience.

<1>
The patch titled:
     Subject: mm/mempolicy: fix memory leaks in weighted interleave sysfs
Filename:
     mm-mempolicy-fix-memory-leaks-in-weighted-interleave-sysfs.patch

<2>
The patch titled:
     Subject: mm/mempolicy: prepare weighted interleave sysfs for memory hotplug
Filename:
     mm-mempolicy-prepare-weighted-interleave-sysfs-for-memory-hotplug.patch

<3>
The patch titled:
     Subject: mm/mempolicy: support memory hotplug in weighted interleave
Filename:
     mm-mempolicy-support-memory-hotplug-in-weighted-interleave.patch

Rakie
Andrew Morton April 17, 2025, 10:35 p.m. UTC | #2
On Thu, 17 Apr 2025 17:10:08 +0900 Rakie Kim <rakie.kim@sk.com> wrote:

> I sincerely apologize for causing repeated inconvenience. The series of
> patches version v8 that was merged into -mm, mm-new today needs
> additional corrections.
> Link: https://lore.kernel.org/all/6800742de6315_130fd2949c@dwillia2-mobl3.amr.corp.intel.com.notmuch/
> Therefore, I have updated a new version v9, in which the problems have
> been addressed.

No probs, this is why mm.git workflow (mm-new -> mm-unstable ->
mm-stable -> mainline) operates as it does - to easily permit revisions
and replacements as patches move towards their final state.

Please note that I added a cc:stable to your [1/N] patch - sysfs leaks
should be fixed in earlier kernels.  I considered this to be low
priority - if it's higher priority than this patch should best have
been separated from the series, so it can take a different merge path
from the other patches.
Dan Williams April 17, 2025, 10:41 p.m. UTC | #3
Andrew Morton wrote:
> On Thu, 17 Apr 2025 17:10:08 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> 
> > I sincerely apologize for causing repeated inconvenience. The series of
> > patches version v8 that was merged into -mm, mm-new today needs
> > additional corrections.
> > Link: https://lore.kernel.org/all/6800742de6315_130fd2949c@dwillia2-mobl3.amr.corp.intel.com.notmuch/
> > Therefore, I have updated a new version v9, in which the problems have
> > been addressed.
> 
> No probs, this is why mm.git workflow (mm-new -> mm-unstable ->
> mm-stable -> mainline) operates as it does - to easily permit revisions
> and replacements as patches move towards their final state.
> 
> Please note that I added a cc:stable to your [1/N] patch - sysfs leaks
> should be fixed in earlier kernels.  I considered this to be low
> priority - if it's higher priority than this patch should best have
> been separated from the series, so it can take a different merge path
> from the other patches.  

The risk of leak is low because it only appears to trigger if setup
fails. Setup only fails due to -ENOMEM which is unlikely to happen from
a late_initcall() when memory pressure is low.
Dan Williams April 17, 2025, 10:56 p.m. UTC | #4
Andrew Morton wrote:
> On Thu, 17 Apr 2025 17:10:08 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> 
> > I sincerely apologize for causing repeated inconvenience. The series of
> > patches version v8 that was merged into -mm, mm-new today needs
> > additional corrections.
> > Link: https://lore.kernel.org/all/6800742de6315_130fd2949c@dwillia2-mobl3.amr.corp.intel.com.notmuch/
> > Therefore, I have updated a new version v9, in which the problems have
> > been addressed.
> 
> No probs, this is why mm.git workflow (mm-new -> mm-unstable ->
> mm-stable -> mainline) operates as it does - to easily permit revisions
> and replacements as patches move towards their final state.

Perhaps something like the update below to the mm-commits@ notification
template, because I think folks misread early acceptance into mm-new as
"Hooray, I'm done!" instead of "Ok, now more review might be inbound
because Andrew has put this patch on his radar, but I am not done until
this hits mm-stable".

---
--- template.orig	2025-04-17 15:48:49.913337459 -0700
+++ template	2025-04-17 15:51:57.294617179 -0700
@@ -1,10 +1,16 @@
 The patch titled
      Subject: $subject
 has been added to the -mm mm-new branch.  Its filename is
      $patch_name
 
 This patch will shortly appear at
      https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/$patch_name
 
 This patch will later appear in the mm-new branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+
+Note, mm-new is a provisional staging ground for work-in-progress
+patches, and acceptance into mm-new is a notification for others take
+notice and finish up reviews. Please do not hesitate to respond to
+review feedback and post updated versions to replace or incrementally
+fixup patches in mm-new.
---

...otherwise, documentation like that would be good fodder for a mm
"maintainer profile" [1].

[1]: https://docs.kernel.org/process/maintainer-handbooks.html
Andrew Morton April 17, 2025, 11:31 p.m. UTC | #5
On Thu, 17 Apr 2025 15:41:30 -0700 Dan Williams <dan.j.williams@intel.com> wrote:

> Andrew Morton wrote:
> > On Thu, 17 Apr 2025 17:10:08 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> > 
> > > I sincerely apologize for causing repeated inconvenience. The series of
> > > patches version v8 that was merged into -mm, mm-new today needs
> > > additional corrections.
> > > Link: https://lore.kernel.org/all/6800742de6315_130fd2949c@dwillia2-mobl3.amr.corp.intel.com.notmuch/
> > > Therefore, I have updated a new version v9, in which the problems have
> > > been addressed.
> > 
> > No probs, this is why mm.git workflow (mm-new -> mm-unstable ->
> > mm-stable -> mainline) operates as it does - to easily permit revisions
> > and replacements as patches move towards their final state.
> > 
> > Please note that I added a cc:stable to your [1/N] patch - sysfs leaks
> > should be fixed in earlier kernels.  I considered this to be low
> > priority - if it's higher priority than this patch should best have
> > been separated from the series, so it can take a different merge path
> > from the other patches.  
> 
> The risk of leak is low because it only appears to trigger if setup
> fails. Setup only fails due to -ENOMEM which is unlikely to happen from
> a late_initcall() when memory pressure is low.

Oh, OK, thanks.  I added the above paragraph to the changelog and
removed the cc:stable.

Generally, we assume that -ENOMEM doesn't happen in __init code.  If it
does, the kernel is totally messed up anyway)