[RFC,0/4] Add support for File Based Memory Management

Message ID 20241122203830.2381905-1-btabatabai@wisc.edu (mailing list archive)

Message

Bijan Tabatabai Nov. 22, 2024, 8:38 p.m. UTC
This patch set implements file based memory management (FBMM) [1], a
research project from the University of Wisconsin-Madison where a process's
memory can be transparently managed by memory managers which are written as
filesystems. When using FBMM, instead of using the traditional anonymous
memory path, a process's memory is managed by mapping files from a memory
management filesystem (MFS) into its address space. The MFS implements the
memory management related callback functions provided by the VFS to
implement the desired memory management functionality. After presenting
this work at a conference, a handful of people asked if we were going to
upstream the work, so we decided to see if the Linux community would be
interested in this functionality as well.

This work is inspired by the increase in heterogeneity in memory hardware,
such as from Optane and CXL. This heterogeneity is leading to a lot of
research involving extending Linux's memory management subsystem. However,
the monolithic design of the memory management subsystem makes it difficult
to extend, and this difficulty grows as the complexity of the subsystem
increases. Others in the research community have identified this problem as
well [2,3]. We believe the kernel would benefit from some sort of extension
interface to more easily prototype and implement memory management
behaviors for a world with more diverse memory hierarchies.

Filesystems are a natural extension mechanism for memory management because
they already exist and memory mapping files into processes already works.
Also, precedent exists for writing memory managers as filesystems in the
kernel with HugeTLBFS.

While FBMM is most readily used for research and prototyping, I have also
received feedback from people who work in industry that it would be useful
for them as well. One person I talked to mentioned that they have made
several changes to the memory management system in their branch that are
not upstreamed, and it would be convenient to modularize those changes to
avoid the headaches of rebasing when upgrading the kernel version.

To use FBMM, one would perform the following steps:
1) Mount the MFS(s) they want to use
2) Enable FBMM by writing 1 to /sys/kernel/mm/fbmm/state
3) Set the MFS an application should allocate its memory from by writing
the desired MFS's mount directory to /proc/<pid>/fbmm_mnt_dir, where <pid>
is the PID of the target process.
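
Steps 2 and 3 above could be driven from a small C helper. This is only a
sketch: the two file paths are taken from the steps above, while the function
names (fbmm_write_ctl, fbmm_proc_path, fbmm_attach) are made up for
illustration and are not part of the patch set. It assumes the MFS is already
mounted (step 1) and that the caller has permission to write both files.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write a NUL-terminated string to an existing control file.
 * Returns 0 on success, -1 on error (errno is set).
 */
int fbmm_write_ctl(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int saved_errno;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	n = write(fd, value, len);
	saved_errno = errno;
	close(fd);

	if (n < 0) {
		errno = saved_errno;
		return -1;
	}
	if ((size_t)n != len) {
		/* Treat a short write to a control file as an error. */
		errno = EIO;
		return -1;
	}
	return 0;
}

/* Build "/proc/<pid>/fbmm_mnt_dir" into buf. Returns 0, or -1 if buf is too small. */
int fbmm_proc_path(pid_t pid, char *buf, size_t buflen)
{
	int n = snprintf(buf, buflen, "/proc/%ld/fbmm_mnt_dir", (long)pid);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Step 2 (enable FBMM globally), then step 3 (point one process at an MFS). */
int fbmm_attach(pid_t pid, const char *mnt_dir)
{
	char path[64];

	if (fbmm_write_ctl("/sys/kernel/mm/fbmm/state", "1") < 0)
		return -1;
	if (fbmm_proc_path(pid, path, sizeof(path)) < 0)
		return -1;
	return fbmm_write_ctl(path, mnt_dir);
}
```

A caller would then use something like fbmm_attach(pid, "/mnt/mfs"), where
/mnt/mfs is a placeholder for wherever the MFS was mounted.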

To have a process use an MFS for the entirety of the execution, one could
use a wrapper program that writes /proc/self/fbmm_mnt_dir then calls exec
for the target process. We have created such a wrapper, which can be found
at [4]. ld could also be extended to do this, using an environment variable
similar to LD_PRELOAD.
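
For illustration, the core of such a wrapper might look like the sketch
below. It is not the wrapper from [4]; the function name fbmm_exec_under_mfs
and the FBMM_SELF_CTL macro are hypothetical, and the control file name is
assumed to match the fbmm_mnt_dir file from step 3. The sketch relies on the
setting surviving exec, as the description above implies.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumed name of the per-process control file, as seen by the process itself. */
#define FBMM_SELF_CTL "/proc/self/fbmm_mnt_dir"

/*
 * Write the MFS mount directory to ctl_path, then exec argv[0].
 * Does not return on success; returns -1 if the write or the exec fails.
 */
int fbmm_exec_under_mfs(const char *ctl_path, const char *mnt_dir,
			char *const argv[])
{
	size_t len = strlen(mnt_dir);
	ssize_t n;
	int fd = open(ctl_path, O_WRONLY);

	if (fd < 0)
		return -1;

	n = write(fd, mnt_dir, len);
	close(fd);
	if (n < 0 || (size_t)n != len)
		return -1;

	/* Replace this process image with the target program. */
	execvp(argv[0], argv);

	/* Only reached if execvp() failed. */
	return -1;
}
```

A main() for the wrapper could take the mount directory as its first argument
and the target command line as the rest, calling
fbmm_exec_under_mfs(FBMM_SELF_CTL, argv[1], &argv[2]).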

The first patch in this series adds the core of FBMM, allowing a user to
set the MFS an application should allocate its anonymous memory from,
transparently to the application.

The second patch adds helper functions for common MM functionality that may
be useful to MFS implementors for supporting swapping and handling
fork/copy on write. Because fork is complicated, this patch adds a callback
function to the super_operations struct to allow an MFS to decide its fork
behavior, e.g. allow it to decide to do a deep copy of memory on fork
instead of copy on write, and adds logic to the dup_mmap function to handle
FBMM files.

The third patch exports some kernel functions that are needed to implement
an MFS, so that MFSs can be written as kernel modules.

The fourth and final patch in this series provides a sample implementation
of a simple MFS, and is not actually intended to be upstreamed.

[1] https://www.usenix.org/conference/atc24/presentation/tabatabai
[2] https://www.usenix.org/conference/atc24/presentation/jalalian
[3] https://www.usenix.org/conference/atc24/presentation/cao
[4] https://github.com/multifacet/fbmm-workspace/blob/main/bmks/fbmm_wrapper.c

Bijan Tabatabai (4):
  mm: Add support for File Based Memory Management
  fbmm: Add helper functions for FBMM MM Filesystems
  mm: Export functions for writing MM Filesystems
  Add base implementation of an MFS

 BasicMFS/Kconfig                |   3 +
 BasicMFS/Makefile               |   8 +
 BasicMFS/basic.c                | 717 ++++++++++++++++++++++++++++++++
 BasicMFS/basic.h                |  29 ++
 arch/x86/include/asm/tlbflush.h |   2 -
 arch/x86/mm/tlb.c               |   1 +
 fs/Kconfig                      |   7 +
 fs/Makefile                     |   1 +
 fs/exec.c                       |   2 +
 fs/file_based_mm.c              | 663 +++++++++++++++++++++++++++++
 fs/proc/base.c                  |   4 +
 include/linux/file_based_mm.h   |  99 +++++
 include/linux/fs.h              |   1 +
 include/linux/mm.h              |  10 +
 include/linux/sched.h           |   4 +
 kernel/exit.c                   |   3 +
 kernel/fork.c                   |  57 ++-
 mm/Makefile                     |   1 +
 mm/fbmm_helpers.c               | 372 +++++++++++++++++
 mm/filemap.c                    |   2 +
 mm/gup.c                        |   1 +
 mm/internal.h                   |  13 +
 mm/memory.c                     |   3 +
 mm/mmap.c                       |  44 +-
 mm/pgtable-generic.c            |   1 +
 mm/rmap.c                       |   2 +
 mm/vmscan.c                     |  14 +-
 27 files changed, 2040 insertions(+), 24 deletions(-)
 create mode 100644 BasicMFS/Kconfig
 create mode 100644 BasicMFS/Makefile
 create mode 100644 BasicMFS/basic.c
 create mode 100644 BasicMFS/basic.h
 create mode 100644 fs/file_based_mm.c
 create mode 100644 include/linux/file_based_mm.h
 create mode 100644 mm/fbmm_helpers.c

Comments

Lorenzo Stoakes Nov. 23, 2024, 12:23 p.m. UTC | #1
+ VMA guys, it's important to run scripts/get_maintainer.pl on your
changes so the right people are pinged :)

On Fri, Nov 22, 2024 at 02:38:26PM -0600, Bijan Tabatabai wrote:
> This patch set implements file based memory management (FBMM) [1], a
> research project from the University of Wisconsin-Madison where a process's
> memory can be transparently managed by memory managers which are written as
> filesystems. When using FBMM, instead of using the traditional anonymous
> memory path, a process's memory is managed by mapping files from a memory
> management filesystem (MFS) into its address space. The MFS implements the
> memory management related callback functions provided by the VFS to
> implement the desired memory management functionality. After presenting
> this work at a conference, a handful of people asked if we were going to
> upstream the work, so we decided to see if the Linux community would be
> interested in this functionality as well.
>

While it's a cool project, I don't think it's upstreamable in its current
form - it essentially bypasses core mm functionality and 'does mm'
somewhere else (which strikes me, in effect, as the entire purpose of the
series).

mm is a subsystem that is in constant flux with many assumptions that one
might make about it being changed, which make it wholly unsuited to having
its functionality exported like this.

So in effect it, by its nature, has to export internals somewhere else,
and that somewhere else now assumes things about mm that might change at
any point, additionally bypassing a great deal of highly sensitive and
purposeful logic.

This series also adds a lot of if (fbmm) { ... } changes to core logic
which is really not how we want to do things. hugetlbfs does this kind of
thing, but it is more or less universally seen as a _bad thing_ and
something we are trying to refactor.

So any upstreamable form of this would need to a. be part of mm, b. use
existing extensible mechanisms or create them, and c. not have _core_ mm
tasks or activities be performed 'elsewhere'.

Sadly I think the latter part may make a refactoring in this direction
infeasible, as it seems to me this is sort of the point of this.

This also means it's not acceptable to export highly sensitive mm internals
as you do in patch 3/4. Certainly in 1/4, as a co-maintainer of the mmap
logic, I can't accept the changes you suggest to brk() and mmap(), sorry.

There are huge subtleties in much of mm, including very very sensitive lock
mechanisms, and keeping such things within mm means we can have confidence
they work, and that fixes resolve issues.

I hope this isn't too discouraging, the fact you got this functioning is
amazing and as an out-of-tree research and experimentation project it looks
really cool, but for me, I don't think this is for upstream.

Thanks, Lorenzo


Bijan Tabatabai Nov. 24, 2024, 4:53 p.m. UTC | #2
On Sat, Nov 23, 2024 at 6:23 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> + VMA guys, it's important to run scripts/get_maintainer.pl on your
> changes so the right people are pinged :)

Sorry about that. I'll be more mindful of this next time I send a patch.

> While it's a cool project, I don't think it's upstreamable in its current
> form - it essentially bypasses core mm functionality and 'does mm'
> somewhere else (which strikes me, in effect, as the entire purpose of the
> series).

Understandable.
Thank you for spending the time to review the patches and giving a
thorough reply!

Bijan