Message ID | 20210630100432.v1.1.I09866d90c6de14f21223a03e9e6a31f8a02ecbaf@changeid (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v1] mm: Enable suspend-only swap spaces | expand |
Hi! > Currently it's not possible to enable hibernation without also enabling > generic swap for a given swap area. These two use cases are not the > same. For example there may be users who want to enable hibernation, > but whose drives don't have the write endurance for generic swap > activities. > > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow > generic swapping to it. This region can still be wired up for use in > suspend-to-disk activities, but will never have regular pages swapped to > it. > > Signed-off-by: Evan Green <evgreen@chromium.org> Makes sense to me. Reviewed-by: Pavel Machek <pavel@ucw.cz> > #define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */ > #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */ > +#define SWAP_FLAG_NOSWAP 0x80000 /* use only for suspend, not swap */ I'd say "only for hibernation". And actually maybe code would be more clear if logic was reverted. Aha, and you may want to check... does the hibernation still work for you without the swap? Because we need half memory free to create swap image and swap is really quite useful for that. Best regards, Pavel
On Thu, Jul 1, 2021 at 1:02 PM Pavel Machek <pavel@ucw.cz> wrote: > > Hi! > > > Currently it's not possible to enable hibernation without also enabling > > generic swap for a given swap area. These two use cases are not the > > same. For example there may be users who want to enable hibernation, > > but whose drives don't have the write endurance for generic swap > > activities. > > > > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow > > generic swapping to it. This region can still be wired up for use in > > suspend-to-disk activities, but will never have regular pages swapped to > > it. > > > > Signed-off-by: Evan Green <evgreen@chromium.org> > > Makes sense to me. > > Reviewed-by: Pavel Machek <pavel@ucw.cz> Thanks! > > > #define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */ > > #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */ > > +#define SWAP_FLAG_NOSWAP 0x80000 /* use only for suspend, not swap */ > > I'd say "only for hibernation". And actually maybe code would be more clear if logic was reverted. Sure about the rewording. Yes, I also thought about flipping the polarity. This made more sense to me as an outlier condition, despite the slight awkwardness of a negative flag. And the usermode flag has to be written this way, so I might as well carry it through. I think I'll keep it unless anyone feels strongly. > > Aha, and you may want to check... does the hibernation still work for you without the swap? > > Because we need half memory free to create swap image and swap is really quite useful for that. Yes, hibernation still works. You're right that without another swap space set up, it starts to fail when half of memory is used up. This flag gives me the control to exclusively steer swap towards one device and hibernate towards another. -Evan
On 30.06.21 19:07, Evan Green wrote: > Currently it's not possible to enable hibernation without also enabling > generic swap for a given swap area. These two use cases are not the > same. For example there may be users who want to enable hibernation, > but whose drives don't have the write endurance for generic swap > activities. > > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow > generic swapping to it. This region can still be wired up for use in > suspend-to-disk activities, but will never have regular pages swapped to > it. Just to confirm: things like /proc/meminfo won't show this "swap that's not actually swap" as free/total swap, correct? Maybe it's worth spelling the expected system behavior out here.
On Mon, Jul 5, 2021 at 12:44 AM David Hildenbrand <david@redhat.com> wrote: > > On 30.06.21 19:07, Evan Green wrote: > > Currently it's not possible to enable hibernation without also enabling > > generic swap for a given swap area. These two use cases are not the > > same. For example there may be users who want to enable hibernation, > > but whose drives don't have the write endurance for generic swap > > activities. > > > > Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow > > generic swapping to it. This region can still be wired up for use in > > suspend-to-disk activities, but will never have regular pages swapped to > > it. > > Just to confirm: things like /proc/meminfo won't show this "swap that's > not actually swap" as free/total swap, correct? Maybe it's worth > spelling the expected system behavior out here. Currently these noswap regions do still count in /proc/meminfo. I suppose as you say it makes more sense for it not to count. I should be able to carefully put some conditionals around the nr_swap_pages and total_swap_pages accounting to fix that. I'll also document this in the commit text as suggested. When looking at that, I realized something else. Hibernate uses swap_free() to release its image pages, which calls free_swap_slot(). That may land the page in a swap_slots_cache, causing it to possibly leak back into general usage. I'm thinking I should just call swap_entry_free() directly if NOSWAP is set. I gave that a quick test and so far it looks good. Other random musings I had while staring that this code: It surprised me that there's swap_entry_free() and __swap_entry_free(), but the one without underscores is the lower level one (ie __swap_entry_free() winds through the cache, swap_entry_free() just does it). I'm not really sure if renaming those is worth the churn or not: leaning towards no. It's also interesting that scan_swap_map_slots() chooses whether or not to attempt reclaim based on vm_swap_full(). vm_swap_full() returns true if swap globally is 50% full or not. But hibernate is restricted to a single swap device. So you could find yourself in a situation where the hibernate device was full-but-reclaimable, and other areas aren't very full. This might cause hibernations to fail because we never attempted to reclaim swap. Maybe this never comes up in practice because people don't use multiple swap devices. Or maybe we naturally tend to spread the swap load evenly such that looking at the global counts is roughly equivalent to looking at a single one. Shower thoughts. I'll keep this in mind when I'm doing my own testing to see if it ever comes up. Thanks for the review, I'll plan to post a v2 in the next couple days. -Evan
diff --git a/include/linux/swap.h b/include/linux/swap.h index 6f5a43251593c8..a9fc37e29c17d6 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -28,10 +28,11 @@ struct pagevec; #define SWAP_FLAG_DISCARD 0x10000 /* enable discard for swap */ #define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */ #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */ +#define SWAP_FLAG_NOSWAP 0x80000 /* use only for suspend, not swap */ #define SWAP_FLAGS_VALID (SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \ SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \ - SWAP_FLAG_DISCARD_PAGES) + SWAP_FLAG_DISCARD_PAGES | SWAP_FLAG_NOSWAP) #define SWAP_BATCH 64 static inline int current_is_kswapd(void) @@ -182,6 +183,7 @@ enum { SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */ SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ + SWP_NOSWAP = (1 << 13), /* use only for suspend, not swap */ /* add others here before... */ SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; diff --git a/mm/swapfile.c b/mm/swapfile.c index 1e07d1c776f2ae..164937f958c319 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -697,7 +697,8 @@ static void swap_range_alloc(struct swap_info_struct *si, unsigned long offset, if (si->inuse_pages == si->pages) { si->lowest_bit = si->max; si->highest_bit = 0; - del_from_avail_list(si); + if (!(si->flags & SWP_NOSWAP)) + del_from_avail_list(si); } } @@ -726,7 +727,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, bool was_full = !si->highest_bit; WRITE_ONCE(si->highest_bit, end); - if (was_full && (si->flags & SWP_WRITEOK)) + if (was_full && + ((si->flags & (SWP_WRITEOK | SWP_NOSWAP)) == SWP_WRITEOK)) add_to_avail_list(si); } atomic_long_add(nr_entries, &nr_swap_pages); @@ -1078,6 +1080,9 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size) WARN(!(si->flags & SWP_WRITEOK), "swap_info %d in list but !SWP_WRITEOK\n", si->type); + WARN((si->flags & SWP_NOSWAP), + "swap_info %d in list but SWP_NOSWAP\n", + si->type); __del_from_avail_list(si); spin_unlock(&si->lock); goto nextsi; @@ -2469,7 +2474,8 @@ static void _enable_swap_info(struct swap_info_struct *p) * swap_info_struct. */ plist_add(&p->list, &swap_active_head); - add_to_avail_list(p); + if (!(p->flags & SWP_NOSWAP)) + add_to_avail_list(p); } static void enable_swap_info(struct swap_info_struct *p, int prio, @@ -2564,7 +2570,9 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile) spin_unlock(&swap_lock); goto out_dput; } - del_from_avail_list(p); + if (!(p->flags & SWP_NOSWAP)) + del_from_avail_list(p); + spin_lock(&p->lock); if (p->prio < 0) { struct swap_info_struct *si = p; @@ -3329,16 +3337,20 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (swap_flags & SWAP_FLAG_PREFER) prio = (swap_flags & SWAP_FLAG_PRIO_MASK) >> SWAP_FLAG_PRIO_SHIFT; + + if (swap_flags & SWAP_FLAG_NOSWAP) + p->flags |= SWP_NOSWAP; enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map); - pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s\n", + pr_info("Adding %uk swap on %s. Priority:%d extents:%d across:%lluk %s%s%s%s%s%s\n", p->pages<<(PAGE_SHIFT-10), name->name, p->prio, nr_extents, (unsigned long long)span<<(PAGE_SHIFT-10), (p->flags & SWP_SOLIDSTATE) ? "SS" : "", (p->flags & SWP_DISCARDABLE) ? "D" : "", (p->flags & SWP_AREA_DISCARD) ? "s" : "", (p->flags & SWP_PAGE_DISCARD) ? "c" : "", - (frontswap_map) ? "FS" : ""); + (frontswap_map) ? "FS" : "", + (p->flags & SWP_NOSWAP) ? "N" : ""); mutex_unlock(&swapon_mutex); atomic_inc(&proc_poll_event);
Currently it's not possible to enable hibernation without also enabling generic swap for a given swap area. These two use cases are not the same. For example there may be users who want to enable hibernation, but whose drives don't have the write endurance for generic swap activities. Add a new SWAP_FLAG_NOSWAP that adds a swap region but refuses to allow generic swapping to it. This region can still be wired up for use in suspend-to-disk activities, but will never have regular pages swapped to it. Signed-off-by: Evan Green <evgreen@chromium.org> --- include/linux/swap.h | 4 +++- mm/swapfile.c | 24 ++++++++++++++++++------ 2 files changed, 21 insertions(+), 7 deletions(-)