
[v7,00/14] HWPOISON: soft offline rework

Message ID 20200922135650.1634-1-osalvador@suse.de
Series HWPOISON: soft offline rework

Message

Oscar Salvador Sept. 22, 2020, 1:56 p.m. UTC
Hi,

This patchset is the latest version of soft offline rework patchset
targeted for v5.9.

This patchset fixes a couple of issues that the patchset Naoya
sent [1] contained due to rebasing problems and a misunderstanding.

The main focus of this series is to stabilize soft offline.  Historically,
soft-offlined pages have suffered from race conditions because PageHWPoison
is used a little too aggressively, which (directly or indirectly) spills into
other mm code that cares little about hwpoison.  This results in unexpected
behavior or kernel panics, which is very far from soft offline's "do not
disturb userspace or other kernel components" policy.
An example of this can be found here [2].

Along with several cleanups, this code refactors and changes the way soft
offline works.
The main point of this change set is to contain the target page either "via
the buddy allocator" or in the migration path.
For the former, we first free the target page as we do for normal pages, and
once it has reached buddy and has been taken off the freelists, we flag it
as HWPoison.
For the latter, we never get to release the page in unmap_and_move(), so
the page stays under our control and we can handle it in the hwpoison code.
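The ordering of the two containment paths can be illustrated with a small userspace C model. This is a toy sketch under stated assumptions, not the kernel implementation: `struct toy_page`, `toy_alloc()`, `toy_soft_offline()` and the `PG_FREE`/`PG_HWPOISON` bits are hypothetical names made up for illustration, the "freelist" is just a flag, and "migration" is a memcpy(). What it tries to show is that a free page is taken off the freelist before it is flagged, and that the source page of a migration is never handed back to the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, NOT kernel code: names and layout are invented. */
#define NR_PAGES   8
#define PAGE_BYTES 16

enum { PG_FREE = 1u << 0, PG_HWPOISON = 1u << 1 };

struct toy_page {
	unsigned int flags;     /* PG_FREE means "sitting on the freelist" */
	int refcount;           /* 0 while the page is free */
	char data[PAGE_BYTES];
};

static struct toy_page pages[NR_PAGES];

static void toy_init(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		pages[i].flags = PG_FREE;
		pages[i].refcount = 0;
		memset(pages[i].data, 0, PAGE_BYTES);
	}
}

/* Hand out the first page found on the freelist. Poisoned pages are
 * never on the freelist, so they can never be handed out again. */
static struct toy_page *toy_alloc(void)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		if (pages[i].flags & PG_FREE) {
			pages[i].flags &= ~PG_FREE;
			pages[i].refcount = 1;
			return &pages[i];
		}
	}
	return NULL;
}

/* Free-page path: only once the page is back in the allocator, take it
 * off the freelist and then flag it as poisoned. */
static int toy_soft_offline_free(struct toy_page *p)
{
	if (!(p->flags & PG_FREE) || p->refcount != 0)
		return -1;              /* not free: would need the migration path */
	p->flags &= ~PG_FREE;           /* take it off the freelist ... */
	p->flags |= PG_HWPOISON;        /* ... and only then poison it */
	return 0;
}

/* In-use path: copy the contents to a fresh page, but do NOT release the
 * old page to the freelist; keep it and poison it directly. */
static struct toy_page *toy_soft_offline_in_use(struct toy_page *old)
{
	struct toy_page *new = toy_alloc();

	if (!new)
		return NULL;
	assert(new != old);
	memcpy(new->data, old->data, PAGE_BYTES); /* "migrate" the contents */
	old->refcount = 0;              /* users now point at the new page */
	old->flags |= PG_HWPOISON;      /* never gets PG_FREE back */
	return new;
}

/* Dispatcher: an already-poisoned page is treated as success (returns 0). */
static int toy_soft_offline(struct toy_page *p, struct toy_page **replacement)
{
	*replacement = NULL;
	if (p->flags & PG_HWPOISON)
		return 0;
	if (p->refcount == 0)
		return toy_soft_offline_free(p);
	*replacement = toy_soft_offline_in_use(p);
	return *replacement ? 0 : -1;
}
```

In this model a poisoned page never carries `PG_FREE` again, so `toy_alloc()` cannot return it; that mirrors, in a very simplified way, the "contain the target page" goal described above.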

[1] https://patchwork.kernel.org/cover/11704083/
[2] https://lore.kernel.org/linux-mm/20190826104144.GA7849@linux/T/#u


Naoya Horiguchi (5):
  mm,hwpoison: cleanup unused PageHuge() check
  mm, hwpoison: remove recalculating hpage
  mm,hwpoison-inject: don't pin for hwpoison_filter
  mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
  mm,hwpoison: double-check page count in __get_any_page()

Oscar Salvador (9):
  mm,hwpoison: unexport get_hwpoison_page and make it static
  mm,hwpoison: refactor madvise_inject_error
  mm,hwpoison: kill put_hwpoison_page
  mm,hwpoison: unify THP handling for hard and soft offline
  mm,hwpoison: rework soft offline for free pages
  mm,hwpoison: rework soft offline for in-use pages
  mm,hwpoison: refactor soft_offline_huge_page and __soft_offline_page
  mm,hwpoison: return 0 if the page is already poisoned in soft-offline
  mm,hwpoison: Try to narrow window race for free pages

 include/linux/mm.h         |   3 +-
 include/linux/page-flags.h |   6 +-
 include/ras/ras_event.h    |   3 +
 mm/hwpoison-inject.c       |  18 +--
 mm/madvise.c               |  35 ++---
 mm/memory-failure.c        | 311 +++++++++++++++++--------------------
 mm/migrate.c               |  11 +-
 mm/page_alloc.c            |  71 +++++++--
 8 files changed, 231 insertions(+), 227 deletions(-)

Comments

Andrew Morton Sept. 22, 2020, 5:03 p.m. UTC | #1
On Tue, 22 Sep 2020 15:56:36 +0200 Oscar Salvador <osalvador@suse.de> wrote:

> This patchset is the latest version of soft offline rework patchset
> targeted for v5.9.

Thanks.

Where do we now stand with the follow-on patches:

mmhwpoison-take-free-pages-off-the-buddy-freelists.patch
mmhwpoison-drain-pcplists-before-bailing-out-for-non-buddy-zero-refcount-page.patch
mmhwpoison-drop-unneeded-pcplist-draining.patch
mmhwpoison-drop-unneeded-pcplist-draining-fix.patch
mmhwpoison-remove-stale-code.patch

I don't have a record of these having been reviewed?
Oscar Salvador Sept. 22, 2020, 5:56 p.m. UTC | #2
On 2020-09-22 19:03, Andrew Morton wrote:
> On Tue, 22 Sep 2020 15:56:36 +0200 Oscar Salvador <osalvador@suse.de> wrote:
> 
>> This patchset is the latest version of soft offline rework patchset
>> targeted for v5.9.
> 
> Thanks.
> 
> Where do we now stand with the follow-on patches:
> 
> mmhwpoison-take-free-pages-off-the-buddy-freelists.patch
> mmhwpoison-drain-pcplists-before-bailing-out-for-non-buddy-zero-refcount-page.patch
> mmhwpoison-drop-unneeded-pcplist-draining.patch
> mmhwpoison-drop-unneeded-pcplist-draining-fix.patch
> mmhwpoison-remove-stale-code.patch
> 
> I don't have a record of these having been reviewed?

Hi Andrew,

I would drop those for now as they depend on this work, and I would 
rather have this patchset settled first.

Once things are calm, I will resend the other ones and ask Naoya
to review them.

Thanks!
Aristeu Rozanski Sept. 23, 2020, 1:29 p.m. UTC | #3
Hi Oscar,

On Tue, Sep 22, 2020 at 03:56:36PM +0200, Oscar Salvador wrote:
> This patchset is the latest version of soft offline rework patchset
> targeted for v5.9.

FWIW, tested again with these patches in the ppc64 box and they work.
I see that you added my Tested-by to the last patch, but in any case:

Tested-by: Aristeu Rozanski <aris@ruivo.org>