From patchwork Thu Aug 6 18:49:11 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naoya Horiguchi
X-Patchwork-Id: 11704083
From: nao.horiguchi@gmail.com
To: linux-mm@kvack.org
Cc: mhocko@kernel.org, akpm@linux-foundation.org, mike.kravetz@oracle.com,
 osalvador@suse.de, tony.luck@intel.com, david@redhat.com,
 aneesh.kumar@linux.vnet.ibm.com, zeil@yandex-team.ru, cai@lca.pw,
 naoya.horiguchi@nec.com, linux-kernel@vger.kernel.org
Subject: [PATCH v6 00/12] HWPOISON: soft offline rework
Date: Thu, 6 Aug 2020 18:49:11 +0000
Message-Id: <20200806184923.7007-1-nao.horiguchi@gmail.com>
X-Mailer: git-send-email 2.17.1
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Hi,

This patchset is the latest version of the soft offline rework patchset,
targeted for v5.9.

Since v5, I have dropped some patches that tweak refcount handling in
madvise_inject_error(), in order to avoid the "unknown refcount page"
error. I have not confirmed the fix (the error did not reproduce with v5
in my environment), but with this change soft_offline_page() is surely
called after a refcount is held, so the error should not happen any more.

Dropped patches
- mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
- mm,madvise: Refactor madvise_inject_error
- mm,hwpoison: remove MF_COUNT_INCREASED
- mm,hwpoison: remove flag argument from soft offline functions

Thanks,
Naoya Horiguchi

Quoting cover letter of v5:
----

The main focus of this series is to stabilize soft offline. Historically,
soft offlined pages have suffered from race conditions because
PageHWPoison is used a little too aggressively, which (directly or
indirectly) invades other mm code that cares little about hwpoison. This
results in unexpected behavior or kernel panics, which is very far from
soft offline's "do not disturb userspace or other kernel components"
policy.

The main point of this change set is to contain the target page "via the
buddy allocator": we first free the target page as we do for normal
pages, and remove it from buddy only when we confirm that it has reached
the free list. There is surely a race window with page allocation, but
that's fine: if the page gets reallocated, someone really wants that page
and the page is still working, so soft offline can happily give up.

v4 from Oscar tries to handle the race around reallocation, but that part
still seems to be a work in progress, so I decided to separate it out
from the changes going into v5.9. Thank you for your contribution, Oscar.

Reviewed-by: Oscar Salvador
---
Previous versions:
  v1: https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horiguchi@ah.jp.nec.com/
  v2: https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalvador@suse.de/
  v3: https://lore.kernel.org/linux-mm/20200624150137.7052-1-nao.horiguchi@gmail.com/
  v4: https://lore.kernel.org/linux-mm/20200716123810.25292-1-osalvador@suse.de/
  v5: https://lore.kernel.org/linux-mm/20200805204354.GA16406@hori.linux.bs1.fc.nec.co.jp/T/#t
---
Summary:

Naoya Horiguchi (5):
      mm,hwpoison: cleanup unused PageHuge() check
      mm, hwpoison: remove recalculating hpage
      mm,hwpoison-inject: don't pin for hwpoison_filter
      mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
      mm,hwpoison: double-check page count in __get_any_page()

Oscar Salvador (7):
      mm,hwpoison: Un-export get_hwpoison_page and make it static
      mm,hwpoison: Kill put_hwpoison_page
      mm,hwpoison: Unify THP handling for hard and soft offline
      mm,hwpoison: Rework soft offline for free pages
      mm,hwpoison: Rework soft offline for in-use pages
      mm,hwpoison: Refactor soft_offline_huge_page and __soft_offline_page
      mm,hwpoison: Return 0 if the page is already poisoned in soft-offline

 include/linux/mm.h         |   3 +-
 include/linux/page-flags.h |   6 +-
 include/ras/ras_event.h    |   3 +
 mm/hwpoison-inject.c       |  18 +--
 mm/madvise.c               |   5 -
 mm/memory-failure.c        | 307 +++++++++++++++++++++------------------------
 mm/migrate.c               |  11 +-
 mm/page_alloc.c            |  60 +++++++--
 8 files changed, 203 insertions(+), 210 deletions(-)
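
For readers who want a picture of the "contain via buddy allocator" flow
described in the quoted v5 cover letter, here is a minimal userspace
sketch. It is only a toy model under stated assumptions: every name in it
(toy_page, toy_free_page, toy_alloc_page, toy_take_off_free_list,
toy_soft_offline) is hypothetical and does not correspond to a function
in the series, the -1 return value is arbitrary, and the "raced" flag
merely stands in for a concurrent allocation.

```c
#include <stdbool.h>

/* Hypothetical page states for this toy model; not kernel definitions. */
enum toy_state { TOY_IN_USE, TOY_FREE, TOY_POISONED };

struct toy_page {
	enum toy_state state;
};

/* Step 1: release the target page the same way a normal page is freed. */
static void toy_free_page(struct toy_page *p)
{
	p->state = TOY_FREE;
}

/* Stand-in for another user allocating the page from the free list. */
static void toy_alloc_page(struct toy_page *p)
{
	if (p->state == TOY_FREE)
		p->state = TOY_IN_USE;
}

/* Step 2: take the page off the free list only if it is still there. */
static bool toy_take_off_free_list(struct toy_page *p)
{
	if (p->state != TOY_FREE)
		return false;
	p->state = TOY_POISONED;
	return true;
}

/*
 * Returns 0 if the page was contained, or -1 (an arbitrary value chosen
 * for this sketch) if the page was reallocated first and we gave up.
 */
static int toy_soft_offline(struct toy_page *p, bool raced)
{
	toy_free_page(p);
	if (raced)
		toy_alloc_page(p);	/* someone else won the race */
	return toy_take_off_free_list(p) ? 0 : -1;
}
```

Calling toy_soft_offline() with raced set to false leaves the page
contained, while raced set to true leaves it in use and untouched. That
second outcome is the "happily give up" case: nothing is marked poisoned
unless the page is confirmed to be sitting on the free list.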