[v2,0/7] hugetlbfs memory HW error fixes

Message ID	20241107102126.2183152-1-william.roche@oracle.com (mailing list archive)
Series	hugetlbfs memory HW error fixes
  [v2,0/7] hugetlbfs memory HW error fixes
  [v2,1/7] accel/kvm: Keep track of the HWPoisonPage page_size
  [v2,2/7] system/physmem: poisoned memory discard on reboot
  [v2,3/7] accel/kvm: Report the loss of a large memory page
  [v2,4/7] numa: Introduce and use ram_block_notify_remap()
  [v2,5/7] hostmem: Factor out applying settings
  [v2,6/7] hostmem: Handle remapping of RAM
  [v2,7/7] system/physmem: Memory settings applied on remap notification

Headers

From: William Roche <william.roche@oracle.com>
To: david@redhat.com, kvm@vger.kernel.org, qemu-devel@nongnu.org,
        qemu-arm@nongnu.org
Cc: william.roche@oracle.com, peterx@redhat.com, pbonzini@redhat.com,
        richard.henderson@linaro.org, philmd@linaro.org,
        peter.maydell@linaro.org, mtosatti@redhat.com, imammedo@redhat.com,
        eduardo@habkost.net, marcel.apfelbaum@gmail.com,
        wangyanan55@huawei.com, zhao1.liu@intel.com, joao.m.martins@oracle.com
Subject: [PATCH v2 0/7] hugetlbfs memory HW error fixes
Date: Thu,  7 Nov 2024 10:21:19 +0000
Message-ID: <20241107102126.2183152-1-william.roche@oracle.com>
In-Reply-To: <e2ac7ad0-aa26-4af2-8bb3-825cba4ffca0@redhat.com>
References: <e2ac7ad0-aa26-4af2-8bb3-825cba4ffca0@redhat.com>
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Precedence: bulk
MIME-Version: 1.0

Message

William Roche Nov. 7, 2024, 10:21 a.m. UTC

From: William Roche <william.roche@oracle.com>

Hi David,

Here is an updated description of the patch set:
 ---
This set of patches fixes several problems with hardware memory errors
impacting VMs backed by hugetlbfs memory. When using hugetlbfs large
pages, a hardware memory error anywhere in a large page poisons the
entire page, suddenly making a large chunk of the VM memory unusable.

The main problem that currently exists in QEMU is the lack of backend
file repair before resetting the VM memory, which leaves the impacted
memory silently unusable even after a VM reboot.

In order to fix this issue, we track the page size of the impacted
memory block along with the associated poisoned page location.
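
As an illustration of this tracking, here is a minimal sketch in C. It is
not the code from the patches: the PoisonEntry type and the
poison_list_add() helper are hypothetical names, and aligning the address
to the page size and skipping already tracked pages are assumptions made
for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: one entry per poisoned page. */
typedef struct PoisonEntry {
    uint64_t ram_addr;          /* start of the poisoned page */
    size_t page_size;           /* page size of the backing memory block */
    struct PoisonEntry *next;
} PoisonEntry;

/*
 * Record the page containing 'addr'.
 * Returns 1 if a new entry was added, 0 if the page was already
 * tracked, and -1 if the allocation failed.
 */
static int poison_list_add(PoisonEntry **head, uint64_t addr, size_t page_size)
{
    /* The alignment mask below requires a power-of-two page size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uint64_t start = addr & ~((uint64_t)page_size - 1);

    for (PoisonEntry *e = *head; e != NULL; e = e->next) {
        if (e->ram_addr == start) {
            return 0;
        }
    }

    PoisonEntry *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    e->ram_addr = start;
    e->page_size = page_size;
    e->next = *head;
    *head = e;
    return 1;
}
```

With a 2 MiB page size, two errors hitting the same large page produce a
single entry, so a caller could use the return value 1 to trigger work
that must happen only once per poisoned page.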

Using the size information, we also call ram_block_discard_range() to
regenerate the memory on VM reset when running qemu_ram_remap(), so
that poisoned memory backed by a hugetlbfs file is regenerated by
punching a hole in this file. A new page is loaded when the location
is first touched.
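
For readers unfamiliar with hole punching, the sketch below shows the
underlying Linux call on an ordinary file descriptor. It is only an
illustration under assumptions: discard_file_range() is a hypothetical
helper, not the QEMU function, and the offset and length would have to be
aligned to the huge page size for a hugetlbfs file.

```c
#define _GNU_SOURCE             /* exposes fallocate() and FALLOC_FL_* on glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Discard [offset, offset + length) of a file by punching a hole.
 * FALLOC_FL_KEEP_SIZE leaves the file size unchanged, so the range
 * stays addressable and reads back as zeroes until it is written again.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int discard_file_range(int fd, off_t offset, off_t length)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, length) < 0) {
        return -errno;
    }
    return 0;
}
```

A filesystem that does not support hole punching makes the call fail
(for example with EOPNOTSUPP), which is the kind of situation where a
fallback path is needed.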

If the discard fails, we fall back to unmapping and remapping the
memory location and resetting the memory settings.

We also have to honor the 'prealloc' attribute even after a successful
discard, so we reapply the memory settings in this case too.

These memory settings are applied by a new remap notification mechanism
that calls the host_memory_backend_ram_remapped() function when a
region of a memory block is remapped.
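
The general shape of such a notification mechanism can be sketched as
follows. This is a simplified stand-alone illustration, not the code from
the series: the fixed-size table, remap_notifier_add(), notify_remap() and
the example callback are all hypothetical names, and the real callback
would reapply backend settings rather than count bytes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REMAP_NOTIFIERS 8

/* Callback invoked after a region of a memory block has been remapped. */
typedef void (*RemapCallback)(void *opaque, void *host,
                              size_t offset, size_t size);

typedef struct RemapNotifier {
    RemapCallback cb;
    void *opaque;
} RemapNotifier;

static RemapNotifier remap_notifiers[MAX_REMAP_NOTIFIERS];
static size_t remap_notifier_count;

/* Register a callback; returns 0 on success, -1 if the table is full. */
static int remap_notifier_add(RemapCallback cb, void *opaque)
{
    assert(cb != NULL);
    if (remap_notifier_count == MAX_REMAP_NOTIFIERS) {
        return -1;
    }
    remap_notifiers[remap_notifier_count].cb = cb;
    remap_notifiers[remap_notifier_count].opaque = opaque;
    remap_notifier_count++;
    return 0;
}

/* Tell every registered listener that [offset, offset + size) was remapped. */
static void notify_remap(void *host, size_t offset, size_t size)
{
    for (size_t i = 0; i < remap_notifier_count; i++) {
        remap_notifiers[i].cb(remap_notifiers[i].opaque, host, offset, size);
    }
}

/* Example listener: only accumulates the number of remapped bytes. */
static void example_ram_remapped(void *opaque, void *host,
                                 size_t offset, size_t size)
{
    size_t *total = opaque;

    (void)host;
    (void)offset;
    *total += size;
}
```

Keeping the remap code unaware of what each listener does is the point of
the design: the remap path only announces the range, and each backend
decides which settings to reapply.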

We also issue a message describing the impact of losing a large
memory page. It is reported only once, when the page is poisoned.
 ---


v1 -> v2:
. I removed the tracking of the lsb size information provided by the
  kernel in the SIGBUS siginfo, relying only on the RAMBlock page_size
  instead.

. I adapted the 3 patches you pointed me to in order to implement the
  notification mechanism on remap.  Thank you for this code!
  I left them as authored by you.
  However, I haven't tested whether the policy setting works as expected
  on VM reset, only that the replacement of physical memory works.

. I also removed the old memory setting that was kept in qemu_ram_remap(),
  but this last small fix could probably be merged with your last commit.


Yesterday I also got the recording of the mm-linux session about the
kernel modification on large page poisoning, and discussed this topic
with a colleague of mine who attended the meeting.

About the question you asked me on the use of -mem-path: we communicated
that this option is deprecated and advise all users to use the
following options instead:
-object memory-backend-file,id=pc.ram,mem-path=/dev/hugepages,prealloc,size=XXX -machine memory-backend=pc.ram 

We could now add the request to use a share=on attribute too, to avoid
the additional message about dangerous discard situations.


This code is scripts/checkpatch.pl clean, and
'make check' runs fine on both x86 and Arm.


David Hildenbrand (3):
  numa: Introduce and use ram_block_notify_remap()
  hostmem: Factor out applying settings
  hostmem: Handle remapping of RAM

William Roche (4):
  accel/kvm: Keep track of the HWPoisonPage page_size
  system/physmem: poisoned memory discard on reboot
  accel/kvm: Report the loss of a large memory page
  system/physmem: Memory settings applied on remap notification

 accel/kvm/kvm-all.c       |  17 +++-
 backends/hostmem.c        | 184 +++++++++++++++++++++++---------------
 hw/core/numa.c            |  11 +++
 include/exec/cpu-common.h |   1 +
 include/exec/ramlist.h    |   3 +
 include/sysemu/hostmem.h  |   1 +
 include/sysemu/kvm_int.h  |   4 +-
 system/physmem.c          |  62 ++++++++-----
 target/arm/kvm.c          |   2 +-
 target/i386/kvm/kvm.c     |   2 +-
 10 files changed, 189 insertions(+), 98 deletions(-)