From patchwork Thu Oct 21 11:53:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?=
X-Patchwork-Id: 12574813
List-Id: Xen developer discussion
To: =?utf-8?q?Marek_Marczykowski-G=C3=B3recki?=
Cc: "xen-devel@lists.xenproject.org"
From: Juergen Gross
Subject: Tentative fix for "out of PoD memory" issue
Message-ID: <912c7377-26f0-c14a-e3aa-f00a81ed5766@suse.com>
Date: Thu, 21 Oct 2021 13:53:06 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.12.0
MIME-Version: 1.0

Marek,

could you please test whether the attached patch fixes your problem?

BTW, I think this could have happened before kernel 5.15, too. I guess my
modification to use a kernel thread instead of a workqueue just made the
issue more probable.

I couldn't reproduce the crash you are seeing, but the introduced wait
was 4.2 seconds on my test system (a PVH guest with 2 GB of memory,
maxmem 6 GB).

Juergen

From 3ee35f6f110e2258ec94f0d1397fac8c26b41761 Mon Sep 17 00:00:00 2001
From: Juergen Gross
To: linux-kernel@vger.kernel.org
Cc: Boris Ostrovsky
Cc: Juergen Gross
Cc: Stefano Stabellini
Cc: xen-devel@lists.xenproject.org
Date: Thu, 21 Oct 2021 12:51:06 +0200
Subject: [PATCH] xen/balloon: add late_initcall_sync() for initial ballooning done
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When running as a PVH or HVM guest with actual memory < max memory, the
hypervisor uses "populate on demand" (PoD) to allow the guest to
balloon down from its maximum memory size. For this to work correctly,
the guest must not touch more memory pages than its target memory size.
Otherwise the PoD cache will be exhausted and the guest will be crashed
as a result.

In extreme cases ballooning down might currently not be finished before
the init process is started, and init can consume lots of memory.

In order to avoid random boot crashes in such cases, add a late init
call that waits for ballooning down to finish for PVH/HVM guests.

Reported-by: Marek Marczykowski-Górecki
Signed-off-by: Juergen Gross
---
 drivers/xen/balloon.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 3a50f097ed3e..d19b851c3d3b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -765,3 +765,23 @@ static int __init balloon_init(void)
 	return 0;
 }
 subsys_initcall(balloon_init);
+
+static int __init balloon_wait_finish(void)
+{
+	if (!xen_domain())
+		return -ENODEV;
+
+	/* PV guests don't need to wait. */
+	if (xen_pv_domain() || !current_credit())
+		return 0;
+
+	pr_info("Waiting for initial ballooning down having finished.\n");
+
+	while (current_credit())
+		schedule_timeout_interruptible(HZ / 10);
+
+	pr_info("Initial ballooning down finished.\n");
+
+	return 0;
+}
+late_initcall_sync(balloon_wait_finish);
-- 
2.26.2
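
For anyone wanting to reason about how long the wait in balloon_wait_finish() might take, here is a rough userspace sketch of the same polling pattern. It is not kernel code: struct balloon_sim, sim_credit(), sim_balloon_tick() and sim_wait_finish() are hypothetical names made up for illustration, and the rate at which pages are released per poll interval is an assumed parameter, not something measured from the driver.

```c
#include <assert.h>

/* Hypothetical stand-in for the balloon state; not the kernel's structure. */
struct balloon_sim {
	long current_pages;	/* pages the guest currently owns */
	long target_pages;	/* pages the guest is supposed to own */
	long pages_per_tick;	/* assumed pages released per poll interval */
};

/* Plays the role of current_credit() in this sketch: target minus current. */
static long sim_credit(const struct balloon_sim *b)
{
	return b->target_pages - b->current_pages;
}

/* One poll interval of background ballooning down. */
static void sim_balloon_tick(struct balloon_sim *b)
{
	long excess = b->current_pages - b->target_pages;
	long step = excess < b->pages_per_tick ? excess : b->pages_per_tick;

	if (step > 0)
		b->current_pages -= step;
}

/*
 * Poll until no more pages need to be released, like the while loop in
 * the patch. Returns the number of poll intervals spent waiting.
 */
static long sim_wait_finish(struct balloon_sim *b)
{
	long ticks = 0;

	/* A non-positive rate would never make progress. */
	assert(b->pages_per_tick > 0);

	while (sim_credit(b) < 0) {
		sim_balloon_tick(b);
		ticks++;
	}

	return ticks;
}
```

With 4 KiB pages, a guest booted at 6 GB (1572864 pages) with a 2 GB target (524288 pages) has 1048576 pages to release. Under an assumed rate of 25000 pages per interval, sim_wait_finish() returns 42 intervals, which at HZ / 10 per interval would be about 4.2 seconds. The real rate depends on the system, so treat the numbers as an illustration only.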