diff mbox series

[v2] x86/xen: fix balloon target initialization for PVH dom0

Message ID 20250404133459.16125-1-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series [v2] x86/xen: fix balloon target initialization for PVH dom0 | expand

Commit Message

Roger Pau Monné April 4, 2025, 1:34 p.m. UTC
PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to
dom0 are re-used as scratch memory to map foreign and grant pages.  Such
logic relies on reporting those unpopulated ranges as RAM to Linux, and
mark them as reserved.  This way Linux creates the underlying page
structures required for metadata management.

Such approach works fine on PV because the initial balloon target is
calculated using specific Xen data, that doesn't take into account the
memory type changes described above.  However on HVM and PVH the initial
balloon target is calculated using get_num_physpages(), and that function
does take into account the unpopulated RAM regions used as scratch space
for remote domain mappings.

This leads to PVH dom0 having an incorrect initial balloon target, which
causes malfunction (excessive memory freeing) of the balloon driver if the
dom0 memory target is later adjusted from the toolstack.

Fix this by using xen_released_pages to account for any pages that are part
of the memory map, but are already unpopulated when the balloon driver is
initialized.  This accounts for any regions used for scratch remote
mappings.

Take the opportunity to unify PV with PVH/HVM guests regarding the usage of
get_num_physpages(), as that avoids having to add different logic for PV vs
PVH in both balloon_add_regions() and arch_xen_unpopulated_init().

Much like a6aa4eb994ee, the code in this changeset should have been part of
38620fc4e893.

Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Replace BUG_ON() with a WARN and failure to initialize the balloon
   driver.
---
 arch/x86/xen/enlighten.c |  7 +++++++
 drivers/xen/balloon.c    | 34 ++++++++++++++++++++++++----------
 2 files changed, 31 insertions(+), 10 deletions(-)

Comments

Juergen Gross April 4, 2025, 1:43 p.m. UTC | #1
On 04.04.25 15:34, Roger Pau Monne wrote:
> PVH dom0 re-uses logic from PV dom0, in which RAM ranges not assigned to
> dom0 are re-used as scratch memory to map foreign and grant pages.  Such
> logic relies on reporting those unpopulated ranges as RAM to Linux, and
> mark them as reserved.  This way Linux creates the underlying page
> structures required for metadata management.
> 
> Such approach works fine on PV because the initial balloon target is
> calculated using specific Xen data, that doesn't take into account the
> memory type changes described above.  However on HVM and PVH the initial
> balloon target is calculated using get_num_physpages(), and that function
> does take into account the unpopulated RAM regions used as scratch space
> for remote domain mappings.
> 
> This leads to PVH dom0 having an incorrect initial balloon target, which
> causes malfunction (excessive memory freeing) of the balloon driver if the
> dom0 memory target is later adjusted from the toolstack.
> 
> Fix this by using xen_released_pages to account for any pages that are part
> of the memory map, but are already unpopulated when the balloon driver is
> initialized.  This accounts for any regions used for scratch remote
> mappings.
> 
> Take the opportunity to unify PV with PVH/HVM guests regarding the usage of
> get_num_physpages(), as that avoids having to add different logic for PV vs
> PVH in both balloon_add_regions() and arch_xen_unpopulated_init().
> 
> Much like a6aa4eb994ee, the code in this changeset should have been part of
> 38620fc4e893.
> 
> Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
Dave Hansen April 4, 2025, 6:20 p.m. UTC | #2
On 4/4/25 06:34, Roger Pau Monne wrote:
> Much like a6aa4eb994ee, the code in this changeset should have been part of
> 38620fc4e893.
> 
> Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I don't see a cc:stable@ on there. Was there a reason for that or did
you just leave it off by accident?
Juergen Gross April 4, 2025, 6:26 p.m. UTC | #3
On 04.04.25 20:20, Dave Hansen wrote:
> On 4/4/25 06:34, Roger Pau Monne wrote:
>> Much like a6aa4eb994ee, the code in this changeset should have been part of
>> 38620fc4e893.
>>
>> Fixes: a6aa4eb994ee ('xen/x86: add extra pages to unpopulated-alloc if available')
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I don't see a cc:stable@ on there. Was there a reason for that or did
> you just leave it off by accident?

Thanks for noticing. I'll add it when committing.


Juergen
kernel test robot April 5, 2025, 12:45 p.m. UTC | #4
Hi Roger,

kernel test robot noticed the following build errors:

[auto build test ERROR on xen-tip/linux-next]
[also build test ERROR on tip/x86/core linus/master v6.14 next-20250404]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Roger-Pau-Monne/x86-xen-fix-balloon-target-initialization-for-PVH-dom0/20250404-214218
base:   https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git linux-next
patch link:    https://lore.kernel.org/r/20250404133459.16125-1-roger.pau%40citrix.com
patch subject: [PATCH v2] x86/xen: fix balloon target initialization for PVH dom0
config: x86_64-buildonly-randconfig-001-20250405 (https://download.01.org/0day-ci/archive/20250405/202504052022.avvx45LI-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250405/202504052022.avvx45LI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504052022.avvx45LI-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: drivers/xen/balloon.o: in function `balloon_init':
>> drivers/xen/balloon.c:727: undefined reference to `xen_released_pages'


vim +727 drivers/xen/balloon.c

   716	
   717	static int __init balloon_init(void)
   718	{
   719		struct task_struct *task;
   720		int rc;
   721	
   722		if (!xen_domain())
   723			return -ENODEV;
   724	
   725		pr_info("Initialising balloon driver\n");
   726	
 > 727		if (xen_released_pages >= get_num_physpages()) {
   728			WARN(1, "Released pages underflow current target");
   729			return -ERANGE;
   730		}
   731	
   732		balloon_stats.current_pages = get_num_physpages() - xen_released_pages;
   733		balloon_stats.target_pages  = balloon_stats.current_pages;
   734		balloon_stats.balloon_low   = 0;
   735		balloon_stats.balloon_high  = 0;
   736		balloon_stats.total_pages   = balloon_stats.current_pages;
   737	
   738		balloon_stats.schedule_delay = 1;
   739		balloon_stats.max_schedule_delay = 32;
   740		balloon_stats.retry_count = 1;
   741		balloon_stats.max_retry_count = 4;
   742
kernel test robot April 5, 2025, 3:55 p.m. UTC | #5
Hi Roger,

kernel test robot noticed the following build errors:

[auto build test ERROR on xen-tip/linux-next]
[also build test ERROR on tip/x86/core linus/master v6.14 next-20250404]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Roger-Pau-Monne/x86-xen-fix-balloon-target-initialization-for-PVH-dom0/20250404-214218
base:   https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git linux-next
patch link:    https://lore.kernel.org/r/20250404133459.16125-1-roger.pau%40citrix.com
patch subject: [PATCH v2] x86/xen: fix balloon target initialization for PVH dom0
config: x86_64-buildonly-randconfig-003-20250405 (https://download.01.org/0day-ci/archive/20250405/202504052301.dMtJILzp-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250405/202504052301.dMtJILzp-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504052301.dMtJILzp-lkp@intel.com/

All errors (new ones prefixed by >>):

>> ld.lld: error: undefined symbol: xen_released_pages
   >>> referenced by balloon.c
   >>>               drivers/xen/balloon.o:(balloon_init) in archive vmlinux.a
   >>> referenced by balloon.c
   >>>               drivers/xen/balloon.o:(balloon_init) in archive vmlinux.a
diff mbox series

Patch

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 43dcd8c7badc..651bb206434c 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -466,6 +466,13 @@  int __init arch_xen_unpopulated_init(struct resource **res)
 			xen_free_unpopulated_pages(1, &pg);
 		}
 
+		/*
+		 * Account for the region being in the physmap but unpopulated.
+		 * The value in xen_released_pages is used by the balloon
+		 * driver to know how much of the physmap is unpopulated and
+		 * set an accurate initial memory target.
+		 */
+		xen_released_pages += xen_extra_mem[i].n_pfns;
 		/* Zero so region is not also added to the balloon driver. */
 		xen_extra_mem[i].n_pfns = 0;
 	}
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 163f7f1d70f1..ee165f4f7fe6 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -675,7 +675,7 @@  void xen_free_ballooned_pages(unsigned int nr_pages, struct page **pages)
 }
 EXPORT_SYMBOL(xen_free_ballooned_pages);
 
-static void __init balloon_add_regions(void)
+static int __init balloon_add_regions(void)
 {
 	unsigned long start_pfn, pages;
 	unsigned long pfn, extra_pfn_end;
@@ -698,26 +698,38 @@  static void __init balloon_add_regions(void)
 		for (pfn = start_pfn; pfn < extra_pfn_end; pfn++)
 			balloon_append(pfn_to_page(pfn));
 
-		balloon_stats.total_pages += extra_pfn_end - start_pfn;
+		/*
+		 * Extra regions are accounted for in the physmap, but need
+		 * decreasing from current_pages to balloon down the initial
+		 * allocation, because they are already accounted for in
+		 * total_pages.
+		 */
+		if (extra_pfn_end - start_pfn >= balloon_stats.current_pages) {
+			WARN(1, "Extra pages underflow current target");
+			return -ERANGE;
+		}
+		balloon_stats.current_pages -= extra_pfn_end - start_pfn;
 	}
+
+	return 0;
 }
 
 static int __init balloon_init(void)
 {
 	struct task_struct *task;
+	int rc;
 
 	if (!xen_domain())
 		return -ENODEV;
 
 	pr_info("Initialising balloon driver\n");
 
-#ifdef CONFIG_XEN_PV
-	balloon_stats.current_pages = xen_pv_domain()
-		? min(xen_start_info->nr_pages - xen_released_pages, max_pfn)
-		: get_num_physpages();
-#else
-	balloon_stats.current_pages = get_num_physpages();
-#endif
+	if (xen_released_pages >= get_num_physpages()) {
+		WARN(1, "Released pages underflow current target");
+		return -ERANGE;
+	}
+
+	balloon_stats.current_pages = get_num_physpages() - xen_released_pages;
 	balloon_stats.target_pages  = balloon_stats.current_pages;
 	balloon_stats.balloon_low   = 0;
 	balloon_stats.balloon_high  = 0;
@@ -734,7 +746,9 @@  static int __init balloon_init(void)
 	register_sysctl_init("xen/balloon", balloon_table);
 #endif
 
-	balloon_add_regions();
+	rc = balloon_add_regions();
+	if (rc)
+		return rc;
 
 	task = kthread_run(balloon_thread, NULL, "xen-balloon");
 	if (IS_ERR(task)) {