| Message ID | 55556c13-dbaa-3eb7-9f3a-9e448a0324aa@suse.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | xl: relax freemem()'s retry calculation |
On Fri, Jul 08, 2022 at 03:39:38PM +0200, Jan Beulich wrote:
> While in principle possible also under other conditions as long as other
> parallel operations potentially consuming memory aren't "locked out", in
> particular with IOMMU large page mappings used in Dom0 (for PV when in
> strict mode; for PVH when not sharing page tables with HAP) ballooning
> out of individual pages can actually lead to less free memory available
> afterwards. This is because to split a large page, one or more page
> table pages are necessary (one per level that is split).
>
> When rebooting a guest I've observed freemem() to fail: A single page
> was required to be ballooned out (presumably because of heap
> fragmentation in the hypervisor). This ballooning out of a single page
> of course went fast, but freemem() then found that it would require to
> balloon out another page. This repeating just another time leads to the
> function to signal failure to the caller - without having come anywhere
> near the designated 30s that the whole process is allowed to not make
> any progress at all.
>
> Convert from a simple retry count to actually calculating elapsed time,
> subtracting from an initial credit of 30s. Don't go as far as limiting
> the "wait_secs" value passed to libxl_wait_for_memory_target(), though.
> While this leads to the overall process now possibly taking longer (if
> the previous iteration ended very close to the intended 30s), this
> compensates to some degree for the value passed really meaning "allowed
> to run for this long without making progress".
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> I further wonder whether the "credit expired" loop exit wouldn't better
> be moved to the middle of the loop, immediately after "return true".
> That way having reached the goal on the last iteration would be reported
> as success to the caller, rather than as "timed out".

That would sound like a good improvement to the patch.

Thanks,
On 11.07.2022 18:21, Anthony PERARD wrote:
> On Fri, Jul 08, 2022 at 03:39:38PM +0200, Jan Beulich wrote:
>> [...]
>> I further wonder whether the "credit expired" loop exit wouldn't better
>> be moved to the middle of the loop, immediately after "return true".
>> That way having reached the goal on the last iteration would be reported
>> as success to the caller, rather than as "timed out".
>
> That would sound like a good improvement to the patch.

Oh. I would have made it a separate one, if deemed sensible. Order
shouldn't matter as I'd consider both backporting candidates.

Jan
On 12.07.2022 09:01, Jan Beulich wrote:
> On 11.07.2022 18:21, Anthony PERARD wrote:
>> On Fri, Jul 08, 2022 at 03:39:38PM +0200, Jan Beulich wrote:
>>> [...]
>>> I further wonder whether the "credit expired" loop exit wouldn't better
>>> be moved to the middle of the loop, immediately after "return true".
>>> That way having reached the goal on the last iteration would be reported
>>> as success to the caller, rather than as "timed out".
>>
>> That would sound like a good improvement to the patch.
>
> Oh. I would have made it a separate one, if deemed sensible. Order
> shouldn't matter as I'd consider both backporting candidates.

Except, of course, if the change here is controversial or needs a lot of
further refinement, in which case the other one may better go first.
Please let me know ...

Jan
On Tue, Jul 12, 2022 at 09:01:48AM +0200, Jan Beulich wrote:
> On 11.07.2022 18:21, Anthony PERARD wrote:
> > On Fri, Jul 08, 2022 at 03:39:38PM +0200, Jan Beulich wrote:
> >> [...]
> >> I further wonder whether the "credit expired" loop exit wouldn't better
> >> be moved to the middle of the loop, immediately after "return true".
> >> That way having reached the goal on the last iteration would be reported
> >> as success to the caller, rather than as "timed out".
> >
> > That would sound like a good improvement to the patch.
>
> Oh. I would have made it a separate one, if deemed sensible. Order
> shouldn't matter as I'd consider both backporting candidates.

OK. For this patch:

Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

Thanks,
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -321,7 +321,8 @@ static int domain_wait_event(uint32_t do
  */
 static bool freemem(uint32_t domid, libxl_domain_config *d_config)
 {
-    int rc, retries = 3;
+    int rc;
+    double credit = 30;
     uint64_t need_memkb, free_memkb;
 
     if (!autoballoon)
@@ -332,6 +333,8 @@ static bool freemem(uint32_t domid, libx
         return false;
 
     do {
+        time_t start;
+
         rc = libxl_get_free_memory(ctx, &free_memkb);
         if (rc < 0)
             return false;
@@ -345,12 +348,13 @@ static bool freemem(uint32_t domid, libx
         /* wait until dom0 reaches its target, as long as we are making
          * progress */
+        start = time(NULL);
         rc = libxl_wait_for_memory_target(ctx, 0, 10);
         if (rc < 0)
             return false;
 
-        retries--;
-    } while (retries > 0);
+        credit -= difftime(time(NULL), start);
+    } while (credit > 0);
 
     return false;
 }
While in principle possible also under other conditions as long as other
parallel operations potentially consuming memory aren't "locked out", in
particular with IOMMU large page mappings used in Dom0 (for PV when in
strict mode; for PVH when not sharing page tables with HAP) ballooning
out of individual pages can actually lead to less free memory available
afterwards. This is because to split a large page, one or more page
table pages are necessary (one per level that is split).

When rebooting a guest I've observed freemem() to fail: A single page
was required to be ballooned out (presumably because of heap
fragmentation in the hypervisor). This ballooning out of a single page
of course went fast, but freemem() then found that it would require to
balloon out another page. This repeating just another time leads to the
function to signal failure to the caller - without having come anywhere
near the designated 30s that the whole process is allowed to not make
any progress at all.

Convert from a simple retry count to actually calculating elapsed time,
subtracting from an initial credit of 30s. Don't go as far as limiting
the "wait_secs" value passed to libxl_wait_for_memory_target(), though.
While this leads to the overall process now possibly taking longer (if
the previous iteration ended very close to the intended 30s), this
compensates to some degree for the value passed really meaning "allowed
to run for this long without making progress".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
see https://lists.xen.org/archives/html/xen-devel/2021-08/msg00781.html
---
I further wonder whether the "credit expired" loop exit wouldn't better
be moved to the middle of the loop, immediately after "return true".
That way having reached the goal on the last iteration would be reported
as success to the caller, rather than as "timed out".