diff mbox series

[RFC,2/3] mm, memory_hotplug: deobfuscate migration part of offlining

Message ID 20181120134323.13007-3-mhocko@kernel.org (mailing list archive)
State New, archived
Headers show
Series few memory offlining enhancements | expand

Commit Message

Michal Hocko Nov. 20, 2018, 1:43 p.m. UTC
From: Michal Hocko <mhocko@suse.com>

Memory migration might fail during offlining and we keep retrying in
that case. This is currently obfuscate by goto retry loop. The code
is hard to follow and as a result it is even suboptimal becase each
retry round scans the full range from start_pfn even though we have
successfully scanned/migrated [start_pfn, pfn] range already. This
is all only because check_pages_isolated failure has to rescan the full
range again.

De-obfuscate the migration retry loop by promoting it to a real for
loop. In fact remove the goto altogether by making it a proper double
loop (yeah, gotos are nasty in this specific case). In the end we
will get a slightly more optimal code which is better readable.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

Comments

David Hildenbrand Nov. 20, 2018, 2:26 p.m. UTC | #1
On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Memory migration might fail during offlining and we keep retrying in
> that case. This is currently obfuscate by goto retry loop. The code
> is hard to follow and as a result it is even suboptimal becase each
> retry round scans the full range from start_pfn even though we have
> successfully scanned/migrated [start_pfn, pfn] range already. This
> is all only because check_pages_isolated failure has to rescan the full
> range again.
> 
> De-obfuscate the migration retry loop by promoting it to a real for
> loop. In fact remove the goto altogether by making it a proper double
> loop (yeah, gotos are nasty in this specific case). In the end we
> will get a slightly more optimal code which is better readable.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
>  1 file changed, 31 insertions(+), 29 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6263c8cd4491..9cd161db3061 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1591,38 +1591,40 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		goto failed_removal_isolated;
>  	}
>  
> -	pfn = start_pfn;
> -repeat:
> -	/* start memory hot removal */
> -	ret = -EINTR;
> -	if (signal_pending(current)) {
> -		reason = "signal backoff";
> -		goto failed_removal_isolated;
> -	}
> +	do {
> +		for (pfn = start_pfn; pfn;)
> +		{

{ on a new line looks weird.

> +			/* start memory hot removal */
> +			ret = -EINTR;

I think we can move that into the "if (signal_pending(current))"

(if my eyes are not wrong, this will not be touched otherwise)

> +			if (signal_pending(current)) {
> +				reason = "signal backoff";
> +				goto failed_removal_isolated;
> +			}
>  
> -	cond_resched();
> -	lru_add_drain_all();
> -	drain_all_pages(zone);
> +			cond_resched();
> +			lru_add_drain_all();
> +			drain_all_pages(zone);
>  
> -	pfn = scan_movable_pages(start_pfn, end_pfn);
> -	if (pfn) { /* We have movable pages */
> -		ret = do_migrate_range(pfn, end_pfn);
> -		goto repeat;
> -	}
> +			pfn = scan_movable_pages(pfn, end_pfn);
> +			if (pfn) {
> +				/* TODO fatal migration failures should bail out */
> +				do_migrate_range(pfn, end_pfn);

Right, that return value was always ignored.

> +			}
> +		}
> +
> +		/*
> +		 * dissolve free hugepages in the memory block before doing offlining
> +		 * actually in order to make hugetlbfs's object counting consistent.
> +		 */
> +		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> +		if (ret) {
> +			reason = "failure to dissolve huge pages";
> +			goto failed_removal_isolated;
> +		}
> +		/* check again */
> +		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> +	} while (offlined_pages < 0);
>  
> -	/*
> -	 * dissolve free hugepages in the memory block before doing offlining
> -	 * actually in order to make hugetlbfs's object counting consistent.
> -	 */
> -	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -	if (ret) {
> -		reason = "failure to dissolve huge pages";
> -		goto failed_removal_isolated;
> -	}
> -	/* check again */
> -	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> -	if (offlined_pages < 0)
> -		goto repeat;
>  	pr_info("Offlined Pages %ld\n", offlined_pages);
>  	/* Ok, all of our target is isolated.
>  	   We cannot do rollback at this point. */
> 

Looks much better to me.
Michal Hocko Nov. 20, 2018, 2:34 p.m. UTC | #2
On Tue 20-11-18 15:26:43, David Hildenbrand wrote:
[...]
> > +	do {
> > +		for (pfn = start_pfn; pfn;)
> > +		{
> 
> { on a new line looks weird.
> 
> > +			/* start memory hot removal */
> > +			ret = -EINTR;
> 
> I think we can move that into the "if (signal_pending(current))"
> 
> (if my eyes are not wrong, this will not be touched otherwise)

Better?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9cd161db3061..6bc3aee30f5e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1592,11 +1592,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	}
 
 	do {
-		for (pfn = start_pfn; pfn;)
-		{
+		for (pfn = start_pfn; pfn;) {
 			/* start memory hot removal */
-			ret = -EINTR;
 			if (signal_pending(current)) {
+				ret = -EINTR;
 				reason = "signal backoff";
 				goto failed_removal_isolated;
 			}
David Hildenbrand Nov. 20, 2018, 2:34 p.m. UTC | #3
On 20.11.18 15:34, Michal Hocko wrote:
> On Tue 20-11-18 15:26:43, David Hildenbrand wrote:
> [...]
>>> +	do {
>>> +		for (pfn = start_pfn; pfn;)
>>> +		{
>>
>> { on a new line looks weird.
>>
>>> +			/* start memory hot removal */
>>> +			ret = -EINTR;
>>
>> I think we can move that into the "if (signal_pending(current))"
>>
>> (if my eyes are not wrong, this will not be touched otherwise)
> 
> Better?
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9cd161db3061..6bc3aee30f5e 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1592,11 +1592,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	}
>  
>  	do {
> -		for (pfn = start_pfn; pfn;)
> -		{
> +		for (pfn = start_pfn; pfn;) {
>  			/* start memory hot removal */
> -			ret = -EINTR;
>  			if (signal_pending(current)) {
> +				ret = -EINTR;
>  				reason = "signal backoff";
>  				goto failed_removal_isolated;
>  			}
> 

Reviewed-by: David Hildenbrand <david@redhat.com>

:)
Oscar Salvador Nov. 20, 2018, 3:13 p.m. UTC | #4
> Signed-off-by: Michal Hocko <mhocko@suse.com>
[...]
> +	do {
> +		for (pfn = start_pfn; pfn;)
> +		{
> +			/* start memory hot removal */

Should we change thAT comment? I mean, this is not really the hot-
removal stage.

Maybe "start memory migration" suits better? or memory offlining?

> +			ret = -EINTR;
> +			if (signal_pending(current)) {
> +				reason = "signal backoff";
> +				goto failed_removal_isolated;
> +			}
>  
> -	cond_resched();
> -	lru_add_drain_all();
> -	drain_all_pages(zone);
> +			cond_resched();
> +			lru_add_drain_all();
> +			drain_all_pages(zone);
>  
> -	pfn = scan_movable_pages(start_pfn, end_pfn);
> -	if (pfn) { /* We have movable pages */
> -		ret = do_migrate_range(pfn, end_pfn);
> -		goto repeat;
> -	}
> +			pfn = scan_movable_pages(pfn, end_pfn);
> +			if (pfn) {
> +				/* TODO fatal migration failures
> should bail out */
> +				do_migrate_range(pfn, end_pfn);
> +			}
> +		}
> +
> +		/*
> +		 * dissolve free hugepages in the memory block
> before doing offlining
> +		 * actually in order to make hugetlbfs's object
> counting consistent.
> +		 */
> +		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> +		if (ret) {
> +			reason = "failure to dissolve huge pages";
> +			goto failed_removal_isolated;
> +		}
> +		/* check again */
> +		offlined_pages = check_pages_isolated(start_pfn,
> end_pfn);
> +	} while (offlined_pages < 0);
>  
> -	/*
> -	 * dissolve free hugepages in the memory block before doing
> offlining
> -	 * actually in order to make hugetlbfs's object counting
> consistent.
> -	 */
> -	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> -	if (ret) {
> -		reason = "failure to dissolve huge pages";
> -		goto failed_removal_isolated;
> -	}
> -	/* check again */
> -	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> -	if (offlined_pages < 0)
> -		goto repeat;

This indeed looks much nicer and it is easier to follow.
With the changes commented by David:

Reviewed-by: Oscar Salvador <osalvador@suse.de>
Michal Hocko Nov. 20, 2018, 3:18 p.m. UTC | #5
On Tue 20-11-18 16:13:35, Oscar Salvador wrote:
> 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> [...]
> > +	do {
> > +		for (pfn = start_pfn; pfn;)
> > +		{
> > +			/* start memory hot removal */
> 
> Should we change thAT comment? I mean, this is not really the hot-
> removal stage.
> 
> Maybe "start memory migration" suits better? or memory offlining?

I will just drop it. It doesn't really explain anything.
[...]
> 
> This indeed looks much nicer and it is easier to follow.
> With the changes commented by David:
> 
> Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks!
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6263c8cd4491..9cd161db3061 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1591,38 +1591,40 @@  static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal_isolated;
 	}
 
-	pfn = start_pfn;
-repeat:
-	/* start memory hot removal */
-	ret = -EINTR;
-	if (signal_pending(current)) {
-		reason = "signal backoff";
-		goto failed_removal_isolated;
-	}
+	do {
+		for (pfn = start_pfn; pfn;)
+		{
+			/* start memory hot removal */
+			ret = -EINTR;
+			if (signal_pending(current)) {
+				reason = "signal backoff";
+				goto failed_removal_isolated;
+			}
 
-	cond_resched();
-	lru_add_drain_all();
-	drain_all_pages(zone);
+			cond_resched();
+			lru_add_drain_all();
+			drain_all_pages(zone);
 
-	pfn = scan_movable_pages(start_pfn, end_pfn);
-	if (pfn) { /* We have movable pages */
-		ret = do_migrate_range(pfn, end_pfn);
-		goto repeat;
-	}
+			pfn = scan_movable_pages(pfn, end_pfn);
+			if (pfn) {
+				/* TODO fatal migration failures should bail out */
+				do_migrate_range(pfn, end_pfn);
+			}
+		}
+
+		/*
+		 * dissolve free hugepages in the memory block before doing offlining
+		 * actually in order to make hugetlbfs's object counting consistent.
+		 */
+		ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+		if (ret) {
+			reason = "failure to dissolve huge pages";
+			goto failed_removal_isolated;
+		}
+		/* check again */
+		offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+	} while (offlined_pages < 0);
 
-	/*
-	 * dissolve free hugepages in the memory block before doing offlining
-	 * actually in order to make hugetlbfs's object counting consistent.
-	 */
-	ret = dissolve_free_huge_pages(start_pfn, end_pfn);
-	if (ret) {
-		reason = "failure to dissolve huge pages";
-		goto failed_removal_isolated;
-	}
-	/* check again */
-	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
-	if (offlined_pages < 0)
-		goto repeat;
 	pr_info("Offlined Pages %ld\n", offlined_pages);
 	/* Ok, all of our target is isolated.
 	   We cannot do rollback at this point. */