[RFC,V2,3/3] s390/mm: Define arch_get_mappable_range()

Message ID: 1606706992-26656-4-git-send-email-anshuman.khandual@arm.com
State: New, archived
From: Anshuman Khandual <anshuman.khandual@arm.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org, david@redhat.com
Cc: linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, Anshuman Khandual <anshuman.khandual@arm.com>, Heiko Carstens <hca@linux.ibm.com>, Vasily Gorbik <gor@linux.ibm.com>
Subject: [RFC V2 3/3] s390/mm: Define arch_get_mappable_range()
Date: Mon, 30 Nov 2020 08:59:52 +0530
In-Reply-To: <1606706992-26656-1-git-send-email-anshuman.khandual@arm.com>
Series: mm/hotplug: Pre-validate the address range with platform
  [RFC,V2,0/3] mm/hotplug: Pre-validate the address range with platform
  [RFC,V2,1/3] mm/hotplug: Prevalidate the address range being added with platform
  [RFC,V2,2/3] arm64/mm: Define arch_get_mappable_range()
  [RFC,V2,3/3] s390/mm: Define arch_get_mappable_range()

Anshuman Khandual Nov. 30, 2020, 3:29 a.m. UTC

This overrides arch_get_mappable_range() on the s390 platform and drops the
now redundant, similar check in vmem_add_mapping(). To compensate, it adds
a new check in __segment_load() to preserve the existing functionality.

Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: linux-s390@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/s390/mm/extmem.c |  5 +++++
 arch/s390/mm/vmem.c   | 13 +++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

Heiko Carstens Dec. 2, 2020, 8:32 p.m. UTC | #1

On Mon, Nov 30, 2020 at 08:59:52AM +0530, Anshuman Khandual wrote:
> diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
> index 5060956b8e7d..cc055a78f7b6 100644
> --- a/arch/s390/mm/extmem.c
> +++ b/arch/s390/mm/extmem.c
> @@ -337,6 +337,11 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
>  		goto out_free_resource;
>  	}
>  
> +	if (seg->end + 1 > VMEM_MAX_PHYS || seg->end + 1 < seg->start_addr) {
> +		rc = -ERANGE;
> +		goto out_resource;
> +	}
> +
>  	rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
>  	if (rc)
>  		goto out_resource;
> diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
> index b239f2ba93b0..06dddcc0ce06 100644
> --- a/arch/s390/mm/vmem.c
> +++ b/arch/s390/mm/vmem.c
> @@ -532,14 +532,19 @@ void vmem_remove_mapping(unsigned long start, unsigned long size)
>  	mutex_unlock(&vmem_mutex);
>  }
>  
> +struct range arch_get_mappable_range(void)
> +{
> +	struct range memhp_range;
> +
> +	memhp_range.start = 0;
> +	memhp_range.end =  VMEM_MAX_PHYS;
> +	return memhp_range;
> +}
> +
>  int vmem_add_mapping(unsigned long start, unsigned long size)
>  {
>  	int ret;
>  
> -	if (start + size > VMEM_MAX_PHYS ||
> -	    start + size < start)
> -		return -ERANGE;
> -

I really fail to see how this could be considered an improvement for
s390. Especially I do not like that the (central) range check is now
moved to the caller (__segment_load). Which would mean potential
additional future callers would have to duplicate that code as well.

Anshuman Khandual Dec. 3, 2020, 12:33 a.m. UTC | #2

On 12/3/20 2:02 AM, Heiko Carstens wrote:
> 
> I really fail to see how this could be considered an improvement for
> s390. Especially I do not like that the (central) range check is now
> moved to the caller (__segment_load). Which would mean potential
> additional future callers would have to duplicate that code as well.

The physical range check is being moved to the generic hotplug code
via arch_get_mappable_range() instead, making the existing check in
vmem_add_mapping() redundant. Dropping the check there necessitates
adding back a similar check in __segment_load(). Otherwise there
will be a loss of functionality in terms of range check.

Maybe we could just keep this existing check in vmem_add_mapping()
as well in order to avoid this movement, but then it would be a
redundant check in every hotplug path.

So I guess the choice is to either have redundant range checks in
all hotplug paths or have future internal callers of vmem_add_mapping()
take care of the range check.
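
For reference, the check under discussion can be modelled in a few lines of user-space C. This is only a sketch: DEMO_VMEM_MAX_PHYS and demo_range_ok() are hypothetical names, and the limit value is an arbitrary assumption rather than the real s390 VMEM_MAX_PHYS. It shows why the removed condition has two halves: one for the upper limit and one for address wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the s390 VMEM_MAX_PHYS limit (assumed value). */
#define DEMO_VMEM_MAX_PHYS (1ULL << 42)

/*
 * Mirrors the shape of the check dropped from vmem_add_mapping():
 * reject a range whose end exceeds the limit, or whose end wraps
 * around the address space and lands below the start.
 */
static bool demo_range_ok(uint64_t start, uint64_t size)
{
	uint64_t end = start + size;	/* unsigned, so wraparound is well defined */

	if (end > DEMO_VMEM_MAX_PHYS || end < start)
		return false;
	return true;
}
```

Whichever function ends up owning this condition, every path that reaches the mapping code needs to pass through it exactly as written, which is the duplication concern raised above.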

Heiko Carstens Dec. 3, 2020, 11:51 a.m. UTC | #3

On Thu, Dec 03, 2020 at 06:03:00AM +0530, Anshuman Khandual wrote:
> > 
> > I really fail to see how this could be considered an improvement for
> > s390. Especially I do not like that the (central) range check is now
> > moved to the caller (__segment_load). Which would mean potential
> > additional future callers would have to duplicate that code as well.
> 
> The physical range check is being moved to the generic hotplug code
> via arch_get_mappable_range() instead, making the existing check in
> vmem_add_mapping() redundant. Dropping the check there necessitates
> adding back a similar check in __segment_load(). Otherwise there
> will be a loss of functionality in terms of range check.
> 
> Maybe we could just keep this existing check in vmem_add_mapping()
> as well in order to avoid this movement, but then it would be a
> redundant check in every hotplug path.
> 
> So I guess the choice is to either have redundant range checks in
> all hotplug paths or have future internal callers of vmem_add_mapping()
> take care of the range check.

The problem I have with this current approach from an architecture
perspective: we end up having two completely different methods which
are doing the same and must be kept in sync. This might be obvious
looking at this patch, but I'm sure this will go out-of-sync (aka
broken) sooner or later.

Therefore I would really like to see a single method to do the range
checking. Maybe you could add a callback into architecture code, so
that such an architecture specific function could also be used
elsewhere. Dunno.

David Hildenbrand Dec. 3, 2020, 12:01 p.m. UTC | #4

On 03.12.20 12:51, Heiko Carstens wrote:
> 
> The problem I have with this current approach from an architecture
> perspective: we end up having two completely different methods which
> are doing the same and must be kept in sync. This might be obvious
> looking at this patch, but I'm sure this will go out-of-sync (aka
> broken) sooner or later.

Exactly, there should be one function only; that was the whole idea of
arch_get_mappable_range().

> 
> Therefore I would really like to see a single method to do the range
> checking. Maybe you could add a callback into architecture code, so
> that such an architecture specific function could also be used
> elsewhere. Dunno.
> 

I think we can just switch to using "memhp_range_allowed()" here then
after implementing arch_get_mappable_range().

Doesn't hurt to double check in vmem_add_mapping() - especially to keep
extmem working without changes. At least for callers of memory hotplug
it's then clear which values actually won't fail deep down in arch code.
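
A minimal user-space sketch of the "one function only" idea follows. All demo_* names and the limit value are hypothetical, and the inclusive end bound is a modelling choice made here, not something taken from the patch. The point is only that the mapping code and any hotplug caller consult the same range getter, so the two checks cannot drift apart.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VMEM_MAX_PHYS (1ULL << 42)	/* assumed limit, illustration only */

struct demo_range {
	uint64_t start;
	uint64_t end;	/* inclusive in this sketch */
};

/* Single source of truth for what the architecture can map. */
static struct demo_range demo_arch_get_mappable_range(void)
{
	struct demo_range r = { .start = 0, .end = DEMO_VMEM_MAX_PHYS - 1 };

	return r;
}

/* Generic helper built on the getter, in the spirit of memhp_range_allowed(). */
static bool demo_range_allowed(uint64_t start, uint64_t size)
{
	struct demo_range r = demo_arch_get_mappable_range();
	uint64_t end = start + size - 1;

	if (size == 0 || end < start)	/* empty range or wraparound */
		return false;
	return start >= r.start && end <= r.end;
}

/* The arch mapping function keeps its check, but via the shared helper. */
static int demo_vmem_add_mapping(uint64_t start, uint64_t size)
{
	if (!demo_range_allowed(start, size))
		return -ERANGE;
	return 0;	/* the real function would set up page tables here */
}
```

With this shape, a caller like __segment_load() needs no check of its own, and hotplug callers can query the same range before calling down.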

Anshuman Khandual Dec. 7, 2020, 4:38 a.m. UTC | #5

On 12/3/20 5:31 PM, David Hildenbrand wrote:
> On 03.12.20 12:51, Heiko Carstens wrote:
>> The problem I have with this current approach from an architecture
>> perspective: we end up having two completely different methods which
>> are doing the same and must be kept in sync. This might be obvious
>> looking at this patch, but I'm sure this will go out-of-sync (aka
>> broken) sooner or later.
> 
> Exactly, there should be one function only; that was the whole idea of
> arch_get_mappable_range().
> 
>>
>> Therefore I would really like to see a single method to do the range
>> checking. Maybe you could add a callback into architecture code, so
>> that such an architecture specific function could also be used
>> elsewhere. Dunno.
>>
> 
> I think we can just switch to using "memhp_range_allowed()" here then
> after implementing arch_get_mappable_range().
> 
> Doesn't hurt to double check in vmem_add_mapping() - especially to keep
> extmem working without changes. At least for callers of memory hotplug
> it's then clear which values actually won't fail deep down in arch code.

But there is a small problem here. memhp_range_allowed() is now defined
and available only with CONFIG_MEMORY_HOTPLUG, whereas vmem_add_mapping()
and __segment_load() are generally available without any config dependency.
So if CONFIG_MEMORY_HOTPLUG is not enabled, there will be a build failure
in vmem_add_mapping() for the memhp_range_allowed() symbol.

We could just move the VM_BUG_ON(!memhp_range_allowed(start, size, 1)) check
from vmem_add_mapping() to arch_add_memory(), like on the arm64 platform. But
then __segment_load() would need that additional new check to compensate,
as proposed earlier.

Also, leaving vmem_add_mapping() and __segment_load() unchanged will cause
the address range check to be called three times on the hotplug path, i.e.

1. register_memory_resource()
2. arch_add_memory()
3. vmem_add_mapping()

Moving the memhp_range_allowed() check inside arch_add_memory() seems better
and consistent with arm64. Also, in the future, any platform which chooses
to override arch_get_mappable_range() will have this additional VM_BUG_ON()
in its arch_add_memory().
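
The triple check described above can be made visible with a small user-space model. This is a simplified sketch: the demo_* functions are hypothetical stand-ins for the three kernel functions listed, the call chain is flattened, and every check returns an error where the proposal would use VM_BUG_ON() in arch_add_memory().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VMEM_MAX_PHYS (1ULL << 42)	/* assumed limit, illustration only */

static unsigned int demo_check_calls;	/* counts range check invocations */

static bool demo_range_allowed(uint64_t start, uint64_t size)
{
	uint64_t end = start + size;

	demo_check_calls++;
	return end <= DEMO_VMEM_MAX_PHYS && end >= start;
}

/* Innermost arch function, with its existing check left in place. */
static int demo_vmem_add_mapping(uint64_t start, uint64_t size)
{
	return demo_range_allowed(start, size) ? 0 : -ERANGE;
}

/* Arch hotplug entry point with the proposed additional check. */
static int demo_arch_add_memory(uint64_t start, uint64_t size)
{
	if (!demo_range_allowed(start, size))
		return -ERANGE;
	return demo_vmem_add_mapping(start, size);
}

/* Generic pre-validation step. */
static int demo_register_memory_resource(uint64_t start, uint64_t size)
{
	return demo_range_allowed(start, size) ? 0 : -ERANGE;
}

/* Flattened model of the hotplug path that walks all three steps. */
static int demo_add_memory(uint64_t start, uint64_t size)
{
	int rc = demo_register_memory_resource(start, size);

	if (rc)
		return rc;
	return demo_arch_add_memory(start, size);
}
```

For one successful demo_add_memory() call, demo_check_calls ends up at 3, one increment per step in the list.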

David Hildenbrand Dec. 7, 2020, 9:03 a.m. UTC | #6

On 07.12.20 05:38, Anshuman Khandual wrote:
> 
> But there is a small problem here. memhp_range_allowed() is now defined
> and available only with CONFIG_MEMORY_HOTPLUG, whereas vmem_add_mapping()
> and __segment_load() are generally available without any config dependency.
> So if CONFIG_MEMORY_HOTPLUG is not enabled, there will be a build failure
> in vmem_add_mapping() for the memhp_range_allowed() symbol.
> 
> We could just move the VM_BUG_ON(!memhp_range_allowed(start, size, 1)) check
> from vmem_add_mapping() to arch_add_memory(), like on the arm64 platform. But
> then __segment_load() would need that additional new check to compensate,
> as proposed earlier.
> 
> Also, leaving vmem_add_mapping() and __segment_load() unchanged will cause
> the address range check to be called three times on the hotplug path, i.e.
> 
> 1. register_memory_resource()
> 2. arch_add_memory()
> 3. vmem_add_mapping()
> 
> Moving the memhp_range_allowed() check inside arch_add_memory() seems better
> and consistent with arm64. Also, in the future, any platform which chooses
> to override arch_get_mappable_range() will have this additional VM_BUG_ON()
> in its arch_add_memory().

Yeah, it might not make sense to add these checks all over the place.
The important part is that

1. There is a check somewhere (even if it's deep down in arch code)
2. There is an obvious way for callers to find out what valid values are.


I guess it would be good enough to

a) Factor out getting arch ranges into arch_get_mappable_range()
b) Provide memhp_get_pluggable_range()

Both changes only make sense with an in-tree user. I'm planning on using
this functionality in virtio-mem code. I can pick up your patches, drop
the superfluous checks, and use it from virtio-mem code. Makes sense
(BTW, looks like we'll see aarch64 support for virtio-mem soon)?
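
One possible shape for points a) and b) is sketched below in user-space C. Everything here is an assumption for illustration: the demo_* names, both limit values, and the need_mapping parameter are hypothetical and not taken from any posted patch. The idea is that a caller such as a driver can learn the valid range up front instead of discovering it through a failure deep in arch code.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VMEM_MAX_PHYS	(1ULL << 42)	/* assumed arch mapping limit */
#define DEMO_MAX_PHYSMEM_BITS	46		/* assumed physical address width */

struct demo_range {
	uint64_t start;
	uint64_t end;	/* inclusive in this sketch */
};

/* a) The architecture reports what it can map. */
static struct demo_range demo_arch_get_mappable_range(void)
{
	struct demo_range r = { .start = 0, .end = DEMO_VMEM_MAX_PHYS - 1 };

	return r;
}

/* b) Generic getter that callers of memory hotplug can query. */
static struct demo_range demo_get_pluggable_range(bool need_mapping)
{
	const uint64_t max_phys = (1ULL << DEMO_MAX_PHYSMEM_BITS) - 1;
	struct demo_range r = { .start = 0, .end = max_phys };

	if (need_mapping) {
		r = demo_arch_get_mappable_range();
		if (r.end > max_phys)	/* never exceed the physical limit */
			r.end = max_phys;
	}
	return r;
}

/* Example caller: clamp the end of a managed region to what is pluggable. */
static uint64_t demo_usable_end(uint64_t region_end)
{
	struct demo_range r = demo_get_pluggable_range(true);

	return region_end < r.end ? region_end : r.end;
}
```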

Anshuman Khandual Dec. 8, 2020, 5:32 a.m. UTC | #7

On 12/7/20 2:33 PM, David Hildenbrand wrote:
> 
> Yeah, it might not make sense to add these checks all over the place.
> The important part is that
> 
> 1. There is a check somewhere (even if it's deep down in arch code)
> 2. There is an obvious way for callers to find out what valid values are.
> 
> 
> I guess it would be good enough to
> 
> a) Factor out getting arch ranges into arch_get_mappable_range()
> b) Provide memhp_get_pluggable_range()

I posted V1 earlier today, which hopefully accommodates all previous
suggestions; otherwise, do let me know if anything else still needs to be
improved.

https://lore.kernel.org/linux-mm/1607400978-31595-1-git-send-email-anshuman.khandual@arm.com/

> 
> Both changes only make sense with an in-tree user. I'm planning on using
> this functionality in virtio-mem code. I can pick up your patches, drop
> the superfluous checks, and use it from virtio-mem code. Makes sense
> (BTW, looks like we'll see aarch64 support for virtio-mem soon)?

I have not been following virtio-mem closely. But is there something pending
on the arm64 platform which prevents virtio-mem enablement?

David Hildenbrand Dec. 8, 2020, 8:38 a.m. UTC | #8

>>
>> Both changes only make sense with an in-tree user. I'm planning on using
>> this functionality in virtio-mem code. I can pick up your patches, drop
>> the superfluous checks, and use it from virtio-mem code. Makes sense
>> (BTW, looks like we'll see aarch64 support for virtio-mem soon)?
> 
> I have not been following virtio-mem closely. But is there something pending
> on the arm64 platform which prevents virtio-mem enablement?

Regarding enablement, I expect things to be working out of the box
mostly. Jonathan is currently doing some testing and wants to send a
simple unlock patch once done. [1]


Now, there are some things to improve in the future. virtio-mem
adds/removes individual Linux memory blocks and logically plugs/unplugs
MAX_ORDER - 1/pageblock_order pages inside Linux memory blocks.

1. memblock

On arm64 and powerpc, we create/delete memblocks when adding/removing
memory, which is suboptimal (and the code is quite fragile as we don't
handle errors ...). Hotplugged memory never has holes, so we can tweak
relevant code to not check via the memblock API.

For example, pfn_valid() only has to check for memblock_is_map_memory()
in case of !early_section() - otherwise it can just fall back to our
generic pfn_valid() function.

2. MAX_ORDER - 1 / pageblock_order

With 64k base pages, virtio-mem can only logically plug/unplug in 512MB
granularity, which is suboptimal and inflexible. 4/2MB would be much
better - however this would require always using 2MB THP on arm64 (IIRC
via "cont" bits). Luckily, only some distributions use 64k base pages as
default nowadays ... :)

3. Section size

virtio-mem benefits from small section sizes. Currently, we have 1G.
With 4k base pages we could easily reduce it to something like what x86 has
(128 MB) - and I remember discussions regarding that already in another
(IIRC NVDIMM / DIMM) context. Again, with 64k base pages we cannot go
below 512 MB right now.
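
The 4/2MB and 512MB figures in point 2 follow from base page size times 2^order. Below is a minimal sketch of that arithmetic, assuming orders of 10 (MAX_ORDER - 1) and 9 (pageblock_order) with 4k base pages, and order 13 with 64k base pages. These orders are assumptions chosen to reproduce the numbers quoted above, not values read from a particular kernel configuration, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of a block of 2^order pages, for a page of 2^page_shift bytes. */
static uint64_t demo_unit_bytes(unsigned int page_shift, unsigned int order)
{
	return 1ULL << (page_shift + order);
}

/* Convert bytes to MiB for easier comparison with the figures in the text. */
static uint64_t demo_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}
```

With these assumptions, 4k pages (shift 12) give 4 MiB at order 10 and 2 MiB at order 9, while 64k pages (shift 16) at order 13 give 512 MiB.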

[1] https://lkml.kernel.org/r/20201125145659.00004b3e@Huawei.com
