[V3,1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

Message ID 1614921898-4099-2-git-send-email-anshuman.khandual@arm.com (mailing list archive)
State New, archived
Series arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

Commit Message

Anshuman Khandual March 5, 2021, 5:24 a.m. UTC
pfn_valid() validates a pfn but basically it checks for a valid struct page
backing for that pfn. It should always return positive for memory ranges
backed with struct page mapping. But currently pfn_valid() fails for all
ZONE_DEVICE based memory types even though they have struct page mapping.

pfn_valid() asserts that there is a memblock entry for a given pfn without
MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
that they do not have memblock entries. Hence memblock_is_map_memory() will
invariably fail via memblock_search() for a ZONE_DEVICE based address. This
eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
into the system via memremap_pages() called from a driver, their respective
memory sections will not have SECTION_IS_EARLY set.

Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
regions, because the MEMBLOCK_NOMAP flag was specifically designed and set
for firmware reserved memory regions. memblock_is_map_memory() can just be
skipped as it is always going to be positive, and that will be an
optimization for the normal hotplug memory. Like ZONE_DEVICE based memory,
all normal hotplugged memory too will not have SECTION_IS_EARLY set for its
sections.

Skipping memblock_is_map_memory() for all non early memory sections would
fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
its performance for normal hotplug memory.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Acked-by: David Hildenbrand <david@redhat.com>
Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 arch/arm64/mm/init.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
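
The failure mode described in the commit message can be illustrated with a
small userspace sketch. It is not kernel code: the sketch_* names, the region
table and the addresses are all made up for illustration, and only the shape
of the lookup (search the memblock regions, fail when nothing is found,
otherwise reject NOMAP regions) is meant to mirror memblock_is_map_memory().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memblock region. */
struct sketch_region {
	unsigned long long base;
	unsigned long long size;
	bool nomap;		/* models the MEMBLOCK_NOMAP flag */
};

/* Sorted, non-overlapping boot-time regions (made-up addresses). */
static const struct sketch_region sketch_regions[] = {
	{ 0x80000000ULL, 0x10000000ULL, false },	/* ordinary RAM */
	{ 0x90000000ULL, 0x00100000ULL, true  },	/* firmware reserved */
};

/* Binary search for the region containing addr; returns -1 if none does. */
int sketch_search(unsigned long long addr)
{
	size_t left = 0;
	size_t right = sizeof(sketch_regions) / sizeof(sketch_regions[0]);

	while (left < right) {
		size_t mid = left + (right - left) / 2;
		const struct sketch_region *r = &sketch_regions[mid];

		if (addr < r->base)
			right = mid;
		else if (addr >= r->base + r->size)
			left = mid + 1;
		else
			return (int)mid;
	}
	return -1;
}

/* True only for addresses inside a region that is not marked NOMAP. */
bool sketch_is_map_memory(unsigned long long addr)
{
	int i = sketch_search(addr);

	if (i < 0)
		return false;	/* no entry at all: the ZONE_DEVICE case */
	return !sketch_regions[i].nomap;
}
```

In this model, an address that was never registered as a region (standing in
for a ZONE_DEVICE range added through memremap_pages()) makes the search
return -1, so the check fails even though nothing marked it NOMAP.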

Comments

Catalin Marinas March 5, 2021, 6:13 p.m. UTC | #1
On Fri, Mar 05, 2021 at 10:54:57AM +0530, Anshuman Khandual wrote:
> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
> 
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
> regions, because the MEMBLOCK_NOMAP flag was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as it is always going to be positive, and that will be an
> optimization for the normal hotplug memory. Like ZONE_DEVICE based memory,
> all normal hotplugged memory too will not have SECTION_IS_EARLY set for its
> sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
> its performance for normal hotplug memory.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Acked-by: David Hildenbrand <david@redhat.com>
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm64/mm/init.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 0ace5e68efba..5920c527845a 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -230,6 +230,18 @@ int pfn_valid(unsigned long pfn)
>  
>  	if (!valid_section(__pfn_to_section(pfn)))
>  		return 0;
> +
> +	/*
> +	 * ZONE_DEVICE memory does not have the memblock entries.
> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * addresses will always fail. Even the normal hotplugged
> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
> +	 * memblock entries. Skip memblock search for all non early
> +	 * memory sections covering all of hotplug memory including
> +	 * both normal and ZONE_DEVICE based.
> +	 */
> +	if (!early_section(__pfn_to_section(pfn)))
> +		return pfn_section_valid(__pfn_to_section(pfn), pfn);

Would something like this work instead:

	if (online_device_section(ms))
		return 1;

to avoid the assumptions around early_section()?

David Hildenbrand March 5, 2021, 6:24 p.m. UTC | #2
On 05.03.21 19:13, Catalin Marinas wrote:
> On Fri, Mar 05, 2021 at 10:54:57AM +0530, Anshuman Khandual wrote:
>> pfn_valid() validates a pfn but basically it checks for a valid struct page
>> backing for that pfn. It should always return positive for memory ranges
>> backed with struct page mapping. But currently pfn_valid() fails for all
>> ZONE_DEVICE based memory types even though they have struct page mapping.
>>
>> pfn_valid() asserts that there is a memblock entry for a given pfn without
>> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
>> that they do not have memblock entries. Hence memblock_is_map_memory() will
>> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
>> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
>> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
>> into the system via memremap_pages() called from a driver, their respective
>> memory sections will not have SECTION_IS_EARLY set.
>>
>> Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
>> regions, because the MEMBLOCK_NOMAP flag was specifically designed and set
>> for firmware reserved memory regions. memblock_is_map_memory() can just be
>> skipped as it is always going to be positive, and that will be an
>> optimization for the normal hotplug memory. Like ZONE_DEVICE based memory,
>> all normal hotplugged memory too will not have SECTION_IS_EARLY set for its
>> sections.
>>
>> Skipping memblock_is_map_memory() for all non early memory sections would
>> fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
>> its performance for normal hotplug memory.
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linux-kernel@vger.kernel.org
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
>> ---
>>   arch/arm64/mm/init.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 0ace5e68efba..5920c527845a 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -230,6 +230,18 @@ int pfn_valid(unsigned long pfn)
>>   
>>   	if (!valid_section(__pfn_to_section(pfn)))
>>   		return 0;
>> +
>> +	/*
>> +	 * ZONE_DEVICE memory does not have the memblock entries.
>> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
>> +	 * addresses will always fail. Even the normal hotplugged
>> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
>> +	 * memblock entries. Skip memblock search for all non early
>> +	 * memory sections covering all of hotplug memory including
>> +	 * both normal and ZONE_DEVICE based.
>> +	 */
>> +	if (!early_section(__pfn_to_section(pfn)))
>> +		return pfn_section_valid(__pfn_to_section(pfn), pfn);
> 
> Would something like this work instead:
> 
> 	if (online_device_section(ms))
> 		return 1;
> 
> to avoid the assumptions around early_section()?
> 

Please keep online section logic out of pfn valid logic. Two different
things. (and rather not diverge too much from generic pfn_valid() - we
want to achieve the opposite in the long term, merging both implementations)

Catalin Marinas March 8, 2021, 11:29 a.m. UTC | #3
On Fri, Mar 05, 2021 at 07:24:21PM +0100, David Hildenbrand wrote:
> On 05.03.21 19:13, Catalin Marinas wrote:
> > On Fri, Mar 05, 2021 at 10:54:57AM +0530, Anshuman Khandual wrote:
> > > pfn_valid() validates a pfn but basically it checks for a valid struct page
> > > backing for that pfn. It should always return positive for memory ranges
> > > backed with struct page mapping. But currently pfn_valid() fails for all
> > > ZONE_DEVICE based memory types even though they have struct page mapping.
> > > 
> > > pfn_valid() asserts that there is a memblock entry for a given pfn without
> > > MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> > > that they do not have memblock entries. Hence memblock_is_map_memory() will
> > > invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> > > eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> > > to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> > > into the system via memremap_pages() called from a driver, their respective
> > > memory sections will not have SECTION_IS_EARLY set.
> > > 
> > > Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
> > > regions, because the MEMBLOCK_NOMAP flag was specifically designed and set
> > > for firmware reserved memory regions. memblock_is_map_memory() can just be
> > > skipped as it is always going to be positive, and that will be an
> > > optimization for the normal hotplug memory. Like ZONE_DEVICE based memory,
> > > all normal hotplugged memory too will not have SECTION_IS_EARLY set for its
> > > sections.
> > > 
> > > Skipping memblock_is_map_memory() for all non early memory sections would
> > > fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
> > > its performance for normal hotplug memory.
> > > 
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Ard Biesheuvel <ardb@kernel.org>
> > > Cc: Robin Murphy <robin.murphy@arm.com>
> > > Cc: linux-arm-kernel@lists.infradead.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Acked-by: David Hildenbrand <david@redhat.com>
> > > Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> > > Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> > > ---
> > >   arch/arm64/mm/init.c | 12 ++++++++++++
> > >   1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > index 0ace5e68efba..5920c527845a 100644
> > > --- a/arch/arm64/mm/init.c
> > > +++ b/arch/arm64/mm/init.c
> > > @@ -230,6 +230,18 @@ int pfn_valid(unsigned long pfn)
> > >   	if (!valid_section(__pfn_to_section(pfn)))
> > >   		return 0;
> > > +
> > > +	/*
> > > +	 * ZONE_DEVICE memory does not have the memblock entries.
> > > +	 * memblock_is_map_memory() check for ZONE_DEVICE based
> > > +	 * addresses will always fail. Even the normal hotplugged
> > > +	 * memory will never have MEMBLOCK_NOMAP flag set in their
> > > +	 * memblock entries. Skip memblock search for all non early
> > > +	 * memory sections covering all of hotplug memory including
> > > +	 * both normal and ZONE_DEVICE based.
> > > +	 */
> > > +	if (!early_section(__pfn_to_section(pfn)))
> > > +		return pfn_section_valid(__pfn_to_section(pfn), pfn);
> > 
> > Would something like this work instead:
> > 
> > 	if (online_device_section(ms))
> > 		return 1;
> > 
> > to avoid the assumptions around early_section()?
> 
> Please keep online section logic out of pfn valid logic. Two different
> things. (and rather not diverge too much from generic pfn_valid() - we want
> to achieve the opposite in the long term, merging both implementations)

I think I misread the code. I was looking for a new flag to check like
SECTION_IS_DEVICE instead of assuming that !SECTION_IS_EARLY means
device or mhp.

Anyway, staring at this code for a bit more, I came to the conclusion
that the logic in Anshuman's patches is fairly robust - we only need to
check for memblock_is_map_memory() if early_section() as that's the only
case where we can have MEMBLOCK_NOMAP. Maybe the comment above should be
re-written a bit and avoid the ZONE_DEVICE and hotplugged memory
details altogether.

Catalin Marinas March 8, 2021, 5:59 p.m. UTC | #4
On Fri, Mar 05, 2021 at 10:54:57AM +0530, Anshuman Khandual wrote:
> pfn_valid() validates a pfn but basically it checks for a valid struct page
> backing for that pfn. It should always return positive for memory ranges
> backed with struct page mapping. But currently pfn_valid() fails for all
> ZONE_DEVICE based memory types even though they have struct page mapping.
> 
> pfn_valid() asserts that there is a memblock entry for a given pfn without
> MEMBLOCK_NOMAP flag being set. The problem with ZONE_DEVICE based memory is
> that they do not have memblock entries. Hence memblock_is_map_memory() will
> invariably fail via memblock_search() for a ZONE_DEVICE based address. This
> eventually fails pfn_valid() which is wrong. memblock_is_map_memory() needs
> to be skipped for such memory ranges. As ZONE_DEVICE memory gets hotplugged
> into the system via memremap_pages() called from a driver, their respective
> memory sections will not have SECTION_IS_EARLY set.
> 
> Normal hotplug memory will never have MEMBLOCK_NOMAP set in its memblock
> regions, because the MEMBLOCK_NOMAP flag was specifically designed and set
> for firmware reserved memory regions. memblock_is_map_memory() can just be
> skipped as it is always going to be positive, and that will be an
> optimization for the normal hotplug memory. Like ZONE_DEVICE based memory,
> all normal hotplugged memory too will not have SECTION_IS_EARLY set for its
> sections.
> 
> Skipping memblock_is_map_memory() for all non early memory sections would
> fix the pfn_valid() problem for ZONE_DEVICE based memory and also improve
> its performance for normal hotplug memory.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> Acked-by: David Hildenbrand <david@redhat.com>
> Fixes: 73b20c84d42d ("arm64: mm: implement pte_devmap support")
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> ---
>  arch/arm64/mm/init.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 0ace5e68efba..5920c527845a 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -230,6 +230,18 @@ int pfn_valid(unsigned long pfn)
>  
>  	if (!valid_section(__pfn_to_section(pfn)))
>  		return 0;
> +
> +	/*
> +	 * ZONE_DEVICE memory does not have the memblock entries.
> +	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * addresses will always fail. Even the normal hotplugged
> +	 * memory will never have MEMBLOCK_NOMAP flag set in their
> +	 * memblock entries. Skip memblock search for all non early
> +	 * memory sections covering all of hotplug memory including
> +	 * both normal and ZONE_DEVICE based.
> +	 */
> +	if (!early_section(__pfn_to_section(pfn)))
> +		return pfn_section_valid(__pfn_to_section(pfn), pfn);
>  #endif
>  	return memblock_is_map_memory(addr);
>  }

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

Patch

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0ace5e68efba..5920c527845a 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -230,6 +230,18 @@  int pfn_valid(unsigned long pfn)
 
 	if (!valid_section(__pfn_to_section(pfn)))
 		return 0;
+
+	/*
+	 * ZONE_DEVICE memory does not have the memblock entries.
+	 * memblock_is_map_memory() check for ZONE_DEVICE based
+	 * addresses will always fail. Even the normal hotplugged
+	 * memory will never have MEMBLOCK_NOMAP flag set in their
+	 * memblock entries. Skip memblock search for all non early
+	 * memory sections covering all of hotplug memory including
+	 * both normal and ZONE_DEVICE based.
+	 */
+	if (!early_section(__pfn_to_section(pfn)))
+		return pfn_section_valid(__pfn_to_section(pfn), pfn);
 #endif
 	return memblock_is_map_memory(addr);
 }
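
The decision order that results from the hunk above can be modelled in
userspace as follows. This is a rough sketch under stated assumptions, not the
kernel implementation: the model_* names, the tiny section size and the
section and region tables are hypothetical, and the pfn_section_valid() step
is reduced to "return 1" because the model ignores sub-section granularity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constants; real section sizes are far larger. */
#define MODEL_PAGES_PER_SECTION		4UL
#define MODEL_NR_SECTIONS		4UL

#define MODEL_SECTION_HAS_MEM_MAP	(1U << 0)	/* struct pages exist */
#define MODEL_SECTION_IS_EARLY		(1U << 1)	/* present at boot */

struct model_section {
	unsigned int flags;
};

struct model_region {
	unsigned long start_pfn;	/* inclusive */
	unsigned long end_pfn;		/* exclusive */
	bool nomap;			/* models MEMBLOCK_NOMAP */
};

static const struct model_section model_sections[MODEL_NR_SECTIONS] = {
	{ MODEL_SECTION_HAS_MEM_MAP | MODEL_SECTION_IS_EARLY },	/* boot RAM */
	{ MODEL_SECTION_HAS_MEM_MAP | MODEL_SECTION_IS_EARLY },	/* boot, NOMAP */
	{ MODEL_SECTION_HAS_MEM_MAP },	/* hotplugged, no memblock entry */
	{ 0 },				/* no struct page backing */
};

/* Only boot-time memory appears here; the hotplugged section does not. */
static const struct model_region model_memblock[] = {
	{ 0, 4, false },
	{ 4, 8, true },
};

static bool model_memblock_is_map_memory(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(model_memblock) / sizeof(model_memblock[0]); i++) {
		const struct model_region *r = &model_memblock[i];

		if (pfn >= r->start_pfn && pfn < r->end_pfn)
			return !r->nomap;
	}
	return false;
}

int model_pfn_valid(unsigned long pfn)
{
	unsigned long nr = pfn / MODEL_PAGES_PER_SECTION;
	const struct model_section *ms;

	if (nr >= MODEL_NR_SECTIONS)
		return 0;

	ms = &model_sections[nr];
	if (!(ms->flags & MODEL_SECTION_HAS_MEM_MAP))
		return 0;

	/* Non-early sections skip the memblock lookup entirely. */
	if (!(ms->flags & MODEL_SECTION_IS_EARLY))
		return 1;

	/* Only early sections can carry NOMAP, so only they are checked. */
	return model_memblock_is_map_memory(pfn);
}
```

With these tables, a pfn in the hotplugged section is reported valid without
consulting the region list, while a pfn in the early NOMAP region is still
rejected, which is the behaviour the thread converges on.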