diff mbox series

[v5] arch/ia64: Restore arch-specific pgd_offset_k implementation

Message ID 20200813205521.5405-1-rppt@kernel.org (mailing list archive)
State New, archived
Headers show
Series [v5] arch/ia64: Restore arch-specific pgd_offset_k implementation | expand

Commit Message

Mike Rapoport Aug. 13, 2020, 8:55 p.m. UTC
From: Jessica Clarke <jrtc27@jrtc27.com>

IA-64 is special and treats pgd_offset_k() differently to pgd_offset(),
using different formulae to calculate the indices into the kernel and user
PGDs.  The index into the user PGDs takes into account the region number,
but the index into the kernel (init_mm) PGD always assumes a predefined
kernel region number. Commit 974b9b2c68 ("mm: consolidate pte_index() and
pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which
incorrectly used pgd_index() for kernel page tables.  As a result, the
index into the kernel PGD was going out of bounds and the kernel hung
during early boot.

Allow overrides of pgd_offset_k() and override it on IA-64 with the old
implementation that will correctly index the kernel PGD.

Fixes: 974b9b2c68 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
Changes in v4:
* Correct the comment that I missed in v4 :(

Changes since v3:
* Correct commit message as Jessica suggested

Changes since v2:
* Rephrase commit message and comment about pgd_offset_k()

Changes since v1:
 * Fixed typo in commit message
 * Slightly reworded commit message to sound less weird
 * Included Adrian's Tested-by
 arch/ia64/include/asm/pgtable.h | 9 +++++++++
 include/linux/pgtable.h         | 2 ++
 2 files changed, 11 insertions(+)

Comments

Matthew Wilcox Aug. 13, 2020, 9:36 p.m. UTC | #1
On Thu, Aug 13, 2020 at 11:55:21PM +0300, Mike Rapoport wrote:
> +/*
> + * In the kernel's mapped region we know everything is in region number 5, so
> + * as an optimisation its PGD already points to the area for that region.

Is it actually an optimisation?  Are there any benchmarks where this
makes any difference whatsoever?  Or should the comment be closer to

/* ia64 is gratuitously different.  Cope with it. */

I might suggest that pgd be changed to be the same as every other pgd
in the system, and every other architecture.
Jessica Clarke Aug. 13, 2020, 9:44 p.m. UTC | #2
On 13 Aug 2020, at 22:36, Matthew Wilcox <willy@infradead.org> wrote:
> 
> On Thu, Aug 13, 2020 at 11:55:21PM +0300, Mike Rapoport wrote:
>> +/*
>> + * In the kernel's mapped region we know everything is in region number 5, so
>> + * as an optimisation its PGD already points to the area for that region.
> 
> Is it actually an optimisation?  Are there any benchmarks where this
> makes any difference whatsoever?  Or should the comment be closer to
> 
> /* ia64 is gratuitously different.  Cope with it. */
> 
> I might suggest that pgd be changed to be the same as every other pgd
> in the system, and every other architecture.

IA-64 is just weird. There's an optional hardware page table walker,
and two different options for how to use it within that, short-format
and long-format. My understanding is that Linux uses the short-format,
and that means that the VHPTs (Virtual Hashed Page Table) are
per-region. My assumption therefore is that the kernel's PGD is a
completely separate allocation from the userspace ones, and it's not
just a case of "moving the pointer back to the start" but that "that is
all there is", so unifying the two would be a significant undertaking
that avoids one special case at the cost of invasive refactoring and a
performance hit by duplication and/or unused space. My knowledge of
IA-64 translation is pretty limited, that's about all I know (and this
likely has mistakes in it), but I know enough to avoid finding out too
much more. Never make the mistake of assuming IA-64 can be made to look
like any other architecture :)

Jess
Mike Rapoport Aug. 13, 2020, 10:21 p.m. UTC | #3
On Thu, Aug 13, 2020 at 10:36:31PM +0100, Matthew Wilcox wrote:
> On Thu, Aug 13, 2020 at 11:55:21PM +0300, Mike Rapoport wrote:
> > +/*
> > + * In the kernel's mapped region we know everything is in region number 5, so
> > + * as an optimisation its PGD already points to the area for that region.
> 
> Is it actually an optimisation?  Are there any benchmarks where this
> makes any difference whatsoever?  Or should the comment be closer to
> 
> /* ia64 is gratuitously different.  Cope with it. */

It better suits arch/ia64/Kconfig ;-)

diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 5b4ec80bf586..205cdb91a6d5 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -62,6 +62,7 @@ config IA64
          the 32-bit X86 line.  The IA-64 Linux project has a home
          page at <http://www.linuxia64.org/> and a mailing list at
          <linux-ia64@vger.kernel.org>.
+         IA-64 is gratuitously different. Cope with it.

> I might suggest that pgd be changed to be the same as every other pgd
> in the system, and every other architecture.
 
I did some grepping and got lost in IA-64 assembly, so I agree with
Jessica that it would open a can of worms.
Mike Rapoport Aug. 17, 2020, 7:33 a.m. UTC | #4
On Thu, Aug 13, 2020 at 11:55:21PM +0300, Mike Rapoport wrote:
> From: Jessica Clarke <jrtc27@jrtc27.com>
> 
> IA-64 is special and treats pgd_offset_k() differently to pgd_offset(),
> using different formulae to calculate the indices into the kernel and user
> PGDs.  The index into the user PGDs takes into account the region number,
> but the index into the kernel (init_mm) PGD always assumes a predefined
> kernel region number. Commit 974b9b2c68 ("mm: consolidate pte_index() and
> pte_offset_*() definitions") made IA-64 use a generic pgd_offset_k() which
> incorrectly used pgd_index() for kernel page tables.  As a result, the
> index into the kernel PGD was going out of bounds and the kernel hung
> during early boot.
> 
> Allow overrides of pgd_offset_k() and override it on IA-64 with the old
> implementation that will correctly index the kernel PGD.
> 
> Fixes: 974b9b2c68 ("mm: consolidate pte_index() and pte_offset_*() definitions")
> Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com>
> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
> Changes in v4:
> * Correct the comment that I missed in v4 :(
> 
> Changes since v3:
> * Correct commit message as Jessica suggested
> 
> Changes since v2:
> * Rephrase commit message and comment about pgd_offset_k()
> 
> Changes since v1:
>  * Fixed typo in commit message
>  * Slightly reworded commit message to sound less weird
>  * Included Adrian's Tested-by
>  arch/ia64/include/asm/pgtable.h | 9 +++++++++
>  include/linux/pgtable.h         | 2 ++
>  2 files changed, 11 insertions(+)

I can take this via my tree if there are no objections.

Tony?
Luck, Tony Aug. 17, 2020, 4:05 p.m. UTC | #5
> I can take this via my tree if there are no objections.

Sure. Go ahead. (You can count that as Acked-by)

-Tony
diff mbox series

Patch

diff --git a/arch/ia64/include/asm/pgtable.h b/arch/ia64/include/asm/pgtable.h
index 10850897a91c..779b6972aa84 100644
--- a/arch/ia64/include/asm/pgtable.h
+++ b/arch/ia64/include/asm/pgtable.h
@@ -366,6 +366,15 @@  pgd_index (unsigned long address)
 }
 #define pgd_index pgd_index
 
+/*
+ * In the kernel's mapped region we know everything is in region number 5, so
+ * as an optimisation its PGD already points to the area for that region.
+ * However, this also means that we cannot use pgd_index() and we must
+ * never add the region here.
+ */
+#define pgd_offset_k(addr) \
+	(init_mm.pgd + (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)))
+
 /* Look up a pgd entry in the gate area.  On IA-64, the gate-area
    resides in the kernel-mapped segment, hence we use pgd_offset_k()
    here.  */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a124c21e3204..e8cbc2e795d5 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -117,7 +117,9 @@  static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
  * a shortcut which implies the use of the kernel's pgd, instead
  * of a process's
  */
+#ifndef pgd_offset_k
 #define pgd_offset_k(address)		pgd_offset(&init_mm, (address))
+#endif
 
 /*
  * In many cases it is known that a virtual address is mapped at PMD or PTE