
[v6,07/26] mm: Preserve the PG_arch_* flags in __split_huge_page_tail()

Message ID 20200703153718.16973-8-catalin.marinas@arm.com (mailing list archive)
State New, archived
Series arm64: Memory Tagging Extension user-space support

Commit Message

Catalin Marinas July 3, 2020, 3:36 p.m. UTC
When a huge page is split into normal pages, part of the head page flags
are transferred to the tail pages. However, the PG_arch_* flags are not
part of the preserved set.

PG_arch_1 is currently used by the arch code to handle cache maintenance
for user space (either for I-D cache coherency or for D-cache aliases
consistent with the kernel mapping). Since splitting a huge page does
not change the physical or virtual address of a mapping, additional
cache maintenance for the tail pages is unnecessary. Preserving the
PG_arch_1 flag from the head page in the tail pages would not break the
current use-cases.

PG_arch_2 is currently used for arm64 MTE support to mark pages that
have valid tags. The absence of such flag causes the arm64 set_pte_at()
to clear the tags in order to avoid stale tags exposed to user or the
swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge
page splitting leads to tag corruption in the tail pages.

To avoid the above and for consistency between the two PG_arch_* flags,
preserve both PG_arch_1 and PG_arch_2 in __split_huge_page_tail().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---

Notes:
    New in v6.

 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

David Hildenbrand July 6, 2020, 2:16 p.m. UTC | #1
On 03.07.20 17:36, Catalin Marinas wrote:
> When a huge page is split into normal pages, part of the head page flags
> are transferred to the tail pages. However, the PG_arch_* flags are not
> part of the preserved set.
> 
> PG_arch_1 is currently used by the arch code to handle cache maintenance
> for user space (either for I-D cache coherency or for D-cache aliases
> consistent with the kernel mapping). Since splitting a huge page does
> not change the physical or virtual address of a mapping, additional
> cache maintenance for the tail pages is unnecessary. Preserving the
> PG_arch_1 flag from the head page in the tail pages would not break the
> current use-cases.

^ is fairly arm64 specific, no? (I remember that the semantics are
different e.g., on s390x).

Did you check if this is actually safe to do on other architectures?
Maybe rephrase the description to make this clearer.

> 
> PG_arch_2 is currently used for arm64 MTE support to mark pages that
> have valid tags. The absence of such flag causes the arm64 set_pte_at()
> to clear the tags in order to avoid stale tags exposed to user or the
> swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge
> page splitting leads to tag corruption in the tail pages.

"currently"? I don't think so - isn't it follow-up patches in this series?

> 
> To avoid the above and for consistency between the two PG_arch_* flags,
> preserve both PG_arch_1 and PG_arch_2 in __split_huge_page_tail().
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
> Notes:
>     New in v6.
> 
>  mm/huge_memory.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 78c84bee7e29..22b3236a6dd8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2364,6 +2364,10 @@ static void __split_huge_page_tail(struct page *head, int tail,
>  			 (1L << PG_workingset) |
>  			 (1L << PG_locked) |
>  			 (1L << PG_unevictable) |
> +			 (1L << PG_arch_1) |
> +#ifdef CONFIG_64BIT
> +			 (1L << PG_arch_2) |
> +#endif
>  			 (1L << PG_dirty)));
>  
>  	/* ->mapping in first tail page is compound_mapcount */
>

Catalin Marinas July 6, 2020, 4:30 p.m. UTC | #2
On Mon, Jul 06, 2020 at 04:16:13PM +0200, David Hildenbrand wrote:
> On 03.07.20 17:36, Catalin Marinas wrote:
> > When a huge page is split into normal pages, part of the head page flags
> > are transferred to the tail pages. However, the PG_arch_* flags are not
> > part of the preserved set.
> > 
> > PG_arch_1 is currently used by the arch code to handle cache maintenance
> > for user space (either for I-D cache coherency or for D-cache aliases
> > consistent with the kernel mapping). Since splitting a huge page does
> > not change the physical or virtual address of a mapping, additional
> > cache maintenance for the tail pages is unnecessary. Preserving the
> > PG_arch_1 flag from the head page in the tail pages would not break the
> > current use-cases.
> 
> ^ is fairly arm64 specific, no? (I remember that the semantics are
> different e.g., on s390x).

Not entirely arm64 specific. Apart from s390 and x86, I think all the
other architectures use this flag for cache maintenance (I guess they
followed the cachetlb.rst suggestion). My understanding of the s390 and
x86 is that transferring this flag from the head of a compound page to
the tail pages should not cause any issue. We don't even document
anywhere that this flag is meant to disappear on huge page splitting. I
guess no-one noticed because clearing it is relatively benign.

But if there are concerns, I'm happy to guard it with something like
__ARCH_WANT_PG_ARCH_HEAD_TAIL (I need to think of a more suggestive
name).

> > PG_arch_2 is currently used for arm64 MTE support to mark pages that
> > have valid tags. The absence of such flag causes the arm64 set_pte_at()
> > to clear the tags in order to avoid stale tags exposed to user or the
> > swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge
> > page splitting leads to tag corruption in the tail pages.
> 
> "currently"? I don't think so - isn't it follow-up patches in this series?

True. It used to be correct before reordering the patches prior to
posting.

David Hildenbrand July 6, 2020, 5:56 p.m. UTC | #3
On 06.07.20 18:30, Catalin Marinas wrote:
> On Mon, Jul 06, 2020 at 04:16:13PM +0200, David Hildenbrand wrote:
>> On 03.07.20 17:36, Catalin Marinas wrote:
>>> When a huge page is split into normal pages, part of the head page flags
>>> are transferred to the tail pages. However, the PG_arch_* flags are not
>>> part of the preserved set.
>>>
>>> PG_arch_1 is currently used by the arch code to handle cache maintenance
>>> for user space (either for I-D cache coherency or for D-cache aliases
>>> consistent with the kernel mapping). Since splitting a huge page does
>>> not change the physical or virtual address of a mapping, additional
>>> cache maintenance for the tail pages is unnecessary. Preserving the
>>> PG_arch_1 flag from the head page in the tail pages would not break the
>>> current use-cases.
>>
>> ^ is fairly arm64 specific, no? (I remember that the semantics are
>> different e.g., on s390x).
> 
> Not entirely arm64 specific. Apart from s390 and x86, I think all the
> other architectures use this flag for cache maintenance (I guess they
> followed the cachetlb.rst suggestion). My understanding of the s390 and
> x86 is that transferring this flag from the head of a compound page to
> the tail pages should not cause any issue. We don't even document
> anywhere that this flag is meant to disappear on huge page splitting. I
> guess no-one noticed because clearing it is relatively benign.

On s390x, PG_arch_1 indicates (s390/kernel/uv.c:arch_make_page_accessible())
- kernel page tables
- for hugetlbfs pages, that storage keys are initialized for that page
  (IIRC KVM only)
- a user space page might be encrypted/secure (KVM only)

The latter does not support hugetlbfs/THP. KVM does not support THP. So
on s390x the bit should never be set in that context and, therefore,
also won't be affected by this change.

> 
> But if there are concerns, I'm happy to guard it with something like
> __ARCH_WANT_PG_ARCH_HEAD_TAIL (I need to think of a more suggestive
> name).

I guess we can avoid that if we properly check+document all users.
(ignoring x86 and s390x behavior here might be dangerous, although my
gut feeling is that it's ok for both)

Catalin Marinas July 8, 2020, 12:17 p.m. UTC | #4
On Mon, Jul 06, 2020 at 07:56:43PM +0200, David Hildenbrand wrote:
> On 06.07.20 18:30, Catalin Marinas wrote:
> > On Mon, Jul 06, 2020 at 04:16:13PM +0200, David Hildenbrand wrote:
> >> On 03.07.20 17:36, Catalin Marinas wrote:
> >>> When a huge page is split into normal pages, part of the head page flags
> >>> are transferred to the tail pages. However, the PG_arch_* flags are not
> >>> part of the preserved set.
> >>>
> >>> PG_arch_1 is currently used by the arch code to handle cache maintenance
> >>> for user space (either for I-D cache coherency or for D-cache aliases
> >>> consistent with the kernel mapping). Since splitting a huge page does
> >>> not change the physical or virtual address of a mapping, additional
> >>> cache maintenance for the tail pages is unnecessary. Preserving the
> >>> PG_arch_1 flag from the head page in the tail pages would not break the
> >>> current use-cases.
> >>
> >> ^ is fairly arm64 specific, no? (I remember that the semantics are
> >> different e.g., on s390x).
> > 
> > Not entirely arm64 specific. Apart from s390 and x86, I think all the
> > other architectures use this flag for cache maintenance (I guess they
> > followed the cachetlb.rst suggestion). My understanding of the s390 and
> > x86 is that transferring this flag from the head of a compound page to
> > the tail pages should not cause any issue. We don't even document
> > anywhere that this flag is meant to disappear on huge page splitting. I
> > guess no-one noticed because clearing it is relatively benign.
> 
> On s390x, PG_arch_1 indicates (s390/kernel/uv.c:arch_make_page_accessible())
> - kernel page tables
> - for hugetlbfs pages, that storage keys are initialized for that page
>   (IIRC KVM only)
> - a user space page might be encrypted/secure (KVM only)
> 
> The latter does not support hugetlbfs/THP. KVM does not support THP. So
> on s390x the bit should never be set in that context and, therefore,
> also won't be affected by this change.

Thanks for checking.

> > But if there are concerns, I'm happy to guard it with something like
> > __ARCH_WANT_PG_ARCH_HEAD_TAIL (I need to think of a more suggestive
> > name).
> 
> I guess we can avoid that if we properly check+document all users.
> (ignoring x86 and s390x behavior here might be dangerous, although my
> gut feeling is that it's ok for both)

I'll post an independent patch for PG_arch_1 to get consensus among
architectures. The PG_arch_2 introduced by the MTE patches can have the
new behaviour since it would only be used by arm64 initially.

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78c84bee7e29..22b3236a6dd8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2364,6 +2364,10 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_workingset) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
+			 (1L << PG_arch_1) |
+#ifdef CONFIG_64BIT
+			 (1L << PG_arch_2) |
+#endif
 			 (1L << PG_dirty)));
 
 	/* ->mapping in first tail page is compound_mapcount */