revert "x86/mm: re-implement get_page_light() using an atomic increment"

Message ID f54c05d4-b61d-4e26-8e93-6e1bdc22a1d4@suse.com (mailing list archive)
State New, archived
Series revert "x86/mm: re-implement get_page_light() using an atomic increment"

Commit Message

Jan Beulich April 29, 2024, 1:01 p.m. UTC
revert "x86/mm: re-implement get_page_light() using an atomic increment"

This reverts commit c40bc0576dcc5acd4d7e22ef628eb4642f568533.

That change aimed at eliminating an open-coded lock-like construct,
which really isn't all that similar to, in particular, get_page(). The
function always succeeds. Any remaining concern would want taking care
of by placing block_lock_speculation() at the end of the function.
Since the function is called only during page (de)validation, any
possible performance concerns over such extra serialization could
likely be addressed by pre-validating (e.g. via pinning) page tables.
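
Purely as an illustration of that suggestion (not part of this revert), placing such a call at the end of the restored, cmpxchg-based form of the function would look roughly like this:

static void get_page_light(struct page_info *page)
{
    unsigned long x, nx, y = page->count_info;

    do {
        x  = y;
        nx = x + 1;
        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
        y = cmpxchg(&page->count_info, x, nx);
    }
    while ( unlikely(y != x) );

    /* Hypothetical addition suggested above; not included in this revert. */
    block_lock_speculation();
}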

The fundamental issue with the change being reverted is that it detects
bad state only after already having caused possible corruption. While
the system is going to be halted in such an event, there is a time
window during which the resulting incorrect state could be leveraged by
a clever (in particular: fast enough) attacker.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
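
To make the ordering argument concrete, here is a minimal stand-alone sketch using plain C11 atomics (hypothetical names and mask value, not Xen code). With a fetch-and-add, the wrapped counter value is already globally visible by the time the sanity check runs; with a cmpxchg loop, the bad value is rejected before it ever reaches memory:

#include <assert.h>
#include <stdatomic.h>

#define COUNT_MASK 0x00ffffffUL               /* stand-in for PGC_count_mask */

/* A reference count that is already at its maximum representable value. */
static _Atomic unsigned long count_info = COUNT_MASK;

static void get_light_fetch_add(void)
{
    /* The counter wraps to 0 (within COUNT_MASK) and other CPUs can see it... */
    unsigned long old = atomic_fetch_add(&count_info, 1);

    /* ...before this check gets a chance to stop anything. */
    assert((old + 1) & COUNT_MASK);
}

static void get_light_cmpxchg(void)
{
    unsigned long x = atomic_load(&count_info), nx;

    do {
        nx = x + 1;
        assert(nx & COUNT_MASK);              /* checked before committing */
    } while ( !atomic_compare_exchange_weak(&count_info, &x, nx) );
}

In the first variant there is a window, between the increment and the eventual halt, during which another CPU can observe a count of zero for a page that is still in use; in the second the inconsistent value never becomes visible.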

Comments

Andrew Cooper April 29, 2024, 1:09 p.m. UTC | #1
On 29/04/2024 2:01 pm, Jan Beulich wrote:
> revert "x86/mm: re-implement get_page_light() using an atomic increment"
>
> This reverts commit c40bc0576dcc5acd4d7e22ef628eb4642f568533.
>
> That change aimed at eliminating an open-coded lock-like construct,
> which really isn't all that similar to, in particular, get_page(). The
> function always succeeds. Any remaining concern would want taking care
> of by placing block_lock_speculation() at the end of the function.
> Since the function is called only during page (de)validation, any
> possible performance concerns over such extra serialization could
> likely be addressed by pre-validating (e.g. via pinning) page tables.
>
> The fundamental issue with the change being reverted is that it detects
> bad state only after already having caused possible corruption. While
> the system is going to be halted in such an event, there is a time
> window during which the resulting incorrect state could be leveraged by
> a clever (in particular: fast enough) attacker.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Roger Pau Monné April 29, 2024, 2:49 p.m. UTC | #2
On Mon, Apr 29, 2024 at 03:01:00PM +0200, Jan Beulich wrote:
> revert "x86/mm: re-implement get_page_light() using an atomic increment"
> 
> This reverts commit c40bc0576dcc5acd4d7e22ef628eb4642f568533.
> 
> That change aimed at eliminating an open-coded lock-like construct,
> which really isn't all that similar to, in particular, get_page(). The
> function always succeeds. Any remaining concern would want taking care
> of by placing block_lock_speculation() at the end of the function.
> Since the function is called only during page (de)validation, any
> possible performance concerns over such extra serialization could
> likely be addressed by pre-validating (e.g. via pinning) page tables.
> 
> The fundamental issue with the change being reverted is that it detects
> bad state only after already having caused possible corruption. While
> the system is going to be halted in such an event, there is a time
> window during which the resulting incorrect state could be leveraged by
> a clever (in particular: fast enough) attacker.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

Patch

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2582,10 +2582,16 @@  bool get_page(struct page_info *page, const struct domain *domain)
  */
 static void get_page_light(struct page_info *page)
 {
-    unsigned long old_pgc = arch_fetch_and_add(&page->count_info, 1);
+    unsigned long x, nx, y = page->count_info;
 
-    BUG_ON(!(old_pgc & PGC_count_mask)); /* Not allocated? */
-    BUG_ON(!((old_pgc + 1) & PGC_count_mask)); /* Overflow? */
+    do {
+        x  = y;
+        nx = x + 1;
+        BUG_ON(!(x & PGC_count_mask)); /* Not allocated? */
+        BUG_ON(!(nx & PGC_count_mask)); /* Overflow? */
+        y = cmpxchg(&page->count_info, x, nx);
+    }
+    while ( unlikely(y != x) );
 }
 
 static int validate_page(struct page_info *page, unsigned long type,