diff mbox series

[v2] x86/PoD: move increment of entry count

Message ID 5b3f46f3-3c9f-57fb-00a5-94128f41e34a@suse.com (mailing list archive)
State New, archived

Commit Message

Jan Beulich Jan. 4, 2022, 10:57 a.m. UTC
When the PoD lock is not held across the entire region covering both the
P2M update and the stats update, the entry count should err toward being
too large rather than too small, so that functions do not bail out early
upon finding a count of zero. Hence increments should happen ahead of
P2M updates, while decrements should happen only afterwards. Deal with
the one place where this hasn't been the case yet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add comments.
---
While it might be possible to hold the PoD lock over the entire
operation, I didn't want to chance introducing a lock order violation on
a perhaps rarely taken code path.
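
[Editor's note: the pattern the patch implements can be sketched in
isolation. The structures and helpers below are illustrative stand-ins,
not the real Xen API; the real code guards the count with pod_lock()/
pod_unlock(), elided here as comments.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PoD statistics; in Xen this lives in
 * struct p2m_domain and is protected by the PoD lock. */
struct pod_stats {
    long entry_count;
};

/* Stand-in for p2m_set_entry(): 0 on success, nonzero on failure. */
static int set_entry_stub(bool fail)
{
    return fail ? -1 : 0;
}

/*
 * Bump the count up front assuming success, undoing on failure.  A
 * concurrent reader inspecting the count between the bump and the
 * mapping update can only see a value that is too large, never too
 * small, so it cannot bail early on a spurious zero.
 */
static int mark_pod_region(struct pod_stats *pod, unsigned int order,
                           long old_pod_count, bool fail)
{
    int rc;

    /* pod_lock(p2m); */
    pod->entry_count += (1L << order) - old_pod_count;  /* optimistic bump */
    /* pod_unlock(p2m); */

    rc = set_entry_stub(fail);
    if ( rc == 0 )
    {
        /* Success: the count already reflects the new PoD entries. */
    }
    else if ( order )
    {
        /* The real code calls domain_crash() here; the inflated count
         * no longer matters for a dead domain. */
    }
    else if ( old_pod_count == 0 )
    {
        /* Undo the earlier single-entry bump; mirrors the patch. */
        assert(pod->entry_count > 0);
        --pod->entry_count;
    }
    return rc;
}
```

A reader can check that the count never transiently drops below the true
number of PoD entries on any of the three paths above.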

Comments

Jan Beulich Jan. 27, 2022, 3:04 p.m. UTC | #1
On 04.01.2022 11:57, Jan Beulich wrote:
> When not holding the PoD lock across the entire region covering P2M
> update and stats update, the entry count should indicate too large a
> value in preference to a too small one, to avoid functions bailing early
> when they find the count is zero. Hence increments should happen ahead
> of P2M updates, while decrements should happen only after. Deal with the
> one place where this hasn't been the case yet.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Add comments.

Ping?

Jan

> ---
> While it might be possible to hold the PoD lock over the entire
> operation, I didn't want to chance introducing a lock order violation on
> a perhaps rarely taken code path.
> 
> --- a/xen/arch/x86/mm/p2m-pod.c
> +++ b/xen/arch/x86/mm/p2m-pod.c
> @@ -1342,19 +1342,22 @@ mark_populate_on_demand(struct domain *d
>          }
>      }
>  
> +    /*
> +     * Without holding the PoD lock across the entire operation, bump the
> +     * entry count up front assuming success of p2m_set_entry(), undoing the
> +     * bump as necessary upon failure.  Bumping only upon success would risk
> +     * code elsewhere observing entry count being zero despite there actually
> +     * still being PoD entries.
> +     */
> +    pod_lock(p2m);
> +    p2m->pod.entry_count += (1UL << order) - pod_count;
> +    pod_unlock(p2m);
> +
>      /* Now, actually do the two-way mapping */
>      rc = p2m_set_entry(p2m, gfn, INVALID_MFN, order,
>                         p2m_populate_on_demand, p2m->default_access);
>      if ( rc == 0 )
> -    {
> -        pod_lock(p2m);
> -        p2m->pod.entry_count += 1UL << order;
> -        p2m->pod.entry_count -= pod_count;
> -        BUG_ON(p2m->pod.entry_count < 0);
> -        pod_unlock(p2m);
> -
>          ioreq_request_mapcache_invalidate(d);
> -    }
>      else if ( order )
>      {
>          /*
> @@ -1366,6 +1369,14 @@ mark_populate_on_demand(struct domain *d
>                 d, gfn_l, order, rc);
>          domain_crash(d);
>      }
> +    else if ( !pod_count )
> +    {
> +        /* Undo earlier increment; see comment above. */
> +        pod_lock(p2m);
> +        BUG_ON(!p2m->pod.entry_count);
> +        --p2m->pod.entry_count;
> +        pod_unlock(p2m);
> +    }
>  
>  out:
>      gfn_unlock(p2m, gfn, order);
> 
>
Jan Beulich Feb. 24, 2022, 1:04 p.m. UTC | #2
On 27.01.2022 16:04, Jan Beulich wrote:
> On 04.01.2022 11:57, Jan Beulich wrote:
>> When not holding the PoD lock across the entire region covering P2M
>> update and stats update, the entry count should indicate too large a
>> value in preference to a too small one, to avoid functions bailing early
>> when they find the count is zero. Hence increments should happen ahead
>> of P2M updates, while decrements should happen only after. Deal with the
>> one place where this hasn't been the case yet.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Add comments.
> 
> Ping?

Similarly here: Another 4 weeks have passed ...

Thanks for feedback either way, Jan

>> ---
>> While it might be possible to hold the PoD lock over the entire
>> operation, I didn't want to chance introducing a lock order violation on
>> a perhaps rarely taken code path.
>>
>> --- a/xen/arch/x86/mm/p2m-pod.c
>> +++ b/xen/arch/x86/mm/p2m-pod.c
>> @@ -1342,19 +1342,22 @@ mark_populate_on_demand(struct domain *d
>>          }
>>      }
>>  
>> +    /*
>> +     * Without holding the PoD lock across the entire operation, bump the
>> +     * entry count up front assuming success of p2m_set_entry(), undoing the
>> +     * bump as necessary upon failure.  Bumping only upon success would risk
>> +     * code elsewhere observing entry count being zero despite there actually
>> +     * still being PoD entries.
>> +     */
>> +    pod_lock(p2m);
>> +    p2m->pod.entry_count += (1UL << order) - pod_count;
>> +    pod_unlock(p2m);
>> +
>>      /* Now, actually do the two-way mapping */
>>      rc = p2m_set_entry(p2m, gfn, INVALID_MFN, order,
>>                         p2m_populate_on_demand, p2m->default_access);
>>      if ( rc == 0 )
>> -    {
>> -        pod_lock(p2m);
>> -        p2m->pod.entry_count += 1UL << order;
>> -        p2m->pod.entry_count -= pod_count;
>> -        BUG_ON(p2m->pod.entry_count < 0);
>> -        pod_unlock(p2m);
>> -
>>          ioreq_request_mapcache_invalidate(d);
>> -    }
>>      else if ( order )
>>      {
>>          /*
>> @@ -1366,6 +1369,14 @@ mark_populate_on_demand(struct domain *d
>>                 d, gfn_l, order, rc);
>>          domain_crash(d);
>>      }
>> +    else if ( !pod_count )
>> +    {
>> +        /* Undo earlier increment; see comment above. */
>> +        pod_lock(p2m);
>> +        BUG_ON(!p2m->pod.entry_count);
>> +        --p2m->pod.entry_count;
>> +        pod_unlock(p2m);
>> +    }
>>  
>>  out:
>>      gfn_unlock(p2m, gfn, order);
>>
>>
>
George Dunlap March 11, 2024, 4:04 p.m. UTC | #3
On Tue, Jan 4, 2022 at 10:58 AM Jan Beulich <jbeulich@suse.com> wrote:

> When not holding the PoD lock across the entire region covering P2M
> update and stats update, the entry count should indicate too large a
> value in preference to a too small one, to avoid functions bailing early
> when they find the count is zero. Hence increments should happen ahead
> of P2M updates, while decrements should happen only after. Deal with the
> one place where this hasn't been the case yet.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Add comments.
> ---
> While it might be possible to hold the PoD lock over the entire
> operation, I didn't want to chance introducing a lock order violation on
> a perhaps rarely taken code path.
>

[No idea how I missed this 2 years ago, sorry for the delay]

Actually I think just holding the lock is probably the right thing to do.
We already grab gfn_lock() over the entire operation, and p2m_set_entry()
ASSERTs gfn_locked_by_me(); and all we have to do to trigger the check is
boot any guest in PoD mode at all; surely we have osstest tests for that?

 -George
Jan Beulich March 12, 2024, 1:17 p.m. UTC | #4
On 11.03.2024 17:04, George Dunlap wrote:
> On Tue, Jan 4, 2022 at 10:58 AM Jan Beulich <jbeulich@suse.com> wrote:
> 
>> When not holding the PoD lock across the entire region covering P2M
>> update and stats update, the entry count should indicate too large a
>> value in preference to a too small one, to avoid functions bailing early
>> when they find the count is zero. Hence increments should happen ahead
>> of P2M updates, while decrements should happen only after. Deal with the
>> one place where this hasn't been the case yet.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Add comments.
>> ---
>> While it might be possible to hold the PoD lock over the entire
>> operation, I didn't want to chance introducing a lock order violation on
>> a perhaps rarely taken code path.
>>
> 
> [No idea how I missed this 2 years ago, sorry for the delay]
> 
> Actually I think just holding the lock is probably the right thing to do.
> We already grab gfn_lock() over the entire operation, and p2m_set_entry()
> ASSERTs gfn_locked_by_me(); and all we have to do to trigger the check is
> boot any guest in PoD mode at all; surely we have osstest tests for that?

The gfn (aka p2m) lock isn't of interest here; it's the locks whose lock
level sits between p2m and pod that are. Then again there are obviously
other calls to p2m_set_entry() with the PoD lock held. So if there was a
problem (e.g. with ept_set_entry() calling p2m_altp2m_propagate_change()),
I wouldn't make things meaningfully worse by holding the PoD lock for
longer here. So yes, let me switch to that and then hope ...

Jan
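
[Editor's note: the alternative agreed above can be sketched the same
way. Again the names are illustrative stand-ins for the Xen API; the
stub lock merely tracks balance where the real code takes a spinlock.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state: the entry count plus a depth counter
 * standing in for the PoD lock. */
struct pod_sim {
    long entry_count;
    int lock_depth;
};

static void pod_lock_stub(struct pod_sim *p)
{
    ++p->lock_depth;
}

static void pod_unlock_stub(struct pod_sim *p)
{
    assert(p->lock_depth > 0);
    --p->lock_depth;
}

/* Stand-in for p2m_set_entry(): 0 on success, nonzero on failure. */
static int set_entry_stub(bool fail)
{
    return fail ? -1 : 0;
}

/*
 * Variant discussed above: hold the PoD lock across both the mapping
 * update and the stats update.  No transiently wrong count is ever
 * visible, at the price of holding the lock across p2m_set_entry()
 * (hence the lock-order concern raised in the thread).
 */
static int mark_pod_region_locked(struct pod_sim *p, unsigned int order,
                                  long old_pod_count, bool fail)
{
    int rc;

    pod_lock_stub(p);
    rc = set_entry_stub(fail);
    if ( rc == 0 )
        p->entry_count += (1L << order) - old_pod_count;
    pod_unlock_stub(p);

    return rc;
}
```

With this shape no undo path is needed at all: on failure the count is
simply never touched.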

Patch

--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -1342,19 +1342,22 @@  mark_populate_on_demand(struct domain *d
         }
     }
 
+    /*
+     * Without holding the PoD lock across the entire operation, bump the
+     * entry count up front assuming success of p2m_set_entry(), undoing the
+     * bump as necessary upon failure.  Bumping only upon success would risk
+     * code elsewhere observing entry count being zero despite there actually
+     * still being PoD entries.
+     */
+    pod_lock(p2m);
+    p2m->pod.entry_count += (1UL << order) - pod_count;
+    pod_unlock(p2m);
+
     /* Now, actually do the two-way mapping */
     rc = p2m_set_entry(p2m, gfn, INVALID_MFN, order,
                        p2m_populate_on_demand, p2m->default_access);
     if ( rc == 0 )
-    {
-        pod_lock(p2m);
-        p2m->pod.entry_count += 1UL << order;
-        p2m->pod.entry_count -= pod_count;
-        BUG_ON(p2m->pod.entry_count < 0);
-        pod_unlock(p2m);
-
         ioreq_request_mapcache_invalidate(d);
-    }
     else if ( order )
     {
         /*
@@ -1366,6 +1369,14 @@  mark_populate_on_demand(struct domain *d
                d, gfn_l, order, rc);
         domain_crash(d);
     }
+    else if ( !pod_count )
+    {
+        /* Undo earlier increment; see comment above. */
+        pod_lock(p2m);
+        BUG_ON(!p2m->pod.entry_count);
+        --p2m->pod.entry_count;
+        pod_unlock(p2m);
+    }
 
 out:
     gfn_unlock(p2m, gfn, order);