diff mbox

x86/mm: Reduce debug overhead of __virt_to_maddr()

Message ID 1502891924-20563-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Andrew Cooper Aug. 16, 2017, 1:58 p.m. UTC
__virt_to_maddr() is used very frequently, but has a large footprint due to
its assertions and comparasons.

Rearange its logic to drop one assertion entirely, encoding its check in a
second assertion (with no additional branch, and the comparason performed with
a 32bit immediate rather than requiring a movabs).

Bloat-o-meter net report is:
  add/remove: 0/0 grow/shrink: 1/72 up/down: 3/-2169 (-2166)

along with a reduction of 32 assertion frames (895 down to 861)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 xen/include/asm-x86/x86_64/page.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Jan Beulich Aug. 16, 2017, 2:11 p.m. UTC | #1
>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
> --- a/xen/include/asm-x86/x86_64/page.h
> +++ b/xen/include/asm-x86/x86_64/page.h
> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>  
>  static inline unsigned long __virt_to_maddr(unsigned long va)
>  {
> -    ASSERT(va >= XEN_VIRT_START);
>      ASSERT(va < DIRECTMAP_VIRT_END);
>      if ( va >= DIRECTMAP_VIRT_START )
>          va -= DIRECTMAP_VIRT_START;
>      else
>      {
> -        ASSERT(va < XEN_VIRT_END);
> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));

Do you really need the casts here? I.e. what's wrong here with
doing unsigned long arithmetic?

Jan
Andrew Cooper Aug. 16, 2017, 2:14 p.m. UTC | #2
On 16/08/17 15:11, Jan Beulich wrote:
>>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
>> --- a/xen/include/asm-x86/x86_64/page.h
>> +++ b/xen/include/asm-x86/x86_64/page.h
>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>  
>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>  {
>> -    ASSERT(va >= XEN_VIRT_START);
>>      ASSERT(va < DIRECTMAP_VIRT_END);
>>      if ( va >= DIRECTMAP_VIRT_START )
>>          va -= DIRECTMAP_VIRT_START;
>>      else
>>      {
>> -        ASSERT(va < XEN_VIRT_END);
>> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
> Do you really need the casts here? I.e. what's wrong here with
> doing unsigned long arithmetic?

Oh - good point.  This took more than one attempt to get right, and I
first thought I had a sign extension problem.  The actual problem was a
(lack of) + PAGE_SHIFT.

The other thing to know is that  __virt_to_maddr() is used before the
IDT is set up, so your only signal of something being wrong is a triple
fault.  Let me double check without the casts, but I think it should be
fine.

~Andrew
Jan Beulich Aug. 16, 2017, 2:19 p.m. UTC | #3
>>> On 16.08.17 at 16:14, <andrew.cooper3@citrix.com> wrote:
> On 16/08/17 15:11, Jan Beulich wrote:
>>>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/include/asm-x86/x86_64/page.h
>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>  
>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>  {
>>> -    ASSERT(va >= XEN_VIRT_START);
>>>      ASSERT(va < DIRECTMAP_VIRT_END);
>>>      if ( va >= DIRECTMAP_VIRT_START )
>>>          va -= DIRECTMAP_VIRT_START;
>>>      else
>>>      {
>>> -        ASSERT(va < XEN_VIRT_END);
>>> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>> Do you really need the casts here? I.e. what's wrong here with
>> doing unsigned long arithmetic?
> 
> Oh - good point.  This took more than one attempt to get right, and I
> first thought I had a sign extension problem.  The actual problem was a
> (lack of) + PAGE_SHIFT.
> 
> The other thing to know is that  __virt_to_maddr() is used before the
> IDT is set up, so your only signal of something being wrong is a triple
> fault.  Let me double check without the casts, but I think it should be
> fine.

If it is, then with the casts dropped
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Andrew Cooper Aug. 16, 2017, 2:22 p.m. UTC | #4
On 16/08/17 15:14, Andrew Cooper wrote:
> On 16/08/17 15:11, Jan Beulich wrote:
>>>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/include/asm-x86/x86_64/page.h
>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>  
>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>  {
>>> -    ASSERT(va >= XEN_VIRT_START);
>>>      ASSERT(va < DIRECTMAP_VIRT_END);
>>>      if ( va >= DIRECTMAP_VIRT_START )
>>>          va -= DIRECTMAP_VIRT_START;
>>>      else
>>>      {
>>> -        ASSERT(va < XEN_VIRT_END);
>>> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>> Do you really need the casts here? I.e. what's wrong here with
>> doing unsigned long arithmetic?
> Oh - good point.  This took more than one attempt to get right, and I
> first thought I had a sign extension problem.  The actual problem was a
> (lack of) + PAGE_SHIFT.
>
> The other thing to know is that  __virt_to_maddr() is used before the
> IDT is set up, so your only signal of something being wrong is a triple
> fault.  Let me double check without the casts, but I think it should be
> fine.

Ok - so it does function when using unsigned arithmetic.

However, the generated code is better with signed arithmetic, as
((long)XEN_VIRT_START >> 39) fix in a 32bit sign-extended immediate,
whereas XEN_VIRT_START >> 39 needs a movabs.

On the whole, I'd prefer to keep patch as is.

~Andrew
Jan Beulich Aug. 16, 2017, 3:07 p.m. UTC | #5
>>> On 16.08.17 at 16:22, <andrew.cooper3@citrix.com> wrote:
> On 16/08/17 15:14, Andrew Cooper wrote:
>> On 16/08/17 15:11, Jan Beulich wrote:
>>>>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/include/asm-x86/x86_64/page.h
>>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>>  
>>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>>  {
>>>> -    ASSERT(va >= XEN_VIRT_START);
>>>>      ASSERT(va < DIRECTMAP_VIRT_END);
>>>>      if ( va >= DIRECTMAP_VIRT_START )
>>>>          va -= DIRECTMAP_VIRT_START;
>>>>      else
>>>>      {
>>>> -        ASSERT(va < XEN_VIRT_END);
>>>> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>>> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>>> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>>> Do you really need the casts here? I.e. what's wrong here with
>>> doing unsigned long arithmetic?
>> Oh - good point.  This took more than one attempt to get right, and I
>> first thought I had a sign extension problem.  The actual problem was a
>> (lack of) + PAGE_SHIFT.
>>
>> The other thing to know is that  __virt_to_maddr() is used before the
>> IDT is set up, so your only signal of something being wrong is a triple
>> fault.  Let me double check without the casts, but I think it should be
>> fine.
> 
> Ok - so it does function when using unsigned arithmetic.
> 
> However, the generated code is better with signed arithmetic, as
> ((long)XEN_VIRT_START >> 39) fix in a 32bit sign-extended immediate,
> whereas XEN_VIRT_START >> 39 needs a movabs.

Why would that be? Shifting out 39 bits means 25 significant bits
are left out of the original 64. Or wait - isn't it 30 rather than 39?
In that case indeed 34 significant bits would remain. In that case
I'd be fine with the casts left in place, as long as at least the
commit message (a code comment may be better to keep people
like me from being tempted to remove the casts as ugly and
apparently unnecessary) says why.

Jan
Andrew Cooper Aug. 16, 2017, 4:17 p.m. UTC | #6
On 16/08/17 16:07, Jan Beulich wrote:
>>>> On 16.08.17 at 16:22, <andrew.cooper3@citrix.com> wrote:
>> On 16/08/17 15:14, Andrew Cooper wrote:
>>> On 16/08/17 15:11, Jan Beulich wrote:
>>>>>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
>>>>> --- a/xen/include/asm-x86/x86_64/page.h
>>>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>>>  
>>>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>>>  {
>>>>> -    ASSERT(va >= XEN_VIRT_START);
>>>>>      ASSERT(va < DIRECTMAP_VIRT_END);
>>>>>      if ( va >= DIRECTMAP_VIRT_START )
>>>>>          va -= DIRECTMAP_VIRT_START;
>>>>>      else
>>>>>      {
>>>>> -        ASSERT(va < XEN_VIRT_END);
>>>>> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>>>> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>>>> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>>>> Do you really need the casts here? I.e. what's wrong here with
>>>> doing unsigned long arithmetic?
>>> Oh - good point.  This took more than one attempt to get right, and I
>>> first thought I had a sign extension problem.  The actual problem was a
>>> (lack of) + PAGE_SHIFT.
>>>
>>> The other thing to know is that  __virt_to_maddr() is used before the
>>> IDT is set up, so your only signal of something being wrong is a triple
>>> fault.  Let me double check without the casts, but I think it should be
>>> fine.
>> Ok - so it does function when using unsigned arithmetic.
>>
>> However, the generated code is better with signed arithmetic, as
>> ((long)XEN_VIRT_START >> 39) fix in a 32bit sign-extended immediate,
>> whereas XEN_VIRT_START >> 39 needs a movabs.
> Why would that be? Shifting out 39 bits means 25 significant bits
> are left out of the original 64. Or wait - isn't it 30 rather than 39?
> In that case indeed 34 significant bits would remain. In that case
> I'd be fine with the casts left in place, as long as at least the
> commit message (a code comment may be better to keep people
> like me from being tempted to remove the casts as ugly and
> apparently unnecessary) says why.

I'm clearly doing very well at counting today.  I do mean 30 bits (order
18 + page shift of 12).

The generated code is this:

ffff82d0802ff923:       48 89 c2                mov    %rax,%rdx
ffff82d0802ff926:       48 c1 fa 1e             sar    $0x1e,%rdx
ffff82d0802ff92a:       48 81 fa 42 0b fe ff    cmp   
$0xfffffffffffe0b42,%rdx

While there are 34 significant bits from this shift, the top 16 of them
are strictly set, meaning there are only 28 usefully significant bits.

FYI, the unsigned case looks like this:

ffff82d0802ffb12:       48 89 c1                mov    %rax,%rcx
ffff82d0802ffb15:       48 c1 e9 1e             shr    $0x1e,%rcx
ffff82d0802ffb19:       48 ba 42 0b fe ff 03    movabs $0x3fffe0b42,%rdx
ffff82d0802ffb20:       00 00 00
ffff82d0802ffb23:       48 39 d1                cmp    %rdx,%rcx

Are you happy with the following comment?

/* Signed arithmetic in so ((long)XEN_VIRT_START >> 30) fits in an imm32. */

~Andrew
Jan Beulich Aug. 17, 2017, 7:38 a.m. UTC | #7
>>> On 16.08.17 at 18:17, <andrew.cooper3@citrix.com> wrote:
> On 16/08/17 16:07, Jan Beulich wrote:
>>>>> On 16.08.17 at 16:22, <andrew.cooper3@citrix.com> wrote:
>>> On 16/08/17 15:14, Andrew Cooper wrote:
>>>> On 16/08/17 15:11, Jan Beulich wrote:
>>>>>>>> On 16.08.17 at 15:58, <andrew.cooper3@citrix.com> wrote:
>>>>>> --- a/xen/include/asm-x86/x86_64/page.h
>>>>>> +++ b/xen/include/asm-x86/x86_64/page.h
>>>>>> @@ -51,13 +51,15 @@ extern unsigned long xen_virt_end;
>>>>>>  
>>>>>>  static inline unsigned long __virt_to_maddr(unsigned long va)
>>>>>>  {
>>>>>> -    ASSERT(va >= XEN_VIRT_START);
>>>>>>      ASSERT(va < DIRECTMAP_VIRT_END);
>>>>>>      if ( va >= DIRECTMAP_VIRT_START )
>>>>>>          va -= DIRECTMAP_VIRT_START;
>>>>>>      else
>>>>>>      {
>>>>>> -        ASSERT(va < XEN_VIRT_END);
>>>>>> +        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
>>>>>> +        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
>>>>>> +               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
>>>>> Do you really need the casts here? I.e. what's wrong here with
>>>>> doing unsigned long arithmetic?
>>>> Oh - good point.  This took more than one attempt to get right, and I
>>>> first thought I had a sign extension problem.  The actual problem was a
>>>> (lack of) + PAGE_SHIFT.
>>>>
>>>> The other thing to know is that  __virt_to_maddr() is used before the
>>>> IDT is set up, so your only signal of something being wrong is a triple
>>>> fault.  Let me double check without the casts, but I think it should be
>>>> fine.
>>> Ok - so it does function when using unsigned arithmetic.
>>>
>>> However, the generated code is better with signed arithmetic, as
>>> ((long)XEN_VIRT_START >> 39) fix in a 32bit sign-extended immediate,
>>> whereas XEN_VIRT_START >> 39 needs a movabs.
>> Why would that be? Shifting out 39 bits means 25 significant bits
>> are left out of the original 64. Or wait - isn't it 30 rather than 39?
>> In that case indeed 34 significant bits would remain. In that case
>> I'd be fine with the casts left in place, as long as at least the
>> commit message (a code comment may be better to keep people
>> like me from being tempted to remove the casts as ugly and
>> apparently unnecessary) says why.
> 
> I'm clearly doing very well at counting today.  I do mean 30 bits (order
> 18 + page shift of 12).
> 
> The generated code is this:
> 
> ffff82d0802ff923:       48 89 c2                mov    %rax,%rdx
> ffff82d0802ff926:       48 c1 fa 1e             sar    $0x1e,%rdx
> ffff82d0802ff92a:       48 81 fa 42 0b fe ff    cmp   
> $0xfffffffffffe0b42,%rdx
> 
> While there are 34 significant bits from this shift, the top 16 of them
> are strictly set, meaning there are only 28 usefully significant bits.
> 
> FYI, the unsigned case looks like this:
> 
> ffff82d0802ffb12:       48 89 c1                mov    %rax,%rcx
> ffff82d0802ffb15:       48 c1 e9 1e             shr    $0x1e,%rcx
> ffff82d0802ffb19:       48 ba 42 0b fe ff 03    movabs $0x3fffe0b42,%rdx
> ffff82d0802ffb20:       00 00 00
> ffff82d0802ffb23:       48 39 d1                cmp    %rdx,%rcx
> 
> Are you happy with the following comment?
> 
> /* Signed arithmetic in so ((long)XEN_VIRT_START >> 30) fits in an imm32. */

Yes.

Jan
diff mbox

Patch

diff --git a/xen/include/asm-x86/x86_64/page.h b/xen/include/asm-x86/x86_64/page.h
index 947e52b..bd30f25 100644
--- a/xen/include/asm-x86/x86_64/page.h
+++ b/xen/include/asm-x86/x86_64/page.h
@@ -51,13 +51,15 @@  extern unsigned long xen_virt_end;
 
 static inline unsigned long __virt_to_maddr(unsigned long va)
 {
-    ASSERT(va >= XEN_VIRT_START);
     ASSERT(va < DIRECTMAP_VIRT_END);
     if ( va >= DIRECTMAP_VIRT_START )
         va -= DIRECTMAP_VIRT_START;
     else
     {
-        ASSERT(va < XEN_VIRT_END);
+        BUILD_BUG_ON(XEN_VIRT_END - XEN_VIRT_START != GB(1));
+        ASSERT(((long)va >> (PAGE_ORDER_1G + PAGE_SHIFT)) ==
+               ((long)XEN_VIRT_START >> (PAGE_ORDER_1G + PAGE_SHIFT)));
+
         va += xen_phys_start - XEN_VIRT_START;
     }
     return (va & ma_va_bottom_mask) |