Message ID | 20220421151755.GA4718@DESKTOP-NK4TH6S.localdomain (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xen/arm: p2m_set_entry duplicate calculation. | expand |
On Fri, 22 Apr 2022, Paran Lee wrote: > It doesn't seem necessary to do that calculation of order shift again. > > Signed-off-by: Paran Lee <p4ranlee@gmail.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > --- > xen/arch/arm/p2m.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index 1d1059f7d2..533afc830a 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m, > while ( nr ) > { > unsigned long mask; > - unsigned long order; > + unsigned long order, pages; > > /* > * Don't take into account the MFN when removing mapping (i.e > @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m, > if ( rc ) > break; > > - sgfn = gfn_add(sgfn, (1 << order)); > + pages = 1 << order; > + sgfn = gfn_add(sgfn, pages); > if ( !mfn_eq(smfn, INVALID_MFN) ) > - smfn = mfn_add(smfn, (1 << order)); > + smfn = mfn_add(smfn, pages); > > - nr -= (1 << order); > + nr -= pages; > } > > return rc; > -- > 2.25.1 >
Hi, On 21/04/2022 16:17, Paran Lee wrote: > It doesn't seem necessary to do that calculation of order shift again. I think we need to weight that against increasing the number of local variables that do pretty much the same. This is pretty much done to a matter of taste here. IMHO, the original version is better but I see Stefano reviewed it so I will not argue against it. That said, given you already sent a few patches, can you explain why you are doing this? Is this optimization purpose? Is it clean-up? > > Signed-off-by: Paran Lee <p4ranlee@gmail.com> > --- > xen/arch/arm/p2m.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index 1d1059f7d2..533afc830a 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m, > while ( nr ) > { > unsigned long mask; > - unsigned long order; > + unsigned long order, pages; > > /* > * Don't take into account the MFN when removing mapping (i.e > @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m, > if ( rc ) > break; > > - sgfn = gfn_add(sgfn, (1 << order)); > + pages = 1 << order; Please take the opportunity to switch to 1UL. > + sgfn = gfn_add(sgfn, pages); > if ( !mfn_eq(smfn, INVALID_MFN) ) > - smfn = mfn_add(smfn, (1 << order)); > + smfn = mfn_add(smfn, pages); > > - nr -= (1 << order); > + nr -= pages; > } > > return rc; Cheers,
Hello, Julien Grall. Thanks you, I agreed! It made me think once more about what my patch could improve. patches I sent have been reviewed in various ways. It was a good opportunity to analyze my patch from various perspectives. :) I checked objdump in -O2 optimization(default) of Xen Makefile to make sure CSE (Common subexpression elimination) works well on the latest arm64 cross compiler on x86_64 from Arm GNU Toolchain. $ ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -v ... A-profile Architecture 10.3-2021.07 (arm-10.29)' Thread model: posix Supported LTO compression algorithms: zlib gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile Architecture 10.3-2021.07 (arm-10.29) I compared the before and after my patches. This time, without adding a "pages" variable, I proceeded to use the local variable mask with order operation. I was able to confirm that it does one less operation. (1) before clean up 0000000000001bb4 <p2m_set_entry>: while ( nr ) 1bb4: b40005e2 cbz x2, 1c70 <p2m_set_entry+0xbc> { ... if ( rc ) 1c1c: 350002e0 cbnz w0, 1c78 <p2m_set_entry+0xc4> sgfn = gfn_add(sgfn, (1 << order)); 1c20: 1ad32373 lsl w19, w27, w19 // <<< CES works 1c24: 93407e73 sxtw x19, w19 // <<< well! return _gfn(gfn_x(gfn) + i); 1c28: 8b1302d6 add x22, x22, x19 return _mfn(mfn_x(mfn) + i); 1c2c: 8b130281 add x1, x20, x19 1c30: b100069f cmn x20, #0x1 1c34: 9a941034 csel x20, x1, x20, ne // ne = any while ( nr ) 1c38: eb1302b5 subs x21, x21, x19 1c3c: 540001e0 b.eq 1c78 <p2m_set_entry+0xc4> // b.none (2) Using again mask variable. mask = 1UL << order code show me sxtw x19, w19 operation disappeared. 0000000000001bb4 <p2m_set_entry>: while ( nr ) 1bb4: b40005c2 cbz x2, 1c6c <p2m_set_entry+0xb8> { ... if ( rc ) 1c1c: 350002c0 cbnz w0, 1c74 <p2m_set_entry+0xc0> mask = 1UL << order; 1c20: 9ad32373 lsl x19, x27, x19 // <<< only lsl return _gfn(gfn_x(gfn) + i); 1c24: 8b1302d6 add x22, x22, x19 return _mfn(mfn_x(mfn) + i); 1c28: 8b130281 add x1, x20, x19 1c2c: b100069f cmn x20, #0x1 1c30: 9a941034 csel x20, x1, x20, ne // ne = any while ( nr ) 1c34: eb1302b5 subs x21, x21, x19 1c38: 540001e0 b.eq 1c74 <p2m_set_entry+0xc0> // b.none I'll send you a follow-up patch I've been working on. BR Paran Lee 2022-04-25 오전 1:17에 Julien Grall 이(가) 쓴 글: > Hi, > > On 21/04/2022 16:17, Paran Lee wrote: >> It doesn't seem necessary to do that calculation of order shift again. > > I think we need to weight that against increasing the number of local > variables that do pretty much the same. > > This is pretty much done to a matter of taste here. IMHO, the original > version is better but I see Stefano reviewed it so I will not argue > against it. > > That said, given you already sent a few patches, can you explain why you > are doing this? Is this optimization purpose? Is it clean-up? > >> >> Signed-off-by: Paran Lee <p4ranlee@gmail.com> >> --- >> xen/arch/arm/p2m.c | 9 +++++---- >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c >> index 1d1059f7d2..533afc830a 100644 >> --- a/xen/arch/arm/p2m.c >> +++ b/xen/arch/arm/p2m.c >> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m, >> while ( nr ) >> { >> unsigned long mask; >> - unsigned long order; >> + unsigned long order, pages; >> /* >> * Don't take into account the MFN when removing mapping (i.e >> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m, >> if ( rc ) >> break; >> - sgfn = gfn_add(sgfn, (1 << order)); >> + pages = 1 << order; > > Please take the opportunity to switch to 1UL. > >> + sgfn = gfn_add(sgfn, pages); >> if ( !mfn_eq(smfn, INVALID_MFN) ) >> - smfn = mfn_add(smfn, (1 << order)); >> + smfn = mfn_add(smfn, pages); >> - nr -= (1 << order); >> + nr -= pages; >> } >> return rc; > > Cheers, >
Hi, On 26/04/2022 16:37, Paran Lee wrote: > Thanks you, I agreed! It made me think once more about what my patch > could improve. > patches I sent have been reviewed in various ways. It was a good > opportunity to analyze my patch from various perspectives. :) > > I checked objdump in -O2 optimization(default) of Xen Makefile to make > sure CSE (Common subexpression elimination) works well on the latest > arm64 cross compiler on x86_64 from Arm GNU Toolchain. > > $ > ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc > -v > ... > A-profile Architecture 10.3-2021.07 (arm-10.29)' > Thread model: posix > Supported LTO compression algorithms: zlib > gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile > Architecture 10.3-2021.07 (arm-10.29) > > I compared the before and after my patches. This time, without adding a > "pages" variable, I proceeded to use the local variable mask with order > operation. > > I was able to confirm that it does one less operation. Well... I don't think the one less operation is because of introduction of the local variable (see more below). > > (1) before clean up > > 0000000000001bb4 <p2m_set_entry>: > while ( nr ) > 1bb4: b40005e2 cbz x2, 1c70 <p2m_set_entry+0xbc> > { > ... > if ( rc ) > 1c1c: 350002e0 cbnz w0, 1c78 <p2m_set_entry+0xc4> > sgfn = gfn_add(sgfn, (1 << order)); 1 << order is a 32-bit value but the second parameter is a 64-bit value (assuming arm64). So... > 1c20: 1ad32373 lsl w19, w27, w19 // <<< CES works > 1c24: 93407e73 sxtw x19, w19 // <<< well! ... this instruction is extending the 32-bit value to 64-bit value. > return _gfn(gfn_x(gfn) + i); > 1c28: 8b1302d6 add x22, x22, x19 > return _mfn(mfn_x(mfn) + i); > 1c2c: 8b130281 add x1, x20, x19 > 1c30: b100069f cmn x20, #0x1 > 1c34: 9a941034 csel x20, x1, x20, ne // ne = any > while ( nr ) > 1c38: eb1302b5 subs x21, x21, x19 > 1c3c: 540001e0 b.eq 1c78 <p2m_set_entry+0xc4> // b.none > > (2) Using again mask variable. mask = 1UL << order > code show me sxtw x19, w19 operation disappeared. This code is not only using a local variable but also using "1UL". So, I suspect that if you were using 1 << order, the instruction would re-appear. Cheers,
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 1d1059f7d2..533afc830a 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m, while ( nr ) { unsigned long mask; - unsigned long order; + unsigned long order, pages; /* * Don't take into account the MFN when removing mapping (i.e @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m, if ( rc ) break; - sgfn = gfn_add(sgfn, (1 << order)); + pages = 1 << order; + sgfn = gfn_add(sgfn, pages); if ( !mfn_eq(smfn, INVALID_MFN) ) - smfn = mfn_add(smfn, (1 << order)); + smfn = mfn_add(smfn, pages); - nr -= (1 << order); + nr -= pages; } return rc;
It doesn't seem necessary to do that calculation of order shift again. Signed-off-by: Paran Lee <p4ranlee@gmail.com> --- xen/arch/arm/p2m.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)