diff mbox series

[v2] riscv: optimize ELF relocation function in riscv

Message ID 1683881513-18730-1-git-send-email-lixiaoyun@binary-semi.com (mailing list archive)
State Superseded
Headers show
Series [v2] riscv: optimize ELF relocation function in riscv | expand

Checks

Context Check Description
conchuod/tree_selection fail Failed to apply to next/pending-fixes or riscv/for-next

Commit Message

Amma Lee May 12, 2023, 8:51 a.m. UTC
The patch can optimize the running times of "insmod" command by modify ELF
relocation function.
When install the riscv ELF driver which contains multiple
symbol table items to be relocated, kernel takes a lot of time to
execute the relocation. For example, we install a 3+MB driver need 180+s.
We focus on the riscv kernel handle R_RISCV_HI20 and R_RISCV_LO12 type
items relocation function and find that there are two for-loops in this
function. If we modify the begin number in the second for-loops iteration,
we could save significant time for installation. We install the 3+MB
driver could just need 2s.

Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
Reviewed-by: Conor Dooley <conor@kernel.org>

---
 arch/riscv/kernel/module.c | 53 ++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 18 deletions(-)

Comments

Conor Dooley May 12, 2023, 9:03 a.m. UTC | #1
Hey Amma,

On Fri, May 12, 2023 at 04:51:53PM +0800, Amma Lee wrote:
> The patch can optimize the running times of "insmod" command by modify ELF
> relocation function.
> When install the riscv ELF driver which contains multiple
> symbol table items to be relocated, kernel takes a lot of time to
> execute the relocation. For example, we install a 3+MB driver need 180+s.
> We focus on the riscv kernel handle R_RISCV_HI20 and R_RISCV_LO12 type
> items relocation function and find that there are two for-loops in this
> function. If we modify the begin number in the second for-loops iteration,
> we could save significant time for installation. We install the 3+MB
> driver could just need 2s.
> 
> Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>

> Reviewed-by: Conor Dooley <conor@kernel.org>

I did not provide this tag. Do not add tags that you were not explicitly
given by people.

Also, you didn't actually even implement all of the things I pointed out
either :(

> 
> ---
>  arch/riscv/kernel/module.c | 53 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index 55507b0..1683c1d 100755
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -385,9 +385,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  
>  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
>  			unsigned int j;
> -			/*Modify the j for-loops begin number from last iterates end value*/
> +
> +			/*
> +			* In the second for-loops, each traversal for j is
                        ^^
Comment in passing here, we align the *s in comments.

> +			* starts from 0 to the symbol table item index which
> +			* is detected. By the tool "readelf", we find that all
> +			* the symbol table items about R_RISCV_PCREL_HI20 type
> +			* are incrementally added in order. It means that we
> +			* could interate the j with the previous loop end
> +			* value(j_idx) as the begin number in the next loop;
> +			*/
>  			for (j = j_idx; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> -			/* Modify end */
>  				unsigned long hi20_loc =
>  					sechdrs[sechdrs[relsec].sh_info].sh_addr
>  					+ rel[j].r_offset;
> @@ -420,22 +428,30 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  					break;
>  				}
>  			}
> +
>  			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> -				if(j_idx == 0){
> +				if (j_idx == 0) {
>  					pr_err(
>  						"%s: Can not find HI20 relocation information\n",
>  						me->name);
>  					return -EINVAL;
>  				}
> -				
> -				

^^ and this is one of the checkpatch complaints I mentioned last time,
no double blank lines please.

Also, this version of the patch doesn't actually apply for me to
v6.4-rc1 anymore, but the file has not been modified in upstream since
29/01. Not too sure what you have used as the base for this as a result
:/

Thanks,
Conor.
Andrew Jones May 12, 2023, 10:04 a.m. UTC | #2
On Fri, May 12, 2023 at 04:51:53PM +0800, Amma Lee wrote:
> The patch can optimize the running times of "insmod" command by modify ELF
> relocation function.
> When install the riscv ELF driver which contains multiple
> symbol table items to be relocated, kernel takes a lot of time to
> execute the relocation. For example, we install a 3+MB driver need 180+s.
> We focus on the riscv kernel handle R_RISCV_HI20 and R_RISCV_LO12 type
> items relocation function and find that there are two for-loops in this
> function. If we modify the begin number in the second for-loops iteration,
> we could save significant time for installation. We install the 3+MB
> driver could just need 2s.
> 
> Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> Reviewed-by: Conor Dooley <conor@kernel.org>
> 
> ---
>  arch/riscv/kernel/module.c | 53 ++++++++++++++++++++++++++++++----------------
>  1 file changed, 35 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index 55507b0..1683c1d 100755
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -385,9 +385,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  
>  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
>  			unsigned int j;
> -			/*Modify the j for-loops begin number from last iterates end value*/
> +
> +			/*

whitespace issue here

> +			* In the second for-loops, each traversal for j is
> +			* starts from 0 to the symbol table item index which
> +			* is detected. By the tool "readelf", we find that all
> +			* the symbol table items about R_RISCV_PCREL_HI20 type
> +			* are incrementally added in order. It means that we
> +			* could interate the j with the previous loop end
> +			* value(j_idx) as the begin number in the next loop;
> +			*/
>  			for (j = j_idx; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> -			/* Modify end */
>  				unsigned long hi20_loc =
>  					sechdrs[sechdrs[relsec].sh_info].sh_addr
>  					+ rel[j].r_offset;
> @@ -420,22 +428,30 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  					break;
>  				}
>  			}
> +
>  			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> -				if(j_idx == 0){
> +				if (j_idx == 0) {
>  					pr_err(
>  						"%s: Can not find HI20 relocation information\n",
>  						me->name);
>  					return -EINVAL;
>  				}
> -				
> -				
> -				for (j = 0; j < j_idx; j++){ 
> +
> +				/*

also here

> +				* If the last j-loop have been traversed to the
> +				* maximum value but never match the
> +				* corresponding symbol relocation item, the
> +				* j-loop will execute the second loop which
> +				* is begin from 0 to the prerious index (j_idx)

previous

> +				* unless the previous j_idx == 0;
> +				*/
> +				for (j = 0; j < j_idx; j++) {
>  					unsigned long hi20_loc =
>  						sechdrs[sechdrs[relsec].sh_info].sh_addr
>  						+ rel[j].r_offset;
>  					u32 hi20_type = ELF_RISCV_R_TYPE(rel[j].r_info);
> -				
> -				
> +
> +

While fixing this whitespace, we could remove the redundant blank line too.

>  					/* Find the corresponding HI20 relocation entry */
>  					if (hi20_loc == sym->st_value
>  						&& (hi20_type == R_RISCV_PCREL_HI20
> @@ -447,36 +463,37 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
>  						unsigned long hi20_sym_val =
>  							hi20_sym->st_value
>  							+ rel[j].r_addend;
> -					
> -				
> +
> +

also here

>  						/* Calculate lo12 */
>  						size_t offset = hi20_sym_val - hi20_loc;
> +						/* Calculate lo12 */

stray copy+pasted line?

>  						if (IS_ENABLED(CONFIG_MODULE_SECTIONS)
>  							&& hi20_type == R_RISCV_GOT_HI20) {
>  							offset = module_emit_got_entry(
>  								me, hi20_sym_val);
>  							offset = offset - hi20_loc;
> -				
> +

need to just remove this blank line

>  						}
>  						hi20 = (offset + 0x800) & 0xfffff000;
>  						lo12 = offset - hi20;
>  						v = lo12;
> -					
> +

same here

>  						break;
>  					}
>  				}
> -				
> -				if (j == j_idx)
> -				{
> +
> +				if (j == j_idx) {
>  					pr_err(
>  						"%s: Can not find HI20 relocation information\n",
>  						me->name);
>  					return -EINVAL;
>  				}
> -				
> -				
> +
> +

should just remove both the blank lines above

>  			}
> -			
> +
> +			/* Record the previous j-loop end index */
>  			j_idx = j;

Huh... what did I miss? I went through the whole patch but only see
formatting changes (and more opportunity for formatting changes).
Where's the actual change?

Thanks,
drew
Andrew Jones May 12, 2023, 10:06 a.m. UTC | #3
On Fri, May 12, 2023 at 12:04:13PM +0200, Andrew Jones wrote:
> On Fri, May 12, 2023 at 04:51:53PM +0800, Amma Lee wrote:
> > The patch can optimize the running times of "insmod" command by modify ELF
> > relocation function.
> > When install the riscv ELF driver which contains multiple
> > symbol table items to be relocated, kernel takes a lot of time to
> > execute the relocation. For example, we install a 3+MB driver need 180+s.
> > We focus on the riscv kernel handle R_RISCV_HI20 and R_RISCV_LO12 type
> > items relocation function and find that there are two for-loops in this
> > function. If we modify the begin number in the second for-loops iteration,
> > we could save significant time for installation. We install the 3+MB
> > driver could just need 2s.
> > 
> > Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> > Reviewed-by: Conor Dooley <conor@kernel.org>
> > 
> > ---
> >  arch/riscv/kernel/module.c | 53 ++++++++++++++++++++++++++++++----------------
> >  1 file changed, 35 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> > index 55507b0..1683c1d 100755
> > --- a/arch/riscv/kernel/module.c
> > +++ b/arch/riscv/kernel/module.c
> > @@ -385,9 +385,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  
> >  		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> >  			unsigned int j;
> > -			/*Modify the j for-loops begin number from last iterates end value*/
> > +
> > +			/*
> 
> whitespace issue here
> 
> > +			* In the second for-loops, each traversal for j is
> > +			* starts from 0 to the symbol table item index which
> > +			* is detected. By the tool "readelf", we find that all
> > +			* the symbol table items about R_RISCV_PCREL_HI20 type
> > +			* are incrementally added in order. It means that we
> > +			* could interate the j with the previous loop end
> > +			* value(j_idx) as the begin number in the next loop;
> > +			*/
> >  			for (j = j_idx; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> > -			/* Modify end */
> >  				unsigned long hi20_loc =
> >  					sechdrs[sechdrs[relsec].sh_info].sh_addr
> >  					+ rel[j].r_offset;
> > @@ -420,22 +428,30 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  					break;
> >  				}
> >  			}
> > +
> >  			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> > -				if(j_idx == 0){
> > +				if (j_idx == 0) {
> >  					pr_err(
> >  						"%s: Can not find HI20 relocation information\n",
> >  						me->name);
> >  					return -EINVAL;
> >  				}
> > -				
> > -				
> > -				for (j = 0; j < j_idx; j++){ 
> > +
> > +				/*
> 
> also here
> 
> > +				* If the last j-loop have been traversed to the
> > +				* maximum value but never match the
> > +				* corresponding symbol relocation item, the
> > +				* j-loop will execute the second loop which
> > +				* is begin from 0 to the prerious index (j_idx)
> 
> previous
> 
> > +				* unless the previous j_idx == 0;
> > +				*/
> > +				for (j = 0; j < j_idx; j++) {
> >  					unsigned long hi20_loc =
> >  						sechdrs[sechdrs[relsec].sh_info].sh_addr
> >  						+ rel[j].r_offset;
> >  					u32 hi20_type = ELF_RISCV_R_TYPE(rel[j].r_info);
> > -				
> > -				
> > +
> > +
> 
> While fixing this whitespace, we could remove the redundant blank line too.
> 
> >  					/* Find the corresponding HI20 relocation entry */
> >  					if (hi20_loc == sym->st_value
> >  						&& (hi20_type == R_RISCV_PCREL_HI20
> > @@ -447,36 +463,37 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >  						unsigned long hi20_sym_val =
> >  							hi20_sym->st_value
> >  							+ rel[j].r_addend;
> > -					
> > -				
> > +
> > +
> 
> also here
> 
> >  						/* Calculate lo12 */
> >  						size_t offset = hi20_sym_val - hi20_loc;
> > +						/* Calculate lo12 */
> 
> stray copy+pasted line?
> 
> >  						if (IS_ENABLED(CONFIG_MODULE_SECTIONS)
> >  							&& hi20_type == R_RISCV_GOT_HI20) {
> >  							offset = module_emit_got_entry(
> >  								me, hi20_sym_val);
> >  							offset = offset - hi20_loc;
> > -				
> > +
> 
> need to just remove this blank line
> 
> >  						}
> >  						hi20 = (offset + 0x800) & 0xfffff000;
> >  						lo12 = offset - hi20;
> >  						v = lo12;
> > -					
> > +
> 
> same here
> 
> >  						break;
> >  					}
> >  				}
> > -				
> > -				if (j == j_idx)
> > -				{
> > +
> > +				if (j == j_idx) {
> >  					pr_err(
> >  						"%s: Can not find HI20 relocation information\n",
> >  						me->name);
> >  					return -EINVAL;
> >  				}
> > -				
> > -				
> > +
> > +
> 
> should just remove both the blank lines above
> 
> >  			}
> > -			
> > +
> > +			/* Record the previous j-loop end index */
> >  			j_idx = j;
> 
> Huh... what did I miss? I went through the whole patch but only see
> formatting changes (and more opportunity for formatting changes).
> Where's the actual change?

It's best to do formatting ("no functional change") changes separately
from the functional changes to help reviewers see that, too. Please
separate the formatting into its own patch.

Thanks,
drew
Conor Dooley May 15, 2023, 11:50 a.m. UTC | #4
On Mon, May 15, 2023 at 05:58:35PM +0800, 李筱云(李筱云) wrote:
> Hi Conor,
> 
> On Fri, May 12, 2023 at 04:51:53PM +0800, Amma Lee wrote:
> >> The patch can optimize the running times of "insmod" command by modify ELF
> >> relocation function.
> >> When install the riscv ELF driver which contains multiple
> >> symbol table items to be relocated, kernel takes a lot of time to
> >> execute the relocation. For example, we install a 3+MB driver need 180+s.
> >> We focus on the riscv kernel handle R_RISCV_HI20 and R_RISCV_LO12 type
> >> items relocation function and find that there are two for-loops in this
> >> function. If we modify the begin number in the second for-loops iteration,
> >> we could save significant time for installation. We install the 3+MB
> >> driver could just need 2s.
> >>
> >> Signed-off-by: Amma Lee <lixiaoyun@binary-semi.com>
> >> Reviewed-by: Conor Dooley <conor@kernel.org>
> 
> >I did not provide this tag. Do not add tags that you were not explicitly
> >given by people.
> >
> >Also, you didn't actually even implement all of the things I pointed out
> >either :(
> 
> I'm sorry. I will delete the "Reviewed-by" line. And in the patch v2, I
> changed the name as "Amma Lee" and add some comments before the
> modified code. Is there other content I need to modify?

Perhaps, I have not examined the actual implementation here.
Hopefully someone that understands the code better will do that.
I was mainly pointing out low-hanging items that our automation caught.


> >> ---
> >>  arch/riscv/kernel/module.c | 53 ++++++++++++++++++++++++++++++----------------
> >>  1 file changed, 35 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> >> index 55507b0..1683c1d 100755
> >> --- a/arch/riscv/kernel/module.c
> >> +++ b/arch/riscv/kernel/module.c
> >> @@ -385,9 +385,17 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >>
> >>    if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
> >>     unsigned int j;
> >> -   /*Modify the j for-loops begin number from last iterates end value*/
> >> +
> >> +   /*
> >> +   * In the second for-loops, each traversal for j is
> >                        ^^
> >Comment in passing here, we align the *s in comments.
> 
> Do you mean that there is no blank in '*' and the key?

In my plaintext editor, the ^^ was aligned with the *.
Please read the kernel coding style documentation for more information:
https://www.kernel.org/doc/html/v4.10/process/coding-style.html#commenting

> >> +   * starts from 0 to the symbol table item index which
> >> +   * is detected. By the tool "readelf", we find that all
> >> +   * the symbol table items about R_RISCV_PCREL_HI20 type
> >> +   * are incrementally added in order. It means that we
> >> +   * could interate the j with the previous loop end
> >> +   * value(j_idx) as the begin number in the next loop;
> >> +   */
> >>     for (j = j_idx; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
> >> -   /* Modify end */
> >>      unsigned long hi20_loc =
> >>       sechdrs[sechdrs[relsec].sh_info].sh_addr
> >>       + rel[j].r_offset;
> >> @@ -420,22 +428,30 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
> >>       break;
> >>      }
> >>     }
> >> +
> >>     if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
> >> -    if(j_idx == 0){
> >> +    if (j_idx == 0) {
> >>       pr_err(
> >>        "%s: Can not find HI20 relocation information\n",
> >>        me->name);
> >>       return -EINVAL;
> >>      }
> >> -
> >> -
> >
> >^^ and this is one of the checkpatch complaints I mentioned last time,
> >no double blank lines please.
> >
> >Also, this version of the patch doesn't actually apply for me to
> >v6.4-rc1 anymore, but the file has not been modified in upstream since
> >29/01. Not too sure what you have used as the base for this as a result
> >:/
> Thanks, I will delete the double blank lines. We found and fix the issue -
> insmod drivers operation takes a lot time which contains multiple symbol
> items to be relocated in the kernel version 5.10. Then we compare the
> 6.4 new code and 5.10 kernel, find the insmod process(5.10 version in
> the kernel\module.c and arch\riscv\kernel\module.c, 6.4 version
> in the kernel\module\main.c and arch\riscv\kernel\module.c) never be
> changed. So we guess that the kernel version 6.4 has the same issue.
> May I commit the patch in 6.4 kernel?

In fact, you *must* do your development against the 6.4 version of the
kernel. If you want to fix it for 5.10, you must first either:
a) fix the problem in 6.4, or
b) show that there is not a problem in 6.4 & figure out where it was
   introduced

The stable maintainers won't take the fix for 5.10 unless it is already
in mainline or not a problem in mainline.

Also, this email was in html form. Please don't send html mail to the
mailing list. I had to jump through some hoops to be even able to reply
to it on my current setup :(

Thanks,
Conor.
diff mbox series

Patch

diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
index 55507b0..1683c1d 100755
--- a/arch/riscv/kernel/module.c
+++ b/arch/riscv/kernel/module.c
@@ -385,9 +385,17 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 
 		if (type == R_RISCV_PCREL_LO12_I || type == R_RISCV_PCREL_LO12_S) {
 			unsigned int j;
-			/*Modify the j for-loops begin number from last iterates end value*/
+
+			/*
+			* In the second for-loops, each traversal for j is
+			* starts from 0 to the symbol table item index which
+			* is detected. By the tool "readelf", we find that all
+			* the symbol table items about R_RISCV_PCREL_HI20 type
+			* are incrementally added in order. It means that we
+			* could interate the j with the previous loop end
+			* value(j_idx) as the begin number in the next loop;
+			*/
 			for (j = j_idx; j < sechdrs[relsec].sh_size / sizeof(*rel); j++) {
-			/* Modify end */
 				unsigned long hi20_loc =
 					sechdrs[sechdrs[relsec].sh_info].sh_addr
 					+ rel[j].r_offset;
@@ -420,22 +428,30 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 					break;
 				}
 			}
+
 			if (j == sechdrs[relsec].sh_size / sizeof(*rel)) {
-				if(j_idx == 0){
+				if (j_idx == 0) {
 					pr_err(
 						"%s: Can not find HI20 relocation information\n",
 						me->name);
 					return -EINVAL;
 				}
-				
-				
-				for (j = 0; j < j_idx; j++){ 
+
+				/*
+				* If the last j-loop have been traversed to the
+				* maximum value but never match the
+				* corresponding symbol relocation item, the
+				* j-loop will execute the second loop which
+				* is begin from 0 to the prerious index (j_idx)
+				* unless the previous j_idx == 0;
+				*/
+				for (j = 0; j < j_idx; j++) {
 					unsigned long hi20_loc =
 						sechdrs[sechdrs[relsec].sh_info].sh_addr
 						+ rel[j].r_offset;
 					u32 hi20_type = ELF_RISCV_R_TYPE(rel[j].r_info);
-				
-				
+
+
 					/* Find the corresponding HI20 relocation entry */
 					if (hi20_loc == sym->st_value
 						&& (hi20_type == R_RISCV_PCREL_HI20
@@ -447,36 +463,37 @@  int apply_relocate_add(Elf_Shdr *sechdrs, const char *strtab,
 						unsigned long hi20_sym_val =
 							hi20_sym->st_value
 							+ rel[j].r_addend;
-					
-				
+
+
 						/* Calculate lo12 */
 						size_t offset = hi20_sym_val - hi20_loc;
+						/* Calculate lo12 */
 						if (IS_ENABLED(CONFIG_MODULE_SECTIONS)
 							&& hi20_type == R_RISCV_GOT_HI20) {
 							offset = module_emit_got_entry(
 								me, hi20_sym_val);
 							offset = offset - hi20_loc;
-				
+
 						}
 						hi20 = (offset + 0x800) & 0xfffff000;
 						lo12 = offset - hi20;
 						v = lo12;
-					
+
 						break;
 					}
 				}
-				
-				if (j == j_idx)
-				{
+
+				if (j == j_idx) {
 					pr_err(
 						"%s: Can not find HI20 relocation information\n",
 						me->name);
 					return -EINVAL;
 				}
-				
-				
+
+
 			}
-			
+
+			/* Record the previous j-loop end index */
 			j_idx = j;
 		}