Message ID | 20170305112047.3411-8-luc.vanoostenryck@gmail.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Sun, Mar 5, 2017 at 7:20 PM, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote: > In sparse, pointer arithmetic and accessing the field > of a structure or an array is simply done via OP_ADD, > the offset being calculated at evaluation time. > On the other hand, LLVM allows addition only on two > integers and pointer arithmetic/member access is done > via 'getelementptr'. > > sparse-llvm didn't took this in account which resulted > in type error in 'add' instructions. > > Fix this by catching addition involving pointer and issuing > a getelementptr' instruction for these. I have one related question. In the case of anonymous structure or union, how to you figure out which series of GEP it needs to be? I think sparse already lost the element pointer index information. You can construct it back by looking at the bit offset. But if there is union then the element point can have multiple path to reach to the same bit offset. I don't know how to deal with that. > > Originally-by: Dibyendu Majumdar <mobile@majumdar.org.uk> > Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> > --- > sparse-llvm.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/sparse-llvm.c b/sparse-llvm.c > index 27cc1b88c..ee374b217 100644 > --- a/sparse-llvm.c > +++ b/sparse-llvm.c > @@ -472,6 +472,10 @@ static void output_op_binary(struct function *fn, struct instruction *insn) > case OP_ADD: > if (symbol_is_fp_type(insn->type)) > target = LLVMBuildFAdd(fn->builder, lhs, rhs, target_name); > + else if (LLVMGetTypeKind(LLVMTypeOf(lhs)) == LLVMPointerTypeKind) > + target = LLVMBuildGEP(fn->builder, lhs, &rhs, 1, ""); > + else if (LLVMGetTypeKind(LLVMTypeOf(rhs)) == LLVMPointerTypeKind) > + target = LLVMBuildGEP(fn->builder, rhs, &lhs, 1, ""); It seems that you only have one level of indices. Also the OP_ADD use the member offset of the struct. I am not sure how it map into GEP indices. Correct me if I am wrong, it seems to me you use the member offset as element indices. I think it need to get a mapping between offset to indices. Chris -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Chris, On 6 March 2017 at 15:16, Christopher Li <sparse@chrisli.org> wrote: > On Sun, Mar 5, 2017 at 7:20 PM, Luc Van Oostenryck > <luc.vanoostenryck@gmail.com> wrote: >> In sparse, pointer arithmetic and accessing the field >> of a structure or an array is simply done via OP_ADD, >> the offset being calculated at evaluation time. >> On the other hand, LLVM allows addition only on two >> integers and pointer arithmetic/member access is done >> via 'getelementptr'. >> >> sparse-llvm didn't took this in account which resulted >> in type error in 'add' instructions. >> >> Fix this by catching addition involving pointer and issuing >> a getelementptr' instruction for these. > > I have one related question. In the case of anonymous > structure or union, how to you figure out which series of > GEP it needs to be? I think sparse already lost the element > pointer index information. You can construct it back by looking > at the bit offset. But if there is union then the element point can > have multiple path to reach to the same bit offset. I don't > know how to deal with that. Sparse-llvm appears to bypass the normal struct GEP in LLVM. It basically casts everything to char *, uses GEP to obtain a pointer to member, and then casts it back to member type. So this should work for structs and unions. > >> >> Originally-by: Dibyendu Majumdar <mobile@majumdar.org.uk> >> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> >> --- >> sparse-llvm.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/sparse-llvm.c b/sparse-llvm.c >> index 27cc1b88c..ee374b217 100644 >> --- a/sparse-llvm.c >> +++ b/sparse-llvm.c >> @@ -472,6 +472,10 @@ static void output_op_binary(struct function *fn, struct instruction *insn) >> case OP_ADD: >> if (symbol_is_fp_type(insn->type)) >> target = LLVMBuildFAdd(fn->builder, lhs, rhs, target_name); >> + else if (LLVMGetTypeKind(LLVMTypeOf(lhs)) == LLVMPointerTypeKind) >> + target = LLVMBuildGEP(fn->builder, lhs, &rhs, 1, ""); >> + else if (LLVMGetTypeKind(LLVMTypeOf(rhs)) == LLVMPointerTypeKind) >> + target = LLVMBuildGEP(fn->builder, rhs, &lhs, 1, ""); > > It seems that you only have one level of indices. Also the OP_ADD use the member > offset of the struct. I am not sure how it map into GEP indices. > Correct me if I am > wrong, it seems to me you use the member offset as element indices. I > think it need > to get a mapping between offset to indices. > This case is more about handing pointer arithmetic correctly. Here GEP is being used as array element access and not for struct member access. Hope this is helpful. Regards Dibyendu -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 6, 2017 at 11:32 PM, Dibyendu Majumdar <mobile@majumdar.org.uk> wrote: > > Sparse-llvm appears to bypass the normal struct GEP in LLVM. It > basically casts everything to char *, uses GEP to obtain a pointer to > member, and then casts it back to member type. So this should work for > structs and unions. That is exactly my question. I haven't understand how this by pass can work in all the situation. > > This case is more about handing pointer arithmetic correctly. Here GEP > is being used as array element access and not for struct member > access. Even using array element access, I still have some question regarding the the mapping process of the array index. Correct me if I am wrong, the GEP indices are express as array index. So after convert to (char*) + offset, the offset will express as n*(sizeof(element type)). "n" is the array index. However, the OP_ADD instruction, the offset is byte offset value. The offset part might not be align to n*(sizeof(array_elelment)). Assume we can use some pack struct to get some member located not at the natural alignment offset. Even if the offset is at the type size alignment, shouldn't the indices express as ( offset / sizeof(element type))? What do I miss? Chris -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 07, 2017 at 12:22:07AM +0800, Christopher Li wrote: > On Mon, Mar 6, 2017 at 11:32 PM, Dibyendu Majumdar > <mobile@majumdar.org.uk> wrote: > > > > Sparse-llvm appears to bypass the normal struct GEP in LLVM. It > > basically casts everything to char *, uses GEP to obtain a pointer to > > member, and then casts it back to member type. So this should work for > > structs and unions. > > That is exactly my question. I haven't understand how this by pass > can work in all the situation. > > > > > This case is more about handing pointer arithmetic correctly. Here GEP > > is being used as array element access and not for struct member > > access. > > Even using array element access, I still have some question regarding > the the mapping process of the array index. > Correct me if I am wrong, the GEP indices are express as array index. > So after convert to (char*) + offset, the offset will express as > n*(sizeof(element type)). > "n" is the array index. > > However, the OP_ADD instruction, the offset is byte offset value. The offset > part might not be align to n*(sizeof(array_elelment)). Assume we can use > some pack struct to get some member located not at the natural alignment offset. > > Even if the offset is at the type size alignment, shouldn't the indices express > as ( offset / sizeof(element type))? What do I miss? With an example: == C code == void *foo(int *p) { return p + 5; } == linearized code == foo: .L0: <entry-point> add.64 %r2 <- %arg1, $20 cast.64 %r3 <- (64) %r2 ret.64 %r3 == LLVM code from sparse-llvm == ; ModuleID = '<stdin>' source_filename = "sparse" define i8* @foo(i32* %ARG1) { L0: %0 = getelementptr i32, i32* %ARG1, inttoptr (i64 20 to i32*) %R3 = bitcast i32* %0 to i8* ret i8* %R3 } -- Luc -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Luc, On 6 March 2017 at 16:43, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote: > On Tue, Mar 07, 2017 at 12:22:07AM +0800, Christopher Li wrote: >> On Mon, Mar 6, 2017 at 11:32 PM, Dibyendu Majumdar >> <mobile@majumdar.org.uk> wrote: >> > >> > Sparse-llvm appears to bypass the normal struct GEP in LLVM. It >> > basically casts everything to char *, uses GEP to obtain a pointer to >> > member, and then casts it back to member type. So this should work for >> > structs and unions. >> >> That is exactly my question. I haven't understand how this by pass >> can work in all the situation. >> >> > >> > This case is more about handing pointer arithmetic correctly. Here GEP >> > is being used as array element access and not for struct member >> > access. >> >> Even using array element access, I still have some question regarding >> the the mapping process of the array index. >> Correct me if I am wrong, the GEP indices are express as array index. >> So after convert to (char*) + offset, the offset will express as >> n*(sizeof(element type)). >> "n" is the array index. >> >> However, the OP_ADD instruction, the offset is byte offset value. The offset >> part might not be align to n*(sizeof(array_elelment)). Assume we can use >> some pack struct to get some member located not at the natural alignment offset. >> >> Even if the offset is at the type size alignment, shouldn't the indices express >> as ( offset / sizeof(element type))? What do I miss? > > With an example: > == C code == > void *foo(int *p) { return p + 5; } > > == linearized code == > foo: > .L0: > <entry-point> > add.64 %r2 <- %arg1, $20 > cast.64 %r3 <- (64) %r2 > ret.64 %r3 > > == LLVM code from sparse-llvm == > ; ModuleID = '<stdin>' > source_filename = "sparse" > > define i8* @foo(i32* %ARG1) { > L0: > %0 = getelementptr i32, i32* %ARG1, inttoptr (i64 20 to i32*) > %R3 = bitcast i32* %0 to i8* > ret i8* %R3 > } > This may be the issue Chris is highlighting - above the offset should be 5 not 20 in the LLVM code as the array type is i32* not i8*. Does sparse always output array offsets in char*? Regards Dibyendu -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 7, 2017 at 12:43 AM, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote: > > With an example: > == C code == > void *foo(int *p) { return p + 5; } > > == linearized code == > foo: > .L0: > <entry-point> > add.64 %r2 <- %arg1, $20 > cast.64 %r3 <- (64) %r2 > ret.64 %r3 > > == LLVM code from sparse-llvm == > ; ModuleID = '<stdin>' > source_filename = "sparse" > > define i8* @foo(i32* %ARG1) { > L0: > %0 = getelementptr i32, i32* %ARG1, inttoptr (i64 20 to i32*) > %R3 = bitcast i32* %0 to i8* > ret i8* %R3 > } > OK, good. Let's use this example. I am using clang to get the llvm bytecode. t.c is your example. clang -S -emit-llvm /tmp/t.c Here is output, compare the line I am pointing at: The indices should be 5 according clang's output. However sparse-llvm generate as 20 per your output if I am reading it correctly. Chris ============== t.ll ===================== ModuleID = '/tmp/t.c' target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ; Function Attrs: nounwind uwtable define i8* @foo(i32* %p) #0 { %1 = alloca i32*, align 8 store i32* %p, i32** %1, align 8 %2 = load i32*, i32** %1, align 8 %3 = getelementptr inbounds i32, i32* %2, i64 5 <======== look here %4 = bitcast i32* %3 to i8* ret i8* %4 } attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{!0} !0 = !{!"clang version 3.8.1 (tags/RELEASE_381/final)"} ================================================ -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 6, 2017 at 8:43 AM, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote: > > With an example: > == C code == > void *foo(int *p) { return p + 5; } > > == linearized code == > foo: > .L0: > <entry-point> > add.64 %r2 <- %arg1, $20 > cast.64 %r3 <- (64) %r2 > ret.64 %r3 This is correct. > == LLVM code from sparse-llvm == > > define i8* @foo(i32* %ARG1) { > L0: > %0 = getelementptr i32, i32* %ARG1, inttoptr (i64 20 to i32*) > %R3 = bitcast i32* %0 to i8* This is garbage, I'm afraid. When sparse does the "add 20 to pointer", it adds the *byte offset* 20 to the pointer. The LLVM module should not use "getelementptr" for this, because it's not element #20, it's the element at offset 20. I think you're supposed to either use "uglygep" with the base pointer cast to a simple address-unit pointer (ie unsigned char). Or not use GEP at all. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 06, 2017 at 05:06:01PM +0000, Dibyendu Majumdar wrote: > > This may be the issue Chris is highlighting - above the offset should > be 5 not 20 in the LLVM code as the array type is i32* not i8*. Does > sparse always output array offsets in char*? Yes, of course. I think that the best way to see things is not to consider that "sparse output offset in char*". As you have already seen, what sparse produce after linearization is more low-level that LLVM's IR, it's closer to what a CPU would see. So I think you should see sparse's offset simply as an *address offset*, the values involved being the values that would in the register of some CPU. -- Luc -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 07, 2017 at 01:07:27AM +0800, Christopher Li wrote: > On Tue, Mar 7, 2017 at 12:43 AM, Luc Van Oostenryck > <luc.vanoostenryck@gmail.com> wrote: > > > > With an example: > > == C code == > > void *foo(int *p) { return p + 5; } > > > > == linearized code == > > foo: > > .L0: > > <entry-point> > > add.64 %r2 <- %arg1, $20 > > cast.64 %r3 <- (64) %r2 > > ret.64 %r3 > > > > == LLVM code from sparse-llvm == > > ; ModuleID = '<stdin>' > > source_filename = "sparse" > > > > define i8* @foo(i32* %ARG1) { > > L0: > > %0 = getelementptr i32, i32* %ARG1, inttoptr (i64 20 to i32*) > > %R3 = bitcast i32* %0 to i8* > > ret i8* %R3 > > } > > > > > OK, good. Let's use this example. > I am using clang to get the llvm bytecode. t.c is your example. > > clang -S -emit-llvm /tmp/t.c > Here is output, compare the line I am pointing at: > The indices should be 5 according clang's output. > However sparse-llvm generate as 20 per your output > if I am reading it correctly. Absolutely. -- Luc -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 06, 2017 at 10:17:35AM -0800, Linus Torvalds wrote: > On Mon, Mar 6, 2017 at 8:43 AM, Luc Van Oostenryck > <luc.vanoostenryck@gmail.com> wrote: > > > > With an example: > > == C code == > > void *foo(int *p) { return p + 5; } > > > > == linearized code == > > foo: > > .L0: > > <entry-point> > > add.64 %r2 <- %arg1, $20 > > cast.64 %r3 <- (64) %r2 > > ret.64 %r3 > > This is correct. Interestingly, the following code is linearized a bit differently: == C code == void *foo(int *p) { return p += 5; } == linearized code == foo: .L0: <entry-point> cast.64 %r98 <- (64) %arg1 add.64 %r99 <- %r98, $20 ptrcast.64 %r100 <- (64) %r99 cast.64 %r101 <- (64) %r100 ret.64 %r101 Where the type of %r98 & %r99 is 'unsigned long' (or more probably size_t). This is then correctly LLVMized as: define i8* @foo(i32* %ARG1) { L18: %R98 = ptrtoint i32* %ARG1 to i64 %R99 = add i64 %R98, 20 %R100 = inttoptr i64 %R99 to i32* %R101 = bitcast i32* %R100 to i8* ret i8* %R101 } > > == LLVM code from sparse-llvm == > > > > define i8* @foo(i32* %ARG1) { > > L0: > > %0 = getelementptr i32, i32* %ARG1, inttoptr (i64 20 to i32*) > > %R3 = bitcast i32* %0 to i8* > > This is garbage, I'm afraid. > > When sparse does the "add 20 to pointer", it adds the *byte offset* 20 > to the pointer. The LLVM module should not use "getelementptr" for > this, because it's not element #20, it's the element at offset 20. It's clear that there is a semantic gap between sparse's & LLVM's IR. > I think you're supposed to either use "uglygep" with the base pointer > cast to a simple address-unit pointer (ie unsigned char). > > Or not use GEP at all. > > Linus I bet we'll have others problems with GEP. Luc -- To unsubscribe from this list: send the line "unsubscribe linux-sparse" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/sparse-llvm.c b/sparse-llvm.c index 27cc1b88c..ee374b217 100644 --- a/sparse-llvm.c +++ b/sparse-llvm.c @@ -472,6 +472,10 @@ static void output_op_binary(struct function *fn, struct instruction *insn) case OP_ADD: if (symbol_is_fp_type(insn->type)) target = LLVMBuildFAdd(fn->builder, lhs, rhs, target_name); + else if (LLVMGetTypeKind(LLVMTypeOf(lhs)) == LLVMPointerTypeKind) + target = LLVMBuildGEP(fn->builder, lhs, &rhs, 1, ""); + else if (LLVMGetTypeKind(LLVMTypeOf(rhs)) == LLVMPointerTypeKind) + target = LLVMBuildGEP(fn->builder, rhs, &lhs, 1, ""); else target = LLVMBuildAdd(fn->builder, lhs, rhs, target_name); break;
In sparse, pointer arithmetic and accessing the field of a structure or an array is simply done via OP_ADD, the offset being calculated at evaluation time. On the other hand, LLVM allows addition only on two integers and pointer arithmetic/member access is done via 'getelementptr'. sparse-llvm didn't took this in account which resulted in type error in 'add' instructions. Fix this by catching addition involving pointer and issuing a getelementptr' instruction for these. Originally-by: Dibyendu Majumdar <mobile@majumdar.org.uk> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> --- sparse-llvm.c | 4 ++++ 1 file changed, 4 insertions(+)