Message ID | 20210212211607.2890660-1-morbo@google.com (mailing list archive) |
---|---|
Series | Combining CUs into a single hash table |
Bump for exposure.

On Fri, Feb 12, 2021 at 1:16 PM Bill Wendling <morbo@google.com> wrote:
>
> Hey gang,
>
> I would like your feedback on this patch.
>
> This patch creates one hash table that all CUs share. The impetus for this
> patch is to support clang's LTO (Link-Time Optimization). Currently, pahole
> can't handle the DWARF data that clang produces, because the CUs may refer to
> tags in other CUs (all of the code having been squozen together).
>
> One solution I found is to process the CUs in two steps:
>
>   1. add the CUs into a single hash table, and
>   2. perform the recoding and finalization steps in a separate step.
>
> The issue I'm facing with this patch is that it balloons the runtime from
> ~11.11s to ~14.27s. It looks like the underlying cause is that some (but not
> all) hash buckets have thousands of entries each. I've bumped up the
> HASHTAGS__BITS from 15 to 16, which helped a little. Bumping it up to 17 or
> above causes a failure.
>
> A couple of things I thought of may help. We could increase the number of
> buckets, which would help with distribution. As I mentioned though, that seemed
> to cause a failure. Another option is to store the bucket entries in a
> non-list structure, e.g. a binary search tree.
>
> I wanted to get your opinions before I trod down one of these roads.
>
> Share and enjoy!
> -bw
>
> Bill Wendling (1):
>   dwarf_loader: have all CUs use a single hash table
>
>  dwarf_loader.c | 45 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 33 insertions(+), 12 deletions(-)
>
> --
> 2.30.0.478.g8a0d178c01-goog
>
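The two-step idea in the message above (register every CU's tags in one shared table, then resolve references in a second pass) can be illustrated with a small sketch. This is not pahole's code: `struct tag`, `struct table`, `load_cu`, `resolve_cu`, `longest_chain`, and `TABLE_BITS` are hypothetical names invented for illustration, the multiplicative hash is an assumption, and `TABLE_BITS` merely plays the role that `HASHTAGS__BITS` plays in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 4                     /* hypothetical; tiny on purpose */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hypothetical stand-in for a DWARF tag, keyed by its section offset. */
struct tag {
	uint64_t offset;       /* unique offset of this tag */
	uint64_t type_offset;  /* offset of the tag it refers to, 0 if none */
	struct tag *type;      /* filled in by pass 2 */
	struct tag *next;      /* bucket chain (a plain singly linked list) */
};

struct table {
	struct tag *buckets[TABLE_SIZE];
};

/* Multiplicative hash: keep the top TABLE_BITS bits of the product. */
static unsigned int hash_offset(uint64_t offset)
{
	return (unsigned int)((offset * 0x9E3779B97F4A7C15ull) >> (64 - TABLE_BITS));
}

static struct tag *table_find(const struct table *t, uint64_t offset)
{
	for (struct tag *p = t->buckets[hash_offset(offset)]; p; p = p->next)
		if (p->offset == offset)
			return p;
	return NULL;
}

/* Pass 1: register every tag of one CU in the shared table. */
static void load_cu(struct table *t, struct tag *tags, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int b = hash_offset(tags[i].offset);

		tags[i].next = t->buckets[b];
		t->buckets[b] = &tags[i];
	}
}

/* Pass 2: resolve references, which may point into any CU.
 * Returns the number of references that could not be resolved. */
static size_t resolve_cu(const struct table *t, struct tag *tags, size_t n)
{
	size_t missing = 0;

	for (size_t i = 0; i < n; i++) {
		if (tags[i].type_offset == 0)
			continue;
		tags[i].type = table_find(t, tags[i].type_offset);
		if (tags[i].type == NULL)
			missing++;
	}
	return missing;
}

/* Length of the longest bucket chain: a cheap check for skewed buckets. */
static size_t longest_chain(const struct table *t)
{
	size_t longest = 0;

	for (size_t b = 0; b < TABLE_SIZE; b++) {
		size_t len = 0;

		for (const struct tag *p = t->buckets[b]; p; p = p->next)
			len++;
		if (len > longest)
			longest = len;
	}
	return longest;
}

/* Two toy CUs whose tags refer to each other across the CU boundary. */
size_t demo_unresolved(void)
{
	struct table t = { { NULL } };
	struct tag cu_a[] = {
		{ .offset = 0x10, .type_offset = 0x200 }, /* points into cu_b */
		{ .offset = 0x20 },
	};
	struct tag cu_b[] = {
		{ .offset = 0x200 },
		{ .offset = 0x210, .type_offset = 0x20 }, /* points into cu_a */
	};

	load_cu(&t, cu_a, 2);                     /* pass 1, all CUs first */
	load_cu(&t, cu_b, 2);
	assert(longest_chain(&t) >= 1);
	return resolve_cu(&t, cu_a, 2) + resolve_cu(&t, cu_b, 2); /* pass 2 */
}
```

Resolving `cu_a` before `cu_b` has been loaded would leave the reference to `0x200` dangling, which is why the second pass has to wait until every CU is registered. A helper like `longest_chain` is one way to confirm whether a slowdown comes from a few overfull buckets; replacing each chain with a balanced tree, as the message suggests, would bound lookup cost per bucket at the price of more complex insertion.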
On Tue, Feb 23, 2021 at 12:44:58PM -0800, Bill Wendling wrote:
> Bump for exposure.

While preparing my presentation for devconf.cz I stumbled on a problem with
split BTF, which I want to bisect first before publishing. I'll move this patch
to the front of my priority list and report back here ASAP.

- Arnaldo
On 2/23/21 12:44 PM, Bill Wendling wrote:
> On Fri, Feb 12, 2021 at 1:16 PM Bill Wendling <morbo@google.com> wrote:
>> This patch creates one hash table that all CUs share. The impetus for this
>> patch is to support clang's LTO (Link-Time Optimization). Currently, pahole
>> can't handle the DWARF data that clang produces, because the CUs may refer to
>> tags in other CUs (all of the code having been squozen together).

Hi Bill,

LTO build support is now in Linus's tree (5.12-rc2) and has also been merged
into the latest bpf-next. I tried a ThinLTO build with the latest LLVM trunk
(llvm13). The build is fine until it reaches pahole (1.20), where it gets
stuck, probably in some kind of infinite loop, since pahole is not yet ready
to handle LTO DWARF.

I then applied this patch on top of pahole master (1.20), and pahole
segfaulted. I did not debug it. Have you hit the same issue?
How did you make pahole work with an LTO-built kernel?

Thanks!

Yonghong
On Sat, Mar 13, 2021 at 11:05 PM Yonghong Song <yhs@fb.com> wrote:
> I then applied this patch on top of pahole master (1.20), and pahole
> segfaulted. I did not debug it. Have you hit the same issue?
> How did you make pahole work with an LTO-built kernel?

Hi Yonghong,

I haven't tried this very much with top-of-tree Linux, but it's quite
possible that there's a segfaulting issue I haven't come across yet.
Make sure that you're using pahole v1.20, because it supports clang's
penchant for assigning some objects "null" names.

This patch is the first step in my attempt to get pahole working with
LTO. There's a follow-up patch that I'll attach to this email that
gets me through the compilation. It's not been heavily tested or
reviewed (it's in my local tree), so caveat emptor. I would love to
have people test it to see if it helps or just makes things worse.

Cheers!
-bw
On 3/14/21 12:28 AM, Bill Wendling wrote:
> This patch is the first step in my attempt to get pahole working with
> LTO. There's a follow-up patch that I'll attach to this email that
> gets me through the compilation. It's not been heavily tested or
> reviewed (it's in my local tree), so caveat emptor. I would love to
> have people test it to see if it helps or just makes things worse.

I applied your "Combining CUs into a single hash table" patch and the
attached patch. pahole no longer segfaults, but I still get the
following pahole errors:

  ...
  <ERROR(tag__size:1040): 1622 not found!>
  <ERROR(tag__size:1040): 1617 not found!>
  <ERROR(tag__size:1040): 1615 not found!>
  error: found variable 'loaded_vmcss_on_cpu' in CU '/home/yhs/work/bpf-next/arch/x86/kvm/vmx/vmx.c' that has void type
  Encountered error while encoding BTF.

FYI, I compiled the latest bpf-next with the following command:

  make LLVM=1 LLVM_IAS=1 -j60

The compiler is built locally from the latest upstream llvm-project,
and I am using ThinLTO in the kernel config.

I will take a look at your patch and the issue next week; hopefully we
can resolve it soon.

Thanks!