Message ID | 036CD6AE-C560-4FC7-9B02-ADD08E380DC9@juniper.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] binfmt_elf: Dump smaller VMAs first in ELF cores | expand |
On Tue, 6 Aug 2024 at 11:16, Brian Mak <makb@juniper.net> wrote: > > @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) > cprm->vma_data_size += m->dump_size; > } > > + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), > + cmp_vma_size, NULL); > + > return true; > } Hmm. Realistically we only dump core in ELF, and the order of the segments shouldn't matter. But I wonder if we should do this in the ->core_dump() function itself, in case it would have mattered for other dump formats? IOW, instead of being at the bottom of dump_vma_snapshot(), maybe the sorting should be at the top of elf_core_dump()? And yes, in practice I doubt we'll ever have other dump formats, and no, a.out isn't doing some miraculous comeback either. But I bet you didn't test elf_fdpic_core_dump() even if I bet it (a) works and (b) nobody cares. So moving it to the ELF side might be conceptually the right thing to do? (Or is there some reason it needs to be done at snapshot time that I just didn't fully appreciate?) Linus
On Aug 6, 2024, at 11:33 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > Hmm. Realistically we only dump core in ELF, and the order of the > segments shouldn't matter. > > But I wonder if we should do this in the ->core_dump() function > itself, in case it would have mattered for other dump formats? > > IOW, instead of being at the bottom of dump_vma_snapshot(), maybe the > sorting should be at the top of elf_core_dump()? > > And yes, in practice I doubt we'll ever have other dump formats, and > no, a.out isn't doing some miraculous comeback either. > > But I bet you didn't test elf_fdpic_core_dump() even if I bet it (a) > works and (b) nobody cares. > > So moving it to the ELF side might be conceptually the right thing to do? > > (Or is there some reason it needs to be done at snapshot time that I > just didn't fully appreciate?) The main reason it was done at snapshot time was to make it so that it works with other dump formats like ELF FDPIC (that being said, yes I didn't explicitly test it). If another format is introduced and isn't compatible with this type of reordering, then I think the introduction of that format should be reconsidered for lack of flexibility, or if it really must be introduced, then this logic can be changed at that time. That being said, my opinion on this isn't too strong, so if you feel it is better to have it in elf_core_dump, we can do that too. Since Eric originally brought up placing it here, maybe he has his own opinions on this as well. Best, Brian Mak
On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote: > Large cores may be truncated in some scenarios, such as with daemons > with stop timeouts that are not large enough or lack of disk space. This > impacts debuggability with large core dumps since critical information > necessary to form a usable backtrace, such as stacks and shared library > information, are omitted. > > We attempted to figure out which VMAs are needed to create a useful > backtrace, and it turned out to be a non-trivial problem. Instead, we > try simply sorting the VMAs by size, which has the intended effect. > > [...] While waiting on rr test validation, and since we're at the start of the dev cycle, I figure let's get this into -next ASAP to see if anything else pops out. We can drop/revise if there are problems. (And as always, I will add any Acks/Reviews/etc that show up on the thread.) Applied to for-next/execve, thanks! [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores https://git.kernel.org/kees/c/9c531dfdc1bc Take care,
Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 6 Aug 2024 at 11:16, Brian Mak <makb@juniper.net> wrote: >> >> @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) >> cprm->vma_data_size += m->dump_size; >> } >> >> + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), >> + cmp_vma_size, NULL); >> + >> return true; >> } > > Hmm. Realistically we only dump core in ELF, and the order of the > segments shouldn't matter. > > But I wonder if we should do this in the ->core_dump() function > itself, in case it would have mattered for other dump formats? > > IOW, instead of being at the bottom of dump_vma_snapshot(), maybe the > sorting should be at the top of elf_core_dump()? > > And yes, in practice I doubt we'll ever have other dump formats, and > no, a.out isn't doing some miraculous comeback either. > > But I bet you didn't test elf_fdpic_core_dump() even if I bet it (a) > works and (b) nobody cares. > > So moving it to the ELF side might be conceptually the right thing to do? > > (Or is there some reason it needs to be done at snapshot time that I > just didn't fully appreciate?) I asked him to perform this at snapshot time. Plus it is obvious at snapshot time that you can change the allocated array, while it is not so obvious in the ->core_dump methods. I would argue that the long term maintainable thing to do is to merge elf_core_dump and elf_fdpic_core_dump and put all of the code in fs/coredump.c Performing the sort at snapshot time avoids introducing one extra reason why the two elf implementations of elf coredumping are different. I did read through the elf fdpic code quickly and it looks like it should just work no matter which order the vma's are dumped in. Just like the other elf coredump code does. My practical concern is that someone has a coredump thing that walks through the program headers and short circuits the walk because it knows the program headers are all written in order. But the only way to find one of those is to just try it. Eric
On Fri, 9 Aug 2024 at 07:40, Eric W. Biederman <ebiederm@xmission.com> wrote: > > I asked him to perform this at snapshot time. Plus it is obvious at > snapshot time that you can change the allocated array, while it is > not so obvious in the ->core_dump methods. Fair enough. The days when we supported a.out dumps are obviously long long gone, and aren't coming back. So I am not adamant that it has to be done in the dumper, and I probably just have that historical "we have multiple different dumpers with different rules" mindset that isn't really relevant any more. > I would argue that the long term maintainable thing to do is to > merge elf_core_dump and elf_fdpic_core_dump and put all of the code > in fs/coredump.c I wouldn't object. It's not like there's any foreseeable new core dump format that we'd expect, and the indirection makes the code flow harder to follow. Not that most people look at this code a lot. Linus
On Aug 6, 2024, at 10:21 PM, Kees Cook <kees@kernel.org> wrote: > > On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote: >> Large cores may be truncated in some scenarios, such as with daemons >> with stop timeouts that are not large enough or lack of disk space. This >> impacts debuggability with large core dumps since critical information >> necessary to form a usable backtrace, such as stacks and shared library >> information, are omitted. >> >> We attempted to figure out which VMAs are needed to create a useful >> backtrace, and it turned out to be a non-trivial problem. Instead, we >> try simply sorting the VMAs by size, which has the intended effect. >> >> [...] > > While waiting on rr test validation, and since we're at the start of the > dev cycle, I figure let's get this into -next ASAP to see if anything > else pops out. We can drop/revise if there are problems. (And as always, > I will add any Acks/Reviews/etc that show up on the thread.) > > Applied to for-next/execve, thanks! > > [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores > https://urldefense.com/v3/__https://git.kernel.org/kees/c/9c531dfdc1bc__;!!NEt6yMaO-gk!FK3UfXVndoYpve8Y7q7vacIoHOrTj2nJgSJbugqUB5LfciKy0_Xvit9aXz_XCWlXHpdRQO2ArP0$ Thanks, Kees! And, thanks Linus + Eric for taking the time to comment on this. Regarding the rr tests, it was not an easy task to get the environment set up to do this, but I did it and was able to run the tests. The rr tests require a lot of kernel config options and there's no list documenting what's needed anywhere... All the tests pass except for the sioc and sioc-no-syscallbuf tests. However, these test failures are due to an incompatibility with the network adapter I'm using. It seems that it only likes older network adapters. I've switched my virtualized network adapter twice now, and each time, the test gets a bit further than the previous time. Will continue trying different network adapters until something hopefully works. In any case, since this error isn't directly related to my changes and the rest of the tests pass, then I think we can be pretty confident that this change is not breaking rr. Best, Brian Mak > Take care, > > -- > Kees Cook >
On Sat, Aug 10, 2024 at 12:52:16AM +0000, Brian Mak wrote: > On Aug 6, 2024, at 10:21 PM, Kees Cook <kees@kernel.org> wrote: > > > > On Tue, 06 Aug 2024 18:16:02 +0000, Brian Mak wrote: > >> Large cores may be truncated in some scenarios, such as with daemons > >> with stop timeouts that are not large enough or lack of disk space. This > >> impacts debuggability with large core dumps since critical information > >> necessary to form a usable backtrace, such as stacks and shared library > >> information, are omitted. > >> > >> We attempted to figure out which VMAs are needed to create a useful > >> backtrace, and it turned out to be a non-trivial problem. Instead, we > >> try simply sorting the VMAs by size, which has the intended effect. > >> > >> [...] > > > > While waiting on rr test validation, and since we're at the start of the > > dev cycle, I figure let's get this into -next ASAP to see if anything > > else pops out. We can drop/revise if there are problems. (And as always, > > I will add any Acks/Reviews/etc that show up on the thread.) > > > > Applied to for-next/execve, thanks! > > > > [1/1] binfmt_elf: Dump smaller VMAs first in ELF cores > > https://urldefense.com/v3/__https://git.kernel.org/kees/c/9c531dfdc1bc__;!!NEt6yMaO-gk!FK3UfXVndoYpve8Y7q7vacIoHOrTj2nJgSJbugqUB5LfciKy0_Xvit9aXz_XCWlXHpdRQO2ArP0$ > > Thanks, Kees! And, thanks Linus + Eric for taking the time to comment on > this. > > Regarding the rr tests, it was not an easy task to get the environment > set up to do this, but I did it and was able to run the tests. The rr > tests require a lot of kernel config options and there's no list > documenting what's needed anywhere... Thanks for suffering through that! > All the tests pass except for the sioc and sioc-no-syscallbuf tests. > However, these test failures are due to an incompatibility with the > network adapter I'm using. It seems that it only likes older network > adapters. I've switched my virtualized network adapter twice now, and > each time, the test gets a bit further than the previous time. Will > continue trying different network adapters until something hopefully > works. In any case, since this error isn't directly related to my > changes and the rest of the tests pass, then I think we can be pretty > confident that this change is not breaking rr. Perfect! Okay, we'll keep our eyes open for any reports of breakage. :) -Kees
Brian Mak <makb@juniper.net> writes: > Large cores may be truncated in some scenarios, such as with daemons > with stop timeouts that are not large enough or lack of disk space. This > impacts debuggability with large core dumps since critical information > necessary to form a usable backtrace, such as stacks and shared library > information, are omitted. > > We attempted to figure out which VMAs are needed to create a useful > backtrace, and it turned out to be a non-trivial problem. Instead, we > try simply sorting the VMAs by size, which has the intended effect. > > By sorting VMAs by dump size and dumping in that order, we have a > simple, yet effective heuristic. To make finding the history easier I would include: v1: https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net v2: https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> As Kees has already picked this up this is quite possibly silly. But *shrug* that was when I was out. > Signed-off-by: Brian Mak <makb@juniper.net> > --- > > Hi all, > > Still need to run rr tests on this, per Kees Cook's suggestion, will > update back once done. GDB and readelf show that this patch works > without issue though. > > Thanks, > Brian Mak > > v3: Edited commit message to better convey alternative solution as > non-trivial > > Moved sorting logic to fs/coredump.c to make it in place > > Above edits suggested by Eric Biederman <ebiederm@xmission.com> > > v2: Edited commit message to include more reasoning for sorting VMAs > > Removed conditional VMA sorting with debugfs knob > > Above edits suggested by Eric Biederman <ebiederm@xmission.com> > > fs/coredump.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 7f12ff6ad1d3..33c5ac53ab31 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -18,6 +18,7 @@ > #include <linux/personality.h> > #include <linux/binfmts.h> > #include <linux/coredump.h> > +#include <linux/sort.h> > #include <linux/sched/coredump.h> > #include <linux/sched/signal.h> > #include <linux/sched/task_stack.h> > @@ -1191,6 +1192,18 @@ static void free_vma_snapshot(struct coredump_params *cprm) > } > } > > +static int cmp_vma_size(const void *vma_meta_lhs_ptr, const void *vma_meta_rhs_ptr) > +{ > + const struct core_vma_metadata *vma_meta_lhs = vma_meta_lhs_ptr; > + const struct core_vma_metadata *vma_meta_rhs = vma_meta_rhs_ptr; > + > + if (vma_meta_lhs->dump_size < vma_meta_rhs->dump_size) > + return -1; > + if (vma_meta_lhs->dump_size > vma_meta_rhs->dump_size) > + return 1; > + return 0; > +} > + > /* > * Under the mmap_lock, take a snapshot of relevant information about the task's > * VMAs. > @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) > cprm->vma_data_size += m->dump_size; > } > > + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), > + cmp_vma_size, NULL); > + > return true; > } > > base-commit: eb5e56d1491297e0881c95824e2050b7c205f0d4
On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote: > Brian Mak <makb@juniper.net> writes: > > > Large cores may be truncated in some scenarios, such as with daemons > > with stop timeouts that are not large enough or lack of disk space. This > > impacts debuggability with large core dumps since critical information > > necessary to form a usable backtrace, such as stacks and shared library > > information, are omitted. > > > > We attempted to figure out which VMAs are needed to create a useful > > backtrace, and it turned out to be a non-trivial problem. Instead, we > > try simply sorting the VMAs by size, which has the intended effect. > > > > By sorting VMAs by dump size and dumping in that order, we have a > > simple, yet effective heuristic. > > To make finding the history easier I would include: > v1: https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net > v2: https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> > > As Kees has already picked this up this is quite possibly silly. > But *shrug* that was when I was out. I've updated the trailers. Thanks for the review! -Kees
On Aug 12, 2024, at 11:05 AM, Kees Cook <kees@kernel.org> wrote > On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote: >> Brian Mak <makb@juniper.net> writes: >> >>> Large cores may be truncated in some scenarios, such as with daemons >>> with stop timeouts that are not large enough or lack of disk space. This >>> impacts debuggability with large core dumps since critical information >>> necessary to form a usable backtrace, such as stacks and shared library >>> information, are omitted. >>> >>> We attempted to figure out which VMAs are needed to create a useful >>> backtrace, and it turned out to be a non-trivial problem. Instead, we >>> try simply sorting the VMAs by size, which has the intended effect. >>> >>> By sorting VMAs by dump size and dumping in that order, we have a >>> simple, yet effective heuristic. >> >> To make finding the history easier I would include: >> v1: https://urldefense.com/v3/__https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0rwhadvc$ >> v2: https://urldefense.com/v3/__https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0G7RQv4o$ >> >> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> >> >> As Kees has already picked this up this is quite possibly silly. >> But *shrug* that was when I was out. > > I've updated the trailers. Thanks for the review! Hi Kees, Thanks! I think you added it to the wrong commit though. Please double check and update accordingly. Regarding the sioc tests from earlier, I've reached a point where I think I have a compatible virtual NIC (no more ioctl errors), but it's giving me some mismatched registers error, causing the test to fail. I can see this same test failure on a vanilla kernel with my setup, so this is probably either some environment issue or a bug with rr or the tests. Since all the other tests pass, I'm just going to leave it at that. Best, Brian Mak
On Mon, Aug 12, 2024 at 06:21:15PM +0000, Brian Mak wrote: > On Aug 12, 2024, at 11:05 AM, Kees Cook <kees@kernel.org> wrote > > > On Sat, Aug 10, 2024 at 07:28:44AM -0500, Eric W. Biederman wrote: > >> Brian Mak <makb@juniper.net> writes: > >> > >>> Large cores may be truncated in some scenarios, such as with daemons > >>> with stop timeouts that are not large enough or lack of disk space. This > >>> impacts debuggability with large core dumps since critical information > >>> necessary to form a usable backtrace, such as stacks and shared library > >>> information, are omitted. > >>> > >>> We attempted to figure out which VMAs are needed to create a useful > >>> backtrace, and it turned out to be a non-trivial problem. Instead, we > >>> try simply sorting the VMAs by size, which has the intended effect. > >>> > >>> By sorting VMAs by dump size and dumping in that order, we have a > >>> simple, yet effective heuristic. > >> > >> To make finding the history easier I would include: > >> v1: https://urldefense.com/v3/__https://lkml.kernel.org/r/CB8195AE-518D-44C9-9841-B2694A5C4002@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0rwhadvc$ > >> v2: https://urldefense.com/v3/__https://lkml.kernel.org/r/C21B229F-D1E6-4E44-B506-A5ED4019A9DE@juniper.net__;!!NEt6yMaO-gk!DavIB4o54KGrCPK44iq9_nJrOpKMJxUAlazBVF6lfKwmMCgLD_NviY088SQXriD19pS0G7RQv4o$ > >> > >> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> > >> > >> As Kees has already picked this up this is quite possibly silly. > >> But *shrug* that was when I was out. > > > > I've updated the trailers. Thanks for the review! > > Hi Kees, > > Thanks! I think you added it to the wrong commit though. Ugh. Time for more coffee. Thanks; fixed. I need to update my "b4" -- it was hanging doing the trailers update so I did it myself manually... That'll teach me. ;) > tests. Since all the other tests pass, I'm just going to leave it at > that. Yeah, I think you're good. Thank you for taking the time to test rr!
diff --git a/fs/coredump.c b/fs/coredump.c index 7f12ff6ad1d3..33c5ac53ab31 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -18,6 +18,7 @@ #include <linux/personality.h> #include <linux/binfmts.h> #include <linux/coredump.h> +#include <linux/sort.h> #include <linux/sched/coredump.h> #include <linux/sched/signal.h> #include <linux/sched/task_stack.h> @@ -1191,6 +1192,18 @@ static void free_vma_snapshot(struct coredump_params *cprm) } } +static int cmp_vma_size(const void *vma_meta_lhs_ptr, const void *vma_meta_rhs_ptr) +{ + const struct core_vma_metadata *vma_meta_lhs = vma_meta_lhs_ptr; + const struct core_vma_metadata *vma_meta_rhs = vma_meta_rhs_ptr; + + if (vma_meta_lhs->dump_size < vma_meta_rhs->dump_size) + return -1; + if (vma_meta_lhs->dump_size > vma_meta_rhs->dump_size) + return 1; + return 0; +} + /* * Under the mmap_lock, take a snapshot of relevant information about the task's * VMAs. @@ -1253,5 +1266,8 @@ static bool dump_vma_snapshot(struct coredump_params *cprm) cprm->vma_data_size += m->dump_size; } + sort(cprm->vma_meta, cprm->vma_count, sizeof(*cprm->vma_meta), + cmp_vma_size, NULL); + return true; }
Large cores may be truncated in some scenarios, such as with daemons with stop timeouts that are not large enough or lack of disk space. This impacts debuggability with large core dumps since critical information necessary to form a usable backtrace, such as stacks and shared library information, are omitted. We attempted to figure out which VMAs are needed to create a useful backtrace, and it turned out to be a non-trivial problem. Instead, we try simply sorting the VMAs by size, which has the intended effect. By sorting VMAs by dump size and dumping in that order, we have a simple, yet effective heuristic. Signed-off-by: Brian Mak <makb@juniper.net> --- Hi all, Still need to run rr tests on this, per Kees Cook's suggestion, will update back once done. GDB and readelf show that this patch works without issue though. Thanks, Brian Mak v3: Edited commit message to better convey alternative solution as non-trivial Moved sorting logic to fs/coredump.c to make it in place Above edits suggested by Eric Biederman <ebiederm@xmission.com> v2: Edited commit message to include more reasoning for sorting VMAs Removed conditional VMA sorting with debugfs knob Above edits suggested by Eric Biederman <ebiederm@xmission.com> fs/coredump.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) base-commit: eb5e56d1491297e0881c95824e2050b7c205f0d4