diff mbox

[v3] scripts: add leaking_addresses.pl

Message ID 1509945567-11801-1-git-send-email-me@tobin.cc (mailing list archive)
State New, archived
Headers show

Commit Message

Tobin Harding Nov. 6, 2017, 5:19 a.m. UTC
Currently we are leaking addresses from the kernel to user space. This
script is an attempt to find some of those leakages. Script parses
`dmesg` output and /proc and /sys files for hex strings that look like
kernel addresses.

Only works for 64 bit kernels, the reason being that kernel addresses
on 64 bit kernels have 'ffff' as the leading bit pattern making greping
possible. On 32 kernels we don't have this luxury.

Scripts is _slightly_ smarter than a straight grep, we check for false
positives (all 0's or all 1's, and vsyscall start/finish addresses).

Signed-off-by: Tobin C. Harding <me@tobin.cc>
---

v3:
 - Iterate matches to check for results instead of matching input line against
   false positives i.e catch lines that contain results as well as false
   positives. 

v2:
 - Add regex's to prevent false positives.
 - Clean up white space.

 MAINTAINERS                  |   5 +
 scripts/leaking_addresses.pl | 306 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 311 insertions(+)
 create mode 100755 scripts/leaking_addresses.pl

Comments

Linus Torvalds Nov. 6, 2017, 5:27 p.m. UTC | #1
On Sun, Nov 5, 2017 at 9:19 PM, Tobin C. Harding <me@tobin.cc> wrote:
> Currently we are leaking addresses from the kernel to user space. This
> script is an attempt to find some of those leakages. Script parses
> `dmesg` output and /proc and /sys files for hex strings that look like
> kernel addresses.

Lovely. This is great. It shows just how much totally pointless stuff
we leak, and to normal users that really shouldn't need it.

I had planned to wait for 4.15 to look at the printk hashing stuff
etc, but this part I think I could/should merge early just because I
think a lot of kernel developers will go "Why the f*ck would we expose
that kernel address there?"

The module sections stuff etc should likely be obviously root-only,
although maybe I'm missing some tool that ends up using it and is
useful to normal developers.

And I'm thinking we could make kallsyms smarter too, and instead of
depending on kptr_restrict that screws over things with much too big a
hammer, we could make it take 'perf_event_paranoid' into account. I
suspect that's the main user of kallsyms that would still be relevant
to non-root.

I really really hate kptr_restrict due to that whole "all or nothing" thing.

The networking code ends up having a ton of "sk" pointers, and I
suspect those would all be fine just using the hashed address, since
nobody is going to care about the _actual_ kernel address, but
matching a socket reference in one file against another one makes tons
of sense and I would not be surprised if there's a lot of netstat-like
tools that do that. So the whole '%pK' thing is entirely useless for
them, but the "hash %p values" should work fine.

Adding netdev and David Miller explicitly to the cc, since many of the
kernel pointer leaks are from them, and they may not even have been
following this discussion at all.

David - you can see the patch on patchwork:

    https://patchwork.kernel.org/patch/10042605/

and try it out yourself.

I heartily recommend other developers run it too, just to see if it
makes you go "Oh, I didn't even realize.."

This looks much more interesting than the whole "%p vs %pK vs %x" discussion.

                  Linus
Linus Torvalds Nov. 6, 2017, 5:41 p.m. UTC | #2
On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Lovely. This is great. It shows just how much totally pointless stuff
> we leak, and to normal users that really shouldn't need it.

Side note: it would be good to have some summary view, and perhaps
some way to limit duplicates.

I ended up running this command line from hell to summarize the
different sources:

    perl leaking_addresses.pl |
            cut -d: -f1 |
            sed 's:/[0-9]*/:/X/:g' |
            sed 's:/module/[^/]*/:/module/X/:g' |
            sort | uniq | less -S

and maybe that kind of duplicate culling could be part of the script
itself if you pass it some summary line.

In particular, if would be nice to have a summary report that

 - only shows the first address for a particular source

 - have some logic to collapse repeated entries of "same file, just
different instance"

my sed-invocations there are obviously very ad-hoc, I'm  not actually
advocating that crap, it's only meant as hacky example of what I'm
talking about. Something smarter would be much better.

Because right now if some developer runs it, they might miss some case
that they should care about, simply because it's hidden among all the
thousands of essentially duplicate cases.

                 Linus
Pavel Vasilyev Nov. 6, 2017, 6:25 p.m. UTC | #3
./leaking_addresses.pl --dont_walk_abs /proc --dont_walk_abs /sys
Unknown option: dont_walk_abs
Unknown option: dont_walk_abs

06.11.2017 20:27, Linus Torvalds пишет:
> David - you can see the patch on patchwork:
>
>     https://patchwork.kernel.org/patch/10042605/
>
> and try it out yourself.
>
Tobin Harding Nov. 6, 2017, 9:03 p.m. UTC | #4
On Mon, Nov 06, 2017 at 09:25:33PM +0300, Pavel Vasilyev wrote:
>  ./leaking_addresses.pl --dont_walk_abs /proc --dont_walk_abs /sys
> Unknown option: dont_walk_abs
> Unknown option: dont_walk_abs

Oh thanks. Documentation is out of sync with the code, what are the odds.

v4 to come.

thanks,
Tobin.
Tobin Harding Nov. 6, 2017, 9:15 p.m. UTC | #5
On Mon, Nov 06, 2017 at 09:41:09AM -0800, Linus Torvalds wrote:
> On Mon, Nov 6, 2017 at 9:27 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Lovely. This is great. It shows just how much totally pointless stuff
> > we leak, and to normal users that really shouldn't need it.
> 
> Side note: it would be good to have some summary view, and perhaps
> some way to limit duplicates.

This has been bothering me also.

> I ended up running this command line from hell to summarize the
> different sources:
> 
>     perl leaking_addresses.pl |
>             cut -d: -f1 |
>             sed 's:/[0-9]*/:/X/:g' |
>             sed 's:/module/[^/]*/:/module/X/:g' |
>             sort | uniq | less -S
> 
> and maybe that kind of duplicate culling could be part of the script
> itself if you pass it some summary line.
> 
> In particular, if would be nice to have a summary report that
> 
>  - only shows the first address for a particular source
> 
>  - have some logic to collapse repeated entries of "same file, just
> different instance"
> 
> my sed-invocations there are obviously very ad-hoc, I'm  not actually
> advocating that crap, it's only meant as hacky example of what I'm
> talking about. Something smarter would be much better.
> 
> Because right now if some developer runs it, they might miss some case
> that they should care about, simply because it's hidden among all the
> thousands of essentially duplicate cases.

Awesome. I'm on it. thanks.

So, cull duplicates by default, add summary report to end of output, add
'--raw' option to dump all the lines (the current output).

thanks,
Tobin.
diff mbox

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index e652a3e2929d..c1ad6d133a57 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7745,6 +7745,11 @@  S:	Maintained
 F:	Documentation/scsi/53c700.txt
 F:	drivers/scsi/53c700*
 
+LEAKING_ADDRESSES
+M:	Tobin C. Harding <me@tobin.cc>
+S:	Maintained
+F:	scripts/leaking_addresses.pl
+
 LED SUBSYSTEM
 M:	Richard Purdie <rpurdie@rpsys.net>
 M:	Jacek Anaszewski <jacek.anaszewski@gmail.com>
diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl
new file mode 100755
index 000000000000..1487a75cfc52
--- /dev/null
+++ b/scripts/leaking_addresses.pl
@@ -0,0 +1,306 @@ 
+#!/usr/bin/env perl
+#
+# (c) 2017 Tobin C. Harding <me@tobin.cc>
+# Licensed under the terms of the GNU GPL License version 2
+#
+# leaking_addresses.pl: Scan 64 bit kernel for potential leaking addresses.
+#  - Scans dmesg output.
+#  - Walks directory tree and parses each file (for each directory in @DIRS).
+#
+# You can configure the behaviour of the script;
+#
+#  - By adding paths, for directories you do not want to walk;
+#     absolute paths: @skip_walk_dirs_abs
+#     directory names: @skip_walk_dirs_any
+#
+#  - By adding paths, for files you do not want to parse;
+#     absolute paths: @skip_parse_files_abs
+#     file names: @skip_parse_files_any
+#
+# The use of @skip_xxx_xxx_any causes files to be skipped where ever they occur.
+# For example adding 'fd' to @skip_walk_dirs_any causes the fd/ directory to be
+# skipped for all PID sub-directories of /proc
+#
+# The same thing can be achieved by passing command line options to --dont-walk
+# and --dont-parse. If absolute paths are supplied to these options they are
+# appended to the @skip_xxx_xxx_abs arrays. If file names are supplied to these
+# options, they are appended to the @skip_xxx_xxx_any arrays.
+#
+# Use --debug to output path before parsing, this is useful to find files that
+# cause the script to choke.
+#
+# You may like to set kptr_restrict=2 before running script
+# (see Documentation/sysctl/kernel.txt).
+
+use warnings;
+use strict;
+use POSIX;
+use File::Basename;
+use File::Spec;
+use Cwd 'abs_path';
+use Term::ANSIColor qw(:constants);
+use Getopt::Long qw(:config no_auto_abbrev);
+
+my $P = $0;
+my $V = '0.01';
+
+# Directories to scan.
+my @DIRS = ('/proc', '/sys');
+
+# Command line options.
+my $help = 0;
+my $debug = 0;
+my @dont_walk = ();
+my @dont_parse = ();
+
+# Do not parse these files (absolute path).
+my @skip_parse_files_abs = ('/proc/kmsg',
+			    '/proc/kcore',
+			    '/proc/fs/ext4/sdb1/mb_groups',
+			    '/proc/1/fd/3',
+			    '/sys/kernel/debug/tracing/trace_pipe',
+			    '/sys/kernel/security/apparmor/revision');
+
+# Do not parse thes files under any subdirectory.
+my @skip_parse_files_any = ('0',
+			    '1',
+			    '2',
+			    'pagemap',
+			    'events',
+			    'access',
+			    'registers',
+			    'snapshot_raw',
+			    'trace_pipe_raw',
+			    'ptmx',
+			    'trace_pipe');
+
+# Do not walk these directories (absolute path).
+my @skip_walk_dirs_abs = ();
+
+# Do not walk these directories under any subdirectory.
+my @skip_walk_dirs_any = ('self',
+			  'thread-self',
+			  'cwd',
+			  'fd',
+			  'stderr',
+			  'stdin',
+			  'stdout');
+
+sub help
+{
+	my ($exitcode) = @_;
+
+	print << "EOM";
+Usage: $P [OPTIONS]
+Version: $V
+
+Options:
+
+	--dont-walk=<dir>      Don't walk tree starting at <dir>.
+	--dont-parse=<file>    Don't parse <file>.
+	-d, --debug                Display debugging output.
+	-h, --help, --version      Display this help and exit.
+
+If an absolute path is passed to --dont_XXX then this path is skipped. If a
+single filename is passed then this file/directory will be skipped when
+appearing under any subdirectory.
+
+Example:
+
+	# Just scan dmesg output.
+	scripts/leaking_addresses.pl --dont_walk_abs /proc --dont_walk_abs /sys
+
+Scans the running (64 bit) kernel for potential leaking addresses.
+
+EOM
+	exit($exitcode);
+}
+
+GetOptions(
+	'dont-walk=s'		=> \@dont_walk,
+	'dont-parse=s'		=> \@dont_parse,
+	'd|debug'		=> \$debug,
+	'h|help'		=> \$help,
+	'version'		=> \$help
+) or help(1);
+
+help(0) if ($help);
+
+push_to_global();
+
+parse_dmesg();
+walk(@DIRS);
+
+exit 0;
+
+sub debug_arrays
+{
+	print 'dirs_any: ' . join(", ", @skip_walk_dirs_any) . "\n";
+	print 'dirs_abs: ' . join(", ", @skip_walk_dirs_abs) . "\n";
+	print 'parse_any: ' . join(", ", @skip_parse_files_any) . "\n";
+	print 'parse_abs: ' . join(", ", @skip_parse_files_abs) . "\n";
+}
+
+sub dprint
+{
+	printf(STDERR @_) if $debug;
+}
+
+sub push_in_abs_any
+{
+	my ($in, $abs, $any) = @_;
+
+	foreach my $path (@$in) {
+		if (File::Spec->file_name_is_absolute($path)) {
+			push @$abs, $path;
+		} elsif (index($path,'/') == -1) {
+			push @$any, $path;
+		} else {
+			print 'path error: ' . $path;
+		}
+	}
+}
+
+# Push command line options to global arrays.
+sub push_to_global
+{
+	push_in_abs_any(\@dont_walk, \@skip_walk_dirs_abs, \@skip_walk_dirs_any);
+	push_in_abs_any(\@dont_parse, \@skip_parse_files_abs, \@skip_parse_files_any);
+}
+
+sub is_false_positive
+{
+        my ($match) = @_;
+
+        if ($match =~ '\b(0x)?(f|F){16}\b' or
+            $match =~ '\b(0x)?0{16}\b') {
+                return 1;
+        }
+
+        # vsyscall memory region, we should probably check against a range here.
+        if ($match =~ '\bf{10}600000\b' or
+            $match =~ '\bf{10}601000\b') {
+                return 1;
+        }
+
+        return 0;
+}
+
+# True if argument potentially contains a kernel address.
+sub may_leak_address
+{
+        my ($line) = @_;
+        my $address = '\b(0x)?ffff[[:xdigit:]]{12}\b';
+
+        # Signal masks.
+        if ($line =~ '^SigBlk:' or
+            $line =~ '^SigCgt:') {
+                return 0;
+        }
+
+        if ($line =~ '\bKEY=[[:xdigit:]]{14} [[:xdigit:]]{16} [[:xdigit:]]{16}\b' or
+            $line =~ '\b[[:xdigit:]]{14} [[:xdigit:]]{16} [[:xdigit:]]{16}\b') {
+		return 0;
+        }
+
+        while (/($address)/g) {
+                if (!is_false_positive($1)) {
+                        return 1;
+                }
+        }
+
+        return 0;
+}
+
+sub parse_dmesg
+{
+	open my $cmd, '-|', 'dmesg';
+	while (<$cmd>) {
+		if (may_leak_address($_)) {
+			print 'dmesg: ' . $_;
+		}
+	}
+	close $cmd;
+}
+
+# True if we should skip this path.
+sub skip
+{
+	my ($path, $paths_abs, $paths_any) = @_;
+
+	foreach (@$paths_abs) {
+		return 1 if (/^$path$/);
+	}
+
+	my($filename, $dirs, $suffix) = fileparse($path);
+	foreach (@$paths_any) {
+		return 1 if (/^$filename$/);
+	}
+
+	return 0;
+}
+
+sub skip_parse
+{
+	my ($path) = @_;
+	return skip($path, \@skip_parse_files_abs, \@skip_parse_files_any);
+}
+
+sub parse_file
+{
+	my ($file) = @_;
+
+	if (! -R $file) {
+		return;
+	}
+
+	if (skip_parse($file)) {
+		dprint "skipping file: $file\n";
+		return;
+	}
+	dprint "parsing: $file\n";
+
+	open my $fh, "<", $file or return;
+	while ( <$fh> ) {
+		if (may_leak_address($_)) {
+			print $file . ': ' . $_;
+		}
+	}
+	close $fh;
+}
+
+
+# True if we should skip walking this directory.
+sub skip_walk
+{
+	my ($path) = @_;
+	return skip($path, \@skip_walk_dirs_abs, \@skip_walk_dirs_any)
+}
+
+# Recursively walk directory tree.
+sub walk
+{
+	my @dirs = @_;
+	my %seen;
+
+	while (my $pwd = shift @dirs) {
+		next if (skip_walk($pwd));
+		next if (!opendir(DIR, $pwd));
+		my @files = readdir(DIR);
+		closedir(DIR);
+
+		foreach my $file (@files) {
+			next if ($file eq '.' or $file eq '..');
+
+			my $path = "$pwd/$file";
+			next if (-l $path);
+
+			if (-d $path) {
+				push @dirs, $path;
+			} else {
+				parse_file($path);
+			}
+		}
+	}
+}
+