[OPW,kernel] scripts: Compile out syscalls given a specific userspace

Message ID 20150223164108.GA7647@winterfell
State New, archived

Commit Message

Iulia Manda Feb. 23, 2015, 4:41 p.m. UTC
This patch suggests which syscalls can be compiled out in the kernel given a
specific userspace, by mapping each syscall with its corresponding symbol(s)
and deciding which of them can be disabled.

The steps taken in the script are the following:

1. Get the list of syscalls a userspace uses (nm) - this will give us more
symbols than those that match syscalls, but the next step will filter them
out;
2. Intersect that list with the list of all optional syscalls (check-syscalls
script that finds what syscalls can be compiled out in kernel/sys_ni.c) => we
will obtain a list containing all the optional syscalls that we can compile
out;
3. Parse C files and Makefiles in the kernel source code in order to map each
syscall with the symbols that compile it out:
- we need a stack in order to know between which ifdef and endif a syscall is
defined;
- we keep a dictionary where the key is the syscall and the values are all the
symbols that it depends on and the conditionals between them;
4. The output will be a list of symbols that can be disabled, and the
corresponding list of those syscalls that need to be enabled in order for the
application to work.

In case of uncertainty (e.g., compound conditionals), it chooses to enable all
the symbols that the syscall depends on.

Note that it provides correct solutions, though not necessarily optimal ones
yet (for example, in case of a disjunction, both symbols are set to True, even
though only one is needed in order for the syscall to be compiled in).

You can run the script as follows:

compile_syscalls.py object_file syscalls-optional \
        `find staging/ -name "*.c"` > output

Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
---
 scripts/compile_syscalls.py |  194 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)
 create mode 100755 scripts/compile_syscalls.py

Comments

Josh Triplett March 5, 2015, 4:36 p.m. UTC | #1
On Mon, Feb 23, 2015 at 06:41:08PM +0200, Iulia Manda wrote:
> This patch suggests which syscalls can be compiled out in the kernel given a
> specific userspace, by mapping each syscall with its corresponding symbol(s)
> and deciding which of them can be disabled.
> 
> The steps taken in the script are the following:
> 
> 1. Get the list of syscalls a userspace uses (nm) - this will give us more
> symbols than those that match syscalls, but the next step will filter them
> out;
> 2. Intersect that list with the list of all optional syscalls (check-syscalls
> script that finds what syscalls can be compiled out in kernel/sys_ni.c) => we
> will obtain a list containing all the optional syscalls that we can compile
> out;
> 3. Parse C files and Makefiles in the kernel source code in order to map each
> syscall with the symbols that compile it out:
> - we need a stack in order to know between which ifdef and endif a syscall is
> defined;
> - we keep a dictionary where the key is the syscall and the values are all the
> symbols that it depends on and the conditionals between them;
> 4. The output will be a list of symbols that can be disabled, and the
> corresponding list of those syscalls that need to be enabled in order for the
> application to work.
> 
> In case of uncertainty (e.g., compound conditionals), it chooses to enable all
> the symbols that the syscall depends on.
> 
> Note that it provides correct solutions, though not necessarily optimal ones
> yet (for example, in case of a disjunction, both symbols are set to True, even
> though only one is needed in order for the syscall to be compiled in).
> 
> You can run the script as follows:
> 
> compile_syscalls.py object_file syscalls-optional \
>         `find staging/ -name "*.c"` > output

You may want to do this in two steps: first go over all the C files and
generate a data file with syscalls and corresponding conditionals, and
then have a separate script that reads that data file and an object file
and generates the config snippet.  That way, you only do the source
analysis once, and you can hand-edit the results if you need to clean
them up.
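
As a rough sketch of that split (not part of the patch): the helper
names and the choice of JSON for the data file are assumptions made for
illustration, and the sample mapping is example data rather than output
of the script.

```python
import json


def save_syscall_map(syscall_map, path):
    """Step 1 output: write {syscall: [config symbols]} as hand-editable JSON."""
    with open(path, "w") as f:
        json.dump(syscall_map, f, indent=2, sort_keys=True)


def load_syscall_map(path):
    """Step 2 input: read back the (possibly hand-edited) data file."""
    with open(path) as f:
        return json.load(f)


def required_symbols(syscall_map, used_syscalls):
    """Config symbols needed by the syscalls an object file actually uses."""
    needed = set()
    for name in used_syscalls:
        needed.update(syscall_map.get(name, []))
    return needed
```

The first script would call save_syscall_map() once after the source
analysis; the second would only need load_syscall_map() plus the symbol
list from nm.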

> Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
> ---
>  scripts/compile_syscalls.py |  194 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 194 insertions(+)
>  create mode 100755 scripts/compile_syscalls.py
> 
> diff --git a/scripts/compile_syscalls.py b/scripts/compile_syscalls.py
> new file mode 100755
> index 0000000..e573686
> --- /dev/null
> +++ b/scripts/compile_syscalls.py
> @@ -0,0 +1,194 @@
> +#!/usr/bin/python
> +
> +import re, sys, os, fileinput

You don't appear to be using fileinput here.

> +import pprint

You're not using pprint except for some commented-out debugging output;
you could import it next to that debugging output, also commented out.

> +
> +if len(sys.argv) < 3:
> +    sys.stderr.write("usage: %s object_file syscalls-optional source_files\n"
> +                        % sys.argv[0])
> +    sys.exit(-1)

Conventionally, Python scripts define a function "main(args)", and then at the
bottom of the script, they have:

if __name__ == "__main__":
    sys.exit(main(sys.argv))

You can then refer to main's argument instead of sys.argv, and use
return instead of sys.exit.

Also, -1 isn't a valid program return code; return codes go from 0-255,
with 0 being success.  I'd suggest using 1 instead.
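
Put together, the argument check might look roughly like this.  It is
only a sketch: the body of main is a placeholder, and it is meant to be
called from the __name__ guard shown above.

```python
import sys


def main(args):
    # args[0] is the script name, exactly as in sys.argv.
    if len(args) < 3:
        sys.stderr.write("usage: %s object_file syscalls-optional source_files\n"
                         % args[0])
        return 1  # a small positive status signals failure
    object_file, optional_file = args[1], args[2]
    source_files = args[3:]
    # ... the analysis itself would go here (placeholder) ...
    return 0
```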

> +
> +# Find what syscalls a userspace uses
> +def get_userspace_syscalls(file):

Since this one just gets all glibc symbol names, I'd suggest calling it
"get_symbols"; later on you'll determine whether they're present in your
list of syscalls.

> +    sym = []
> +    lines = iter(os.popen("nm " + file).readlines())

You don't need to call readlines() and then call iter() on the result.
You can just do "for l in os.popen(...):".  Iteration by line is the
default behavior you get if you iterate over a file or file-like object.

Also, nm works on an object file, but if you ran this on an executable,
you'd need "nm -D" instead.  I don't know how to handle both cases
transparently, other than trying one first, and if you see a line ending
in "no symbols" as you loop through, trying the other one.  Or just
always running both.

Finally, you can pass --undefined-only to nm to only show the symbols
that the object or program wants from some other library, rather than
the symbols the object or program defines.

> +    for l in lines:
> +        if not '@@GLIBC' in l:

Python has a "not in" operator, which allows you to write this more
naturally as:

if "@@GLIBC" not in l:

> +            continue
> +        words = l.split()
> +        for e in words:

You can just use "for e in l.split():".

Or, better yet, since the output of nm with --undefined-only should
always consist of two columns, the first just containing "U", and the
second containing a symbol name, you can just write:

t, n = l.split()

Then check if t == 'U', and if so, check n, without looping.
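
Here is a sketch of how those nm suggestions could fit together.  The
function names and the "dynamic" parameter are made up for illustration.
Unlike the patch, it does not require a "@@GLIBC" suffix; it strips any
"@..." version suffix it finds, which is an assumption about what you
want to keep.

```python
import subprocess


def parse_undefined(nm_lines):
    """Return symbol names from `nm --undefined-only` output lines."""
    symbols = []
    for line in nm_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank lines or anything that is not "TYPE NAME"
        kind, name = fields
        if kind == "U":
            # Drop a version suffix such as "@@GLIBC_2.2.5" if present.
            symbols.append(name.split("@", 1)[0])
    return symbols


def get_symbols(path, dynamic=False):
    """Run nm on an object file, or on an executable with dynamic=True."""
    cmd = ["nm", "--undefined-only"]
    if dynamic:
        cmd.append("-D")
    cmd.append(path)
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_undefined(result.stdout.splitlines())
```

Keeping the parsing separate from the nm invocation makes it possible
to test the parsing on canned output.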

> +            if '@@GLIBC' in e:
> +                sym.append(re.split("@@GLIBC", e)[0])
> +                break
> +    return sym
> +
> +
> +# Find which syscalls from userspace can be optionally compiled in the kernel
> +def get_optional_syscalls(file):
> +    cnf = []
> +    # Run this on the object file of the application
> +    sym = get_userspace_syscalls(sys.argv[1])
> +    for e in sym:
> +        with open(file) as f:
> +            lines = f.read().splitlines()
> +            i = "sys_" + e 
> +            if i in lines:
> +                cnf.append(e)     
> +    return cnf

This function is reading every line of f for every symbol.

Instead, I'd suggest reading the file once, putting the results into a
dictionary, and looking up each of the symbols in the dictionary.

On top of that, get_optional_syscalls shouldn't be looking at
sys.argv[1] or calling get_userspace_syscalls; you should process the
optional syscall list and return a dictionary, and the caller can
evaluate the userspace syscalls against that dictionary.
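
A minimal sketch of that restructuring, assuming the syscalls-optional
file has one "sys_<name>" entry per line (as the patch's lookup
implies).  It uses a set rather than a dictionary, since only
membership matters here, and the function names are illustrative.

```python
def load_optional_syscalls(path):
    """Read the optional-syscall list once; return names without 'sys_'."""
    optional = set()
    with open(path) as f:
        for line in f:
            name = line.strip()
            if name.startswith("sys_"):
                optional.add(name[len("sys_"):])
    return optional


def used_optional_syscalls(symbols, optional):
    """The caller's step: keep only the symbols that are optional syscalls."""
    return [s for s in symbols if s in optional]
```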

Also, a few lines there have trailing spaces.  You should check for
those in the whole file, and drop any trailing spaces.

> +
> + 
> +def c_to_o(file):
> +    f = re.split('/', file)[-1]

This is os.path.basename(file).

> +    name, ext = os.path.splitext(f)
> +    return " " + name + ".o"

Prepending a space seems very odd here.  If that's needed as part of the
search in the makefile, the caller should do that.  More importantly,
including that space will miss some cases, since it's perfectly legal to
write:

obj-$(CONFIG_SOMETHING):=something.o

without spaces.
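
One possible way to do the match without relying on a leading space is
sketched below.  The helper config_symbols_for and both regular
expressions are assumptions for illustration, not something the patch
contains.

```python
import os
import re


def c_to_o(path):
    """Map 'fs/eventpoll.c' to 'eventpoll.o' (no leading space)."""
    name, _ext = os.path.splitext(os.path.basename(path))
    return name + ".o"


def config_symbols_for(obj, makefile_lines):
    """Config symbols on Makefile lines that mention obj as a whole word."""
    # Reject matches embedded in a longer file name, whatever precedes them.
    obj_re = re.compile(r"(?<![\w.-])" + re.escape(obj) + r"(?![\w.-])")
    sym_re = re.compile(r"-\$\((CONFIG_\w+)\)")
    found = []
    for line in makefile_lines:
        if obj_re.search(line):
            found.extend(sym_re.findall(line))
    return found
```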

> +
> +
> +def add_to_dictionary(dict1, key, value):
> +    if key in dict1:
> +        dict1[key].extend(value)
> +    else:
> +        dict1[key] = value

There's a simpler pattern for this:

dict1.setdefault(key, []).extend(value)

setdefault looks up the key in the dictionary, sets it to the passed
default value if not already set, and then returns it.

That's a common enough pattern that I'd suggest just inlining it into
the caller rather than having a function for it.

Also, be careful about the difference between "append" and "extend".
"extend" adds each item in an iterable to the list, while "append" adds
its argument to the list.  And since a string is iterable, if you pass
it to "extend" you add all the characters rather than the whole string.
If you're adding a single value, I'd suggest using append and passing
that value, rather than using extend and passing the single value
wrapped in a list.  (On the other hand, if you actually do have a list
of values to add, definitely use extend.)
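
A short illustration of both points; the keys and CONFIG names here are
just example data.

```python
map_sys = {}

# setdefault creates the list on first use and returns it every time.
map_sys.setdefault("epoll_create", []).append("CONFIG_EPOLL")
map_sys.setdefault("epoll_create", []).extend(["CONFIG_A", "CONFIG_B"])

# extend iterates over its argument, so a bare string is split up.
chars = []
chars.extend("ab")

# append adds its argument as a single item.
whole = []
whole.append("ab")
```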

> +    
> +def get_syscall_name(line):
> +    name = re.split('[(,)]', line)[1]
> +    return name
> +
> +
> +def get_ifdef_symbols(line):
> +    name = line.split()
> +    return [name[1]]
> +
> +
> +def get_defined_symbols(line):
> +    delim = ['#if', 'defined ', 'defined(', ')', '&&', '||', '>=']
> +    for d in delim:
> +        line = line.replace(d, '')
> +    line = line.split()
> +    return line 
> +
> +class Node:
> +    def __init__(self, parent=None, name=""):
> +        self.parent = parent
> +        self.name = name
> +        self.children = []
> +
> +    def add_child(self, el):
> +        self.children.append(el)
> +
> +
> +# Check the Makefile in order to see if a file containing a syscall
> +# is compiled out as a whole
> +curr = Node()
> +map_sys = {}
> +def parse_makefile(file):
> +    global curr
> +    sys_list = []
> +    with open(file) as f:
> +        lines = f.read().splitlines()
> +        for l in lines:
> +            if re.search("^SYSCALL_DEFINE", l) or \
> +                re.search("^COMPAT_SYSCALL_DEFINE", l):
> +                sys_list.append(get_syscall_name(l)) 
> +    if sys_list == []:
> +        return
> +    search_for = c_to_o(file)
> +    try:
> +        f = open(os.path.dirname(file) + "/Makefile")
> +        lines = f.read().replace('\\\n', '').splitlines()

Nice.  This took me a minute to understand, but it looks like you're
handling lines that end in a backslash and thus get continued in the
next line.  This deserves a comment.
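
For instance, pulled out into a commented helper (the function name is
only a suggestion):

```python
def join_continued_lines(text):
    """Split text into lines, first merging backslash-continued lines."""
    # A backslash directly before a newline continues the logical line,
    # so delete the pair before splitting.
    return text.replace("\\\n", "").splitlines()
```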

> +        yes = '\n'.join([l for l in lines if search_for in l])
> +        if re.search('.*-\$\(CONFIG.*\)', yes):
> +            value = re.split('[$()]', yes)[2]
> +            for e in sys_list:
> +                if not e in map_sys:
> +                    map_sys[e] = []
> +                map_sys[e].append(value)

This is that same pattern mentioned above, where you can just use
setdefault.  Also, since sys_list is a list, you can use extend rather
than a loop over calls to append.

> +        # Check if a file is compiled under ifdefs
> +        for l in lines:
> +            if re.search('^ifdef', l):

As mentioned on our previous call, you may also need to handle certain
cases of "ifeq" or "ifneq".  At least, if any syscalls depend on those;
if not, don't worry about it.

> +                name = get_ifdef_symbols(l)
> +                new = Node(parent=curr, name=name)
> +                curr.add_child(new)
> +                curr = new
> +            elif search_for in l:
> +                if curr.name:
> +                    for e in sys_list:
> +                        add_to_dictionary(map_sys, e, curr.name)
> +            elif re.search('^endif', l):
> +                if curr.parent is not None:
> +                    curr = curr.parent
> +    except:
> +        pass

There's one additional case you might have to handle: a Makefile in a
higher-level directory might configure out the entire directory.  I'm
not sure if there are any cases of that happening for syscalls, though.

Also, the Node mechanism may be more complex than you need; it looks
like you only ever have a single child for each node, so instead of a
tree-like structure, you can just use a stack (implemented as a list).
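
A rough sketch of the stack version for the C side follows.  It makes
some simplifying assumptions: only "#ifdef SYMBOL" contributes a
symbol, other "#if" forms push an empty frame so that "#endif" stays
balanced, and "#else"/"#elif" are ignored.  Unlike the patch, it
records the symbols from every enclosing level, not just the innermost
one.

```python
import re


def map_syscalls(lines):
    """Map each syscall name to the #ifdef symbols that enclose it."""
    stack = []    # one list of symbols per open conditional
    result = {}
    for line in lines:
        if line.startswith("#ifdef"):
            stack.append([line.split()[1]])
        elif line.startswith("#if"):
            stack.append([])  # "#if ...", "#ifndef ...": not analysed here
        elif line.startswith("#endif"):
            if stack:
                stack.pop()
        elif re.match(r"(COMPAT_)?SYSCALL_DEFINE", line):
            name = re.split(r"[(,)]", line)[1].strip()
            deps = [sym for frame in stack for sym in frame]
            if deps:
                result.setdefault(name, []).extend(deps)
    return result
```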

> +
> +
> +curr = Node()
> +def parse_line(line):
> +    global curr
> +    if re.search("^#ifdef",line):
> +        name = get_ifdef_symbols(line)
> +        new = Node(parent=curr, name=name)
> +        curr.add_child(new)
> +        curr = new
> +    elif re.search("^SYSCALL_DEFINE", line) or \
> +            re.search("^COMPAT_SYSCALL_DEFINE", line):
> +        syscall_name = get_syscall_name(line)
> +        if curr.name:
> +            add_to_dictionary(map_sys, syscall_name, curr.name)
> +    elif re.search('^#endif', line):
> +        if curr.parent is not None:
> +            curr = curr.parent
> +    elif (re.search('^#if', line)) and ('defined' not in line):
> +        new = Node(parent=curr)
> +        curr = new
> +    elif re.search("^#if defined", line):
> +        name = get_defined_symbols(line)
> +        new = Node(parent=curr, name=name)
> +        curr.add_child(new)
> +        curr = new
> +        
> +
> +def parse_files():
> +    for n in sys.argv[3:]:
> +        with open(n) as f:
> +            # need to compact lines that contain the same info
> +            lines = f.read().replace('\\\n', '').splitlines()
> +            for l in lines:
> +                parse_line(l)
> +        parse_makefile(n)
> +
> +parse_files()
> +# One can use pprint in order to see the intermediate output
> +# more human-readable :)
> +# pprint.pprint(map_sys)
> +# print "\n"
> +
> +
> +# At first, we set all symbols to False (no symbol is enabled)
> +bool_dict = {}
> +for k,v in map_sys.iteritems():
> +    for e in v:
> +        bool_dict[e] = False
> +
> +
> +def enable_symbol():
> +    cnf = get_optional_syscalls(sys.argv[2])
> +    for e in cnf:
> +        if e not in map_sys:
> +            continue
> +        for sym in map_sys[e]:
> +            bool_dict[sym] = True
> +            
> +enable_symbol()
> +
> +print "\n"
> +print "You can disable the following symbols:\n"
> +for k,v in bool_dict.iteritems():
> +    if v is False:
> +        print k
> +print "\n"
> +
> +print "The following symbols have to be enabled:\n"
> +for k,v in bool_dict.iteritems():
> +    if v is True:
> +        print k
> -- 
> 1.7.10.4

Patch

diff --git a/scripts/compile_syscalls.py b/scripts/compile_syscalls.py
new file mode 100755
index 0000000..e573686
--- /dev/null
+++ b/scripts/compile_syscalls.py
@@ -0,0 +1,194 @@ 
+#!/usr/bin/python
+
+import re, sys, os, fileinput
+import pprint
+
+if len(sys.argv) < 3:
+    sys.stderr.write("usage: %s object_file syscalls-optional source_files\n"
+                        % sys.argv[0])
+    sys.exit(-1)
+
+
+# Find what syscalls a userspace uses
+def get_userspace_syscalls(file):
+    sym = []
+    lines = iter(os.popen("nm " + file).readlines())
+    for l in lines:
+        if not '@@GLIBC' in l:
+            continue
+        words = l.split()
+        for e in words:
+            if '@@GLIBC' in e:
+                sym.append(re.split("@@GLIBC", e)[0])
+                break
+    return sym
+
+
+# Find which syscalls from userspace can be optionally compiled in the kernel
+def get_optional_syscalls(file):
+    cnf = []
+    # Run this on the object file of the application
+    sym = get_userspace_syscalls(sys.argv[1])
+    for e in sym:
+        with open(file) as f:
+            lines = f.read().splitlines()
+            i = "sys_" + e 
+            if i in lines:
+                cnf.append(e)     
+    return cnf
+
+
+ 
+def c_to_o(file):
+    f = re.split('/', file)[-1]
+    name, ext = os.path.splitext(f)
+    return " " + name + ".o"
+
+
+def add_to_dictionary(dict1, key, value):
+    if key in dict1:
+        dict1[key].extend(value)
+    else:
+        dict1[key] = value
+    
+def get_syscall_name(line):
+    name = re.split('[(,)]', line)[1]
+    return name
+
+
+def get_ifdef_symbols(line):
+    name = line.split()
+    return [name[1]]
+
+
+def get_defined_symbols(line):
+    delim = ['#if', 'defined ', 'defined(', ')', '&&', '||', '>=']
+    for d in delim:
+        line = line.replace(d, '')
+    line = line.split()
+    return line 
+
+class Node:
+    def __init__(self, parent=None, name=""):
+        self.parent = parent
+        self.name = name
+        self.children = []
+
+    def add_child(self, el):
+        self.children.append(el)
+
+
+# Check the Makefile in order to see if a file containing a syscall
+# is compiled out as a whole
+curr = Node()
+map_sys = {}
+def parse_makefile(file):
+    global curr
+    sys_list = []
+    with open(file) as f:
+        lines = f.read().splitlines()
+        for l in lines:
+            if re.search("^SYSCALL_DEFINE", l) or \
+                re.search("^COMPAT_SYSCALL_DEFINE", l):
+                sys_list.append(get_syscall_name(l)) 
+    if sys_list == []:
+        return
+    search_for = c_to_o(file)
+    try:
+        f = open(os.path.dirname(file) + "/Makefile")
+        lines = f.read().replace('\\\n', '').splitlines()
+        yes = '\n'.join([l for l in lines if search_for in l])
+        if re.search('.*-\$\(CONFIG.*\)', yes):
+            value = re.split('[$()]', yes)[2]
+            for e in sys_list:
+                if not e in map_sys:
+                    map_sys[e] = []
+                map_sys[e].append(value)
+        # Check if a file is compiled under ifdefs
+        for l in lines:
+            if re.search('^ifdef', l):
+                name = get_ifdef_symbols(l)
+                new = Node(parent=curr, name=name)
+                curr.add_child(new)
+                curr = new
+            elif search_for in l:
+                if curr.name:
+                    for e in sys_list:
+                        add_to_dictionary(map_sys, e, curr.name)
+            elif re.search('^endif', l):
+                if curr.parent is not None:
+                    curr = curr.parent
+    except:
+        pass
+
+
+curr = Node()
+def parse_line(line):
+    global curr
+    if re.search("^#ifdef",line):
+        name = get_ifdef_symbols(line)
+        new = Node(parent=curr, name=name)
+        curr.add_child(new)
+        curr = new
+    elif re.search("^SYSCALL_DEFINE", line) or \
+            re.search("^COMPAT_SYSCALL_DEFINE", line):
+        syscall_name = get_syscall_name(line)
+        if curr.name:
+            add_to_dictionary(map_sys, syscall_name, curr.name)
+    elif re.search('^#endif', line):
+        if curr.parent is not None:
+            curr = curr.parent
+    elif (re.search('^#if', line)) and ('defined' not in line):
+        new = Node(parent=curr)
+        curr = new
+    elif re.search("^#if defined", line):
+        name = get_defined_symbols(line)
+        new = Node(parent=curr, name=name)
+        curr.add_child(new)
+        curr = new
+        
+
+def parse_files():
+    for n in sys.argv[3:]:
+        with open(n) as f:
+            # need to compact lines that contain the same info
+            lines = f.read().replace('\\\n', '').splitlines()
+            for l in lines:
+                parse_line(l)
+        parse_makefile(n)
+
+parse_files()
+# One can use pprint in order to see the intermediate output
+# more human-readable :)
+# pprint.pprint(map_sys)
+# print "\n"
+
+
+# At first, we set all symbols to False (no symbol is enabled)
+bool_dict = {}
+for k,v in map_sys.iteritems():
+    for e in v:
+        bool_dict[e] = False
+
+
+def enable_symbol():
+    cnf = get_optional_syscalls(sys.argv[2])
+    for e in cnf:
+        if e not in map_sys:
+            continue
+        for sym in map_sys[e]:
+            bool_dict[sym] = True
+            
+enable_symbol()
+
+print "\n"
+print "You can disable the following symbols:\n"
+for k,v in bool_dict.iteritems():
+    if v is False:
+        print k
+print "\n"
+
+print "The following symbols have to be enabled:\n"
+for k,v in bool_dict.iteritems():
+    if v is True:
+        print k