[OPW,kernel,RFC] scripts: Detect what syscalls a userspace uses

Message ID 20150206160838.GA17751@winterfell
State New, archived

Commit Message

Iulia Manda Feb. 6, 2015, 4:08 p.m. UTC
This is the first part of a patchset that finds out which syscalls a
specific userspace uses so that, in the end, only the needed
implementations are compiled into the kernel.

The script disassembles the object file with objdump and searches for
values smaller than 360 (in most cases, a bigger value would actually be
an address) that are stored in an arch-specific register (mentioned in the
documentation). It then creates a list of the numbers corresponding to the
syscalls used by that binary.
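
As a rough illustration of the heuristic above, here is a minimal Python 3
sketch. It is not the script from this patch: it uses a regular expression
instead of token splitting, only matches immediate operands written as
`$0x...` in objdump's AT&T syntax, and works on lines that are already in
memory. The names `MOV_IMM`, `candidate_syscalls` and the sample
disassembly are hypothetical.

```python
import re

# Assumption: AT&T syntax as printed by objdump on x86, e.g.
# "mov    $0x3c,%eax". Only immediate source operands are considered.
MOV_IMM = re.compile(r"\bmov[lq]?\s+\$(0x[0-9a-fA-F]+),%(\w+)")


def candidate_syscalls(disasm_lines, reg="eax", limit=360):
    """Return the sorted immediates below `limit` that are moved into `reg`."""
    found = set()
    for line in disasm_lines:
        m = MOV_IMM.search(line)
        if m and m.group(2) == reg:
            value = int(m.group(1), 16)
            # Larger values are most probably addresses, not syscall numbers.
            if value < limit:
                found.add(value)
    return sorted(found)


# Hypothetical objdump output, for illustration only.
sample = [
    "  400b10:\tb8 3c 00 00 00       \tmov    $0x3c,%eax",
    "  400b15:\t0f 05                \tsyscall",
    "  400b17:\tb8 00 10 60 00       \tmov    $0x601000,%eax",
    "  400b1c:\tbf 01 00 00 00       \tmov    $0x1,%edi",
]

print(candidate_syscalls(sample))  # prints [60]
```

Like the script, this can report false positives: any small constant
loaded into the register is counted, whether or not a syscall follows.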

I tested this on libc builds for i386 and x86_64.

For an arbitrary userspace, let's say a C program that only uses 3 libc
wrappers, one solution would be to just find what symbols are defined and
search for calls to the syscall wrappers, e.g.:
callq ... <read@plt>
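
A minimal Python 3 sketch of that second idea, assuming objdump's usual
`call`/`callq ADDR <name@plt>` line format; `CALL_PLT`, `plt_calls` and the
sample lines are hypothetical and not part of this patch. It only collects
the wrapper names; mapping each name to a syscall number is left open here.

```python
import re

# Assumption: objdump prints PLT calls as "callq  400480 <read@plt>"
# (newer binutils print "call" instead of "callq").
CALL_PLT = re.compile(r"\bcallq?\s+[0-9a-fA-F]+\s+<([A-Za-z_]\w*)@plt>")


def plt_calls(disasm_lines):
    """Return the sorted, de-duplicated names called through the PLT."""
    names = set()
    for line in disasm_lines:
        m = CALL_PLT.search(line)
        if m:
            names.add(m.group(1))
    return sorted(names)


# Hypothetical objdump output, for illustration only.
sample = [
    "  4005d4:\te8 a7 fe ff ff       \tcallq  400480 <read@plt>",
    "  4005e3:\te8 88 fe ff ff       \tcallq  400470 <write@plt>",
    "  4005f0:\te8 8b fe ff ff       \tcallq  400480 <read@plt>",
    "  4005f5:\te8 16 00 00 00       \tcallq  400610 <helper>",
]

print(plt_calls(sample))  # prints ['read', 'write']
```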

Signed-off-by: Iulia Manda <iulia.manda21@gmail.com>
---
 scripts/syscall_list.py |   48 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100755 scripts/syscall_list.py

Patch

diff --git a/scripts/syscall_list.py b/scripts/syscall_list.py
new file mode 100755
index 0000000..557bb01
--- /dev/null
+++ b/scripts/syscall_list.py
@@ -0,0 +1,48 @@ 
+#!/usr/bin/python
+
+import sys, os, re
+
+ARCH_SYSCALL_REG = {
+    'x86_64': 'eax'
+    #TODO add more
+}
+
+def print_usage():
+    # we need to know the arch in order to know in which register
+    # the number of the syscall is set:
+    sys.stderr.write("Usage: %s object_file arch\n" % sys.argv[0])
+    sys.stderr.write("Please select arch as one of the:\n")
+    for arch in ARCH_SYSCALL_REG.keys():
+        sys.stderr.write("\t%s\n" % arch)
+    sys.exit(-1)
+
+if len(sys.argv) != 3:
+    print_usage()
+
+ARCH = ARCH_SYSCALL_REG.get(sys.argv[2])
+
+if ARCH is None:
+    print_usage()
+
+def get_sys_no(file):
+    sym = []
+    lines = os.popen("objdump -lD " + file).readlines()
+    for l in lines:
+        l1 = l.strip().split()
+        if 'mov' not in l1:
+            continue
+        for e in l1:
+            # TODO use ARCH variable
+            if e.endswith(',%eax'):
+                test = (e.split(',')[0])[1:].split("(")[0]
+                try:
+                    value = int(test, 16)
+                    # if value is larger, it is most probably an address
+                    if value < 360 and value not in sym:
+                        sym.append(value)
+                except ValueError:
+                    pass
+    sym.sort()
+    print(sym)
+
+get_sys_no(sys.argv[1])