From patchwork Wed Dec 13 06:22:38 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10109269
Date: Tue, 12 Dec 2017 22:22:38 -0800
From: "Darrick J. Wong"
To: linux-xfs@vger.kernel.org
Cc: Richard Wareing, david@fromorbit.com, hch@infradead.org
Subject: [PATCH 2/2] tools/xfs: use XFS hacks to override data block device placement
Message-ID: <20171213062238.GQ19219@magnolia>
References: <20171213061825.GO19219@magnolia>
In-Reply-To: <20171213061825.GO19219@magnolia>
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-xfs-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

This (bcc) patch modifies bcc so that we can override some function
return values.  We then create a new python script containing custom
logic to decide where a file's data goes (rtdev or datadev) and inject
the compiled eBPF code into the kernel.

Signed-off-by: Darrick J. Wong
---
 src/cc/compat/linux/bpf.h         |    7 ++
 src/cc/compat/linux/virtual_bpf.h |    3 +
 src/cc/export/helpers.h           |    2 +
 tools/xfs_rt.py                   |  130 +++++++++++++++++++++++++++++++++++++
 4 files changed, 140 insertions(+), 2 deletions(-)
 create mode 100755 tools/xfs_rt.py

diff --git a/src/cc/compat/linux/bpf.h b/src/cc/compat/linux/bpf.h
index f896897..5a3ec0b 100644
--- a/src/cc/compat/linux/bpf.h
+++ b/src/cc/compat/linux/bpf.h
@@ -677,6 +677,10 @@ union bpf_attr {
  *     @buf: buf to fill
  *     @buf_size: size of the buf
  *     Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ *     @pt_regs: pointer to struct pt_regs
+ *     @rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -736,7 +740,8 @@ union bpf_attr {
 	FN(xdp_adjust_meta),		\
 	FN(perf_event_read_value),	\
 	FN(perf_prog_read_value),	\
-	FN(getsockopt),
+	FN(getsockopt),			\
+	FN(override_return),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/src/cc/compat/linux/virtual_bpf.h b/src/cc/compat/linux/virtual_bpf.h
index a2bcf07..7fbc365 100644
--- a/src/cc/compat/linux/virtual_bpf.h
+++ b/src/cc/compat/linux/virtual_bpf.h
@@ -735,7 +735,8 @@ union bpf_attr {
 	FN(xdp_adjust_meta),		\
 	FN(perf_event_read_value),	\
 	FN(perf_prog_read_value),	\
-	FN(getsockopt),
+	FN(getsockopt),			\
+	FN(override_return),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/src/cc/export/helpers.h b/src/cc/export/helpers.h
index 2b64ee8..571191e 100644
--- a/src/cc/export/helpers.h
+++ b/src/cc/export/helpers.h
@@ -204,6 +204,8 @@ static int (*bpf_probe_read)(void *dst, u64 size, const void *unsafe_ptr) =
   (void *) BPF_FUNC_probe_read;
 static u64 (*bpf_ktime_get_ns)(void) =
   (void *) BPF_FUNC_ktime_get_ns;
+static void (*bpf_override_return)(void *ctx, unsigned long rc) =
+  (void *) BPF_FUNC_override_return;
 static u32 (*bpf_get_prandom_u32)(void) =
   (void *) BPF_FUNC_get_prandom_u32;
 static int (*bpf_trace_printk_)(const char *fmt, u64 fmt_size, ...) =
diff --git a/tools/xfs_rt.py b/tools/xfs_rt.py
new file mode 100755
index 0000000..b44fa14
--- /dev/null
+++ b/tools/xfs_rt.py
@@ -0,0 +1,130 @@
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# xfs_rt  Decide on file data block device placement via custom algorithm.
+#         Uses XFS hacks to inject... stuff.
+#
+# Copyright 2017 Oracle, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+
+from __future__ import print_function
+from bcc import BPF
+import argparse
+from time import sleep, strftime
+import ctypes as ct
+
+# arguments
+examples = """examples:
+    ./xfs_rt
+"""
+parser = argparse.ArgumentParser(
+    description="Custom placement of data file blocks on XFS",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+args = parser.parse_args()
+debug = 0
+
+# define BPF program
+bpf_text = """
+#include
+#include
+
+struct xfs_fsop_geom {
+	__u32		blocksize;	/* filesystem (data) block size */
+	__u32		rtextsize;	/* realtime extent size		*/
+	__u32		agblocks;	/* fsblocks in an AG		*/
+	__u32		agcount;	/* number of allocation groups	*/
+	__u32		logblocks;	/* fsblocks in the log		*/
+	__u32		sectsize;	/* (data) sector size, bytes	*/
+	__u32		inodesize;	/* inode size in bytes		*/
+	__u32		imaxpct;	/* max allowed inode space(%)	*/
+	__u64		datablocks;	/* fsblocks in data subvolume	*/
+	__u64		rtblocks;	/* fsblocks in realtime subvol	*/
+	__u64		rtextents;	/* rt extents in realtime subvol*/
+	__u64		logstart;	/* starting fsblock of the log	*/
+	unsigned char	uuid[16];	/* unique id of the filesystem	*/
+	__u32		sunit;		/* stripe unit, fsblocks	*/
+	__u32		swidth;		/* stripe width, fsblocks	*/
+	__s32		version;	/* structure version		*/
+	__u32		flags;		/* superblock version flags	*/
+	__u32		logsectsize;	/* log sector size, bytes	*/
+	__u32		rtsectsize;	/* realtime sector size, bytes	*/
+	__u32		dirblocksize;	/* directory block size, bytes	*/
+	__u32		logsunit;	/* log stripe unit, bytes	*/
+};
+
+/* Output for XFS_FS_COUNTS */
+struct xfs_fsop_counts {
+	__u64	freedata;	/* free data section blocks */
+	__u64	freertx;	/* free rt extents */
+	__u64	freeino;	/* free inodes */
+	__u64	allocino;	/* total allocated inodes */
+};
+
+typedef unsigned long long xfs_ino_t;
+
+int
+xfs_hack_filter_iflags_begin(
+	struct pt_regs		*ctx,
+	struct xfs_fsop_geom	*geo,
+	struct xfs_fsop_counts	*stats,
+	xfs_ino_t		ino,
+	loff_t			offset,
+	loff_t			length,
+	uint			xflags)
+{
+	bool			use_rt = false;
+
+#if 0
+	bpf_trace_printk("B: off=%llu len=%llu xflags=0x%x\\n", offset, length, xflags);
+	bpf_trace_printk("B: dblocks=%llu rblocks=%llu\\n", geo->datablocks, geo->rtblocks);
+	bpf_trace_printk("B: dfree=%llu rfree=%llu\\n", stats->freedata, stats->freertx);
+#endif
+
+	/*
+	 * If the first allocation request is for >64k then we assume this
+	 * is a "large" file and push it to the rt device.
+	 */
+	if (length >= 65536)
+		use_rt = true;
+
+	/*
+	 * Redirect files to the 'other' device if the chosen one is more
+	 * than 80% full.
+	 */
+	if (use_rt && stats->freertx < geo->rtblocks / 5)
+		use_rt = false;
+	else if (!use_rt && stats->freedata < geo->datablocks / 5)
+		use_rt = true;
+
+	if (use_rt)
+		xflags |= FS_XFLAG_REALTIME;
+	else
+		xflags &= ~FS_XFLAG_REALTIME;
+
+	bpf_override_return(ctx, xflags);
+	return 0;
+}
+
+"""
+if debug:
+    print(bpf_text)
+
+# initialize BPF
+b = BPF(text=bpf_text)
+
+# common file functions
+b.attach_kprobe(event="xfs_hack_filter_iflags", fn_name="xfs_hack_filter_iflags_begin")
+
+print("BPF HACKING XFS... Hit Ctrl-C to end.")
+
+# output
+exiting = 0
+while (1):
+    try:
+        sleep(99999999)
+    except KeyboardInterrupt:
+        exiting = 1
+
+    if exiting:
+        exit()
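
For anyone who wants to poke at the placement policy without a patched kernel, the decision made in xfs_hack_filter_iflags_begin() can be modelled as a plain Python function. This is only a sketch under stated assumptions: choose_xflags() and its flat argument list are hypothetical names invented for illustration and are not part of the patch, and FS_XFLAG_REALTIME is written out as 0x1, the value defined in the kernel's uapi linux/fs.h. The function takes the same inputs as the BPF program and returns the xflags value the program would hand to bpf_override_return().

```python
# Userspace model of the placement policy in the BPF program.
# Hypothetical helper for illustration only; not part of the patch.

FS_XFLAG_REALTIME = 0x00000001  # value from the kernel's uapi linux/fs.h


def choose_xflags(length, xflags, datablocks, rtblocks, freedata, freertx):
    """Return the xflags the BPF program would pass to bpf_override_return()."""
    # A first allocation request of 64k or more marks the file as "large".
    use_rt = length >= 65536

    # Fall back to the other device if the chosen one has less than a
    # fifth of its space free (integer division, as in the C code).
    if use_rt and freertx < rtblocks // 5:
        use_rt = False
    elif not use_rt and freedata < datablocks // 5:
        use_rt = True

    if use_rt:
        return xflags | FS_XFLAG_REALTIME
    return xflags & ~FS_XFLAG_REALTIME


if __name__ == "__main__":
    # Small request, both devices half empty: stays on the data device.
    print(hex(choose_xflags(4096, 0, 1000, 1000, 500, 500)))
    # Large request, both devices half empty: goes to the rt device.
    print(hex(choose_xflags(1 << 20, 0, 1000, 1000, 500, 500)))
```

As in the patch, freertx is compared directly against rtblocks / 5. The struct comments describe freertx as a count of rt extents and rtblocks as a count of fsblocks, so the sketch keeps that comparison as written rather than converting units. Because the second check is an else-if, a large file bounced off a nearly full rt device lands on the data device without the data device's free space being checked.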