From patchwork Wed Dec 13 06:21:46 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 10109271
Date: Tue, 12 Dec 2017 22:21:46 -0800
From: "Darrick J. Wong"
To: linux-xfs@vger.kernel.org
Cc: Richard Wareing, david@fromorbit.com, hch@infradead.org
Subject: [PATCH 1/2] xfs: eBPF user hacks insanity
Message-ID: <20171213062146.GP19219@magnolia>
References: <20171213061825.GO19219@magnolia>
In-Reply-To: <20171213061825.GO19219@magnolia>

From: Darrick J. Wong

Create some special filter functions to which userspace can attach eBPF
programs which override the return value and thereby allow userspace to
assist XFS in making contextualized decisions about where to put files.
In other words, users can upload their own custom algorithms into XFS to
override the default rtdev/datadev placement code.

Signed-off-by: Darrick J. Wong
---
 fs/xfs/Kconfig              |    9 ++
 fs/xfs/Makefile             |    2 +
 fs/xfs/xfs_bmap_util.c      |    5 +
 fs/xfs/xfs_hacks.c          |  159 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_hacks.h          |   29 ++++++++
 fs/xfs/xfs_iomap.c          |    7 ++
 kernel/trace/trace_kprobe.c |    2 +
 7 files changed, 213 insertions(+)
 create mode 100644 fs/xfs/xfs_hacks.c
 create mode 100644 fs/xfs/xfs_hacks.h

diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 06be67d..1594822 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -143,3 +143,12 @@ config XFS_ASSERT_FATAL
 	  result in warnings.
 
 	  This behavior can be modified at runtime via sysfs.
+
+config XFS_HACKS
+	bool "XFS Userspace eBPF Hacks"
+	default n
+	depends on XFS_FS && BPF_KPROBE_OVERRIDE
+	help
+	  Allow userspace to attach eBPF programs to various parts of XFS
+	  in order to customize its decisions. This is insane; you get
+	  to keep the pieces!
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9f4de14..03e2a37 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -176,3 +176,5 @@ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \
 				   )
 xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 endif
+
+xfs-$(CONFIG_XFS_HACKS)		+= xfs_hacks.o
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6d37ab4..39ce418 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -45,6 +45,7 @@
 #include "xfs_iomap.h"
 #include "xfs_reflink.h"
 #include "xfs_refcount.h"
+#include "xfs_hacks.h"
 
 /* Kernel only BMAP related definitions and functions */
 
@@ -918,6 +919,10 @@ xfs_alloc_file_space(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	error = xfs_hacks_retarget_iflags(ip, offset, len);
+	if (error)
+		return error;
+
 	error = xfs_qm_dqattach(ip, 0);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_hacks.c b/fs/xfs/xfs_hacks.c
new file mode 100644
index 0000000..28d2852
--- /dev/null
+++ b/fs/xfs/xfs_hacks.c
@@ -0,0 +1,159 @@
+/*
+ * Copyright (C) 2017 Oracle. All Rights Reserved.
+ *
+ * Author: Darrick J. Wong
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_btree.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_itable.h"
+#include "xfs_fsops.h"
+#include
+
+static void
+xfs_hacks_warn(
+	struct xfs_mount	*mp)
+{
+	static struct ratelimit_state hack_warning = RATELIMIT_STATE_INIT(
+			__func__, 86400 * HZ, 1);
+
+	ratelimit_set_flags(&hack_warning, RATELIMIT_MSG_ON_RELEASE);
+	if (__ratelimit(&hack_warning))
+		xfs_alert(mp,
+"WARNING userspace eBPF hack feature in use. Use at your own risk!");
+}
+
+/*
+ * Return current xflags unless someone attaches an eBPF program to
+ * override the default return value to feed the inode different xflags.
+ * This is the mechanism through which userspace can make more
+ * contextual decisions about where to put a file.
+ *
+ * ftrace cannot attach to this function if it is too short, so we have
+ * three throwaway calls to trace_printk to ensure that we have enough
+ * bytes... or something.
+ */
+uint
+xfs_hack_filter_iflags(
+	struct xfs_fsop_geom	*geo,
+	struct xfs_fsop_counts	*stats,
+	xfs_ino_t		ino,
+	loff_t			offset,
+	loff_t			length,
+	uint			xflags)
+{
+	trace_printk("C: off=%llu len=%llu xflags=0x%x\n",
+			offset, length, xflags);
+	trace_printk("C: dblocks=%llu rblocks=%llu\n",
+			geo->datablocks, geo->rtblocks);
+	trace_printk("C: dfree=%llu rfree=%llu\n",
+			stats->freedata, stats->freertx);
+
+	return xflags;
+}
+BPF_ALLOW_ERROR_INJECTION(xfs_hack_filter_iflags);
+
+/*
+ * Change flags on empty files, if so desired.
+ */
+#define XFS_XFLAGS_CAN_RETARGET	(FS_XFLAG_REALTIME)
+int
+xfs_hacks_retarget_iflags(
+	struct xfs_inode	*ip,
+	loff_t			offset,
+	loff_t			length)
+{
+	struct xfs_fsop_geom	fsgeo;
+	struct xfs_fsop_counts	stats;
+	struct xfs_trans	*tp;
+	struct xfs_mount	*mp = ip->i_mount;
+	uint16_t		flags;
+	uint64_t		flags2;
+	uint			curr_xflags;
+	uint			new_xflags;
+	int			error = 0;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp);
+	if (error)
+		return error;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+	flags = ip->i_d.di_flags;
+	flags2 = ip->i_d.di_flags2;
+
+	/* Only allow retargeting of empty files. */
+	if (i_size_read(VFS_I(ip)) || ip->i_d.di_nextents || ip->i_d.di_size)
+		goto out_unlock;
+
+	error = xfs_fs_geometry(mp, &fsgeo, 4);
+	if (error)
+		goto out_unlock;
+	error = xfs_fs_counts(mp, &stats);
+	if (error)
+		goto out_unlock;
+
+	curr_xflags = xfs_ip2xflags(ip);
+	new_xflags = xfs_hack_filter_iflags(&fsgeo, &stats, ip->i_ino, offset,
+			length, curr_xflags);
+
+	if (new_xflags == curr_xflags)
+		goto out_unlock;
+
+	xfs_hacks_warn(mp);
+
+	error = -EINVAL;
+	if ((new_xflags ^ curr_xflags) & ~XFS_XFLAGS_CAN_RETARGET)
+		goto out_unlock;
+
+	/* Change the rt flag. */
+	if (new_xflags & FS_XFLAG_REALTIME) {
+		if (!mp->m_rtdev_targp)
+			goto out_unlock;
+
+		if (xfs_is_reflink_inode(ip))
+			flags2 &= ~XFS_DIFLAG2_REFLINK;
+
+		flags |= XFS_DIFLAG_REALTIME;
+	} else {
+		flags &= ~XFS_DIFLAG_REALTIME;
+	}
+
+	/* Log inode and get out. */
+	ip->i_d.di_flags = flags;
+	ip->i_d.di_flags2 = flags2;
+	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	return xfs_trans_commit(tp);
+
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_cancel(tp);
+	return error;
+}
diff --git a/fs/xfs/xfs_hacks.h b/fs/xfs/xfs_hacks.h
new file mode 100644
index 0000000..2c556f1
--- /dev/null
+++ b/fs/xfs/xfs_hacks.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (C) 2017 Oracle. All Rights Reserved.
+ *
+ * Author: Darrick J. Wong
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+#ifndef __XFS_HACKS_H__
+#define __XFS_HACKS_H__
+
+#ifdef CONFIG_XFS_HACKS
+int xfs_hacks_retarget_iflags(struct xfs_inode *ip, loff_t offset, loff_t length);
+#else
+# define xfs_hacks_retarget_iflags(ip, off, len)	(0)
+#endif
+
+#endif /* __XFS_HACKS_H__ */
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7ab52a8..f69d274 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -42,6 +42,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_hacks.h"
 
 
 #define XFS_WRITEIO_ALIGN(mp,off) (((off) >> mp->m_writeio_log) \
@@ -987,6 +988,12 @@ xfs_file_iomap_begin(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
+	if (flags & IOMAP_WRITE) {
+		error = xfs_hacks_retarget_iflags(ip, offset, length);
+		if (error)
+			return error;
+	}
+
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
 		/* Reserve delalloc blocks for regular writeback. */
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 5db8498..fd948e3 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1215,8 +1215,10 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
 		if (__this_cpu_read(bpf_kprobe_override)) {
 			__this_cpu_write(bpf_kprobe_override, 0);
 			reset_current_kprobe();
+			preempt_enable();
 			return 1;
 		}
+		preempt_enable();
 		if (!ret)
 			return 0;
 	}
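
Not part of the patch: a minimal userspace sketch of the xflags check that xfs_hacks_retarget_iflags() applies to whatever value the attached eBPF program returns, for anyone who wants to experiment with the accept/reject rules outside the kernel. The names sketch_check_retarget(), sketch_selftest(), the SKETCH_* macros and the verdict enum are hypothetical and exist only for this illustration. The realtime bit value is assumed to match FS_XFLAG_REALTIME from <linux/fs.h>. The empty-file test, the transaction and the inode logging are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match FS_XFLAG_REALTIME in <linux/fs.h>; redefined so this builds anywhere. */
#define SKETCH_XFLAG_REALTIME	0x00000001u
/* Mirrors XFS_XFLAGS_CAN_RETARGET in the patch: only the realtime bit may change. */
#define SKETCH_CAN_RETARGET	(SKETCH_XFLAG_REALTIME)

/* Hypothetical verdicts; the patch itself returns 0 or -EINVAL. */
enum retarget_verdict {
	RETARGET_UNCHANGED,	/* filter returned the current xflags */
	RETARGET_ACCEPTED,	/* realtime bit changed and the change is allowed */
	RETARGET_REJECTED,	/* the patch would fail with -EINVAL here */
};

/* Hypothetical helper modelling the flag comparison in xfs_hacks_retarget_iflags(). */
static enum retarget_verdict
sketch_check_retarget(
	uint32_t	curr_xflags,
	uint32_t	new_xflags,
	bool		have_rtdev)
{
	if (new_xflags == curr_xflags)
		return RETARGET_UNCHANGED;

	/* Any changed bit outside the retargetable mask is an error. */
	if ((new_xflags ^ curr_xflags) & ~SKETCH_CAN_RETARGET)
		return RETARGET_REJECTED;

	/* Moving a file to the realtime device needs a realtime device. */
	if ((new_xflags & SKETCH_XFLAG_REALTIME) && !have_rtdev)
		return RETARGET_REJECTED;

	return RETARGET_ACCEPTED;
}

/* Small self-check; call it from a test harness or from main(). */
void
sketch_selftest(void)
{
	assert(sketch_check_retarget(0, 0, true) == RETARGET_UNCHANGED);
	assert(sketch_check_retarget(0, SKETCH_XFLAG_REALTIME, true) ==
			RETARGET_ACCEPTED);
	assert(sketch_check_retarget(0, SKETCH_XFLAG_REALTIME, false) ==
			RETARGET_REJECTED);
}
```

The point of the XOR test is that an eBPF program may only flip the realtime bit. If the returned value differs from the current xflags in any other bit, the change is refused rather than silently masked.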