From patchwork Tue Dec 31 23:39:06 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924036
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D3DB11B0414
	for ; Tue, 31 Dec 2024 23:39:07 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688347; cv=none;
	b=ZuCnGqMzn8NE6V+SvcYx+hUtifBZxXF/ubCGNr8ay0dWOWICPXAJuo+yB/dLSuFfdukwahjQDShpDd3+U7obkheKVI3y6KwCduJl5mM0KSoypn0qlIHJUY2QHgWkUX7PmGxmOQOls19ZoHRXZOKr67Il7Yn3Y/SPnrAlIAEMn5k=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688347; c=relaxed/simple;
	bh=KfYyJzj7UDXvK1znlTYbQ5MZ94Z9lnYsOh+s1c95FtE=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=E+Jh1HBnn7mcQ1ZSmgkLUdfvOPk9ehYsEzWAFHQfuFd3jVpP0C3lWujnHRHDPzpWyVcGKkLdQIbY/D4VeebTYNZw/7fWPaSkMG8t5a9ic6EIs4O8bwJTPUICJwFV1hCXkVxz5OgC2A21jZ3FbmZWTRKX+YvKiXUr+7pfYJWNLic=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RNuFuQG+;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RNuFuQG+"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 69996C4CED2;
	Tue, 31 Dec 2024 23:39:07 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1735688347;
	bh=KfYyJzj7UDXvK1znlTYbQ5MZ94Z9lnYsOh+s1c95FtE=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=RNuFuQG+o+a1287MDaYkQ6+m9uL5MjFnkE6DwOpPjg/bm+oERZy+QAUc3V/F6PlX5
	 sfoomBRv+lFvL6wE/Tk/7Jypit+ffQj+vZvszFUuaFh6n+325kpCEFwcNWUgQLYljN
	 6DK6v8DNVkodYV7PsSs37f0YhkqaDoZkQWIkvlxWeexeYeasiNe/YC66GUK3SCBcGv
	 V+G/5llGijzSh6aG32Z+1LR6WJNSEUgZjBB5kl2mpN/UWPa92IVr5Z8DmLtlbSWqW1
	 ldSD0WKI2KF/mb3MdcwqeVTOlIQ02563rI8lL4q8rOZOkyV+kvDzkgW8z6qDeeSAAl
	 vRkCEYzeSAd+g==
Date: Tue, 31 Dec 2024 15:39:06 -0800
Subject: [PATCH 01/16] xfs: create debugfs uuid aliases
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754759.2704911.1231694481039541264.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Create an alias for the debugfs dir so that we can find a filesystem by
uuid.  Unless it's mounted nouuid.

Signed-off-by: "Darrick J. Wong"
---
 fs/xfs/xfs_mount.h |    1 +
 fs/xfs/xfs_super.c |   11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 87007d9de5d9d0..d73e76e36bfc10 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -292,6 +292,7 @@ typedef struct xfs_mount {
 	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
 	struct xfs_zone_info	*m_zone_info;	/* zone allocator information */
 	struct dentry		*m_debugfs;	/* debugfs parent */
+	struct dentry		*m_debugfs_uuid; /* debugfs symlink */
 	struct xfs_kobj		m_kobj;
 	struct xfs_kobj		m_error_kobj;
 	struct xfs_kobj		m_error_meta_kobj;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 099c30339e8f9d..fd641853fe3595 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -780,6 +780,7 @@ xfs_mount_free(
 	if (mp->m_ddev_targp)
 		xfs_free_buftarg(mp->m_ddev_targp);
 
+	debugfs_remove(mp->m_debugfs_uuid);
 	debugfs_remove(mp->m_debugfs);
 	kfree(mp->m_rtname);
 	kfree(mp->m_logname);
@@ -1893,6 +1894,16 @@ xfs_fs_fill_super(
 		goto out_unmount;
 	}
 
+	if (xfs_debugfs && mp->m_debugfs && !xfs_has_nouuid(mp)) {
+		char	name[UUID_STRING_LEN + 1];
+
+		snprintf(name, UUID_STRING_LEN + 1, "%pU", &mp->m_sb.sb_uuid);
+		mp->m_debugfs_uuid = debugfs_create_symlink(name, xfs_debugfs,
+				mp->m_super->s_id);
+	} else {
+		mp->m_debugfs_uuid = NULL;
+	}
+
 	return 0;
 
  out_filestream_unmount:

From patchwork Tue Dec 31 23:39:22 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924037
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32C6513FD72
	for ; Tue, 31 Dec 2024 23:39:23 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688363; cv=none;
	b=qQdCOg6Jjks7Jzvq0Y3EXIVFvfqkD9IlJUurX+Ln3RnYXMHsdCBpCEsXKhDgyG8fi6EXQHJqALknbzqwHCl4D49vhreeV2NfRwY+4VXg1A3hMIT8Tbcmaxm4cEpGMgvGuZvRUqXsTRAJhvGXkUWK+MRLUO2JIh5nVbO22wvxNjI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688363; c=relaxed/simple;
	bh=WlYngZLHaq9ORnwYvQ2FLpYVzsdtQ/yhF5T+2SeeJ8A=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=g1TGsSx4zP/NY8GfQ2XLQUoMQUsJhbIBtM1IPr3p/rPC2MlnfzDVGB3408LmViYoYdBinej35mVzVKyDJh81ly88aPKOJWOwh5fm4NLk9ZlXW2w0KInK9zj4QQVcwKr0DWORh82AcwDvvV1/FCuyC4GgkWHp0QO6oGu4g2gFD04=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hCmzg12E;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hCmzg12E"
Received: by smtp.kernel.org (Postfix) with
	ESMTPSA id 0BF91C4CED2; Tue, 31 Dec 2024 23:39:23 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1735688363;
	bh=WlYngZLHaq9ORnwYvQ2FLpYVzsdtQ/yhF5T+2SeeJ8A=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=hCmzg12EWOqGMKUKS3z08X3kY6rs+6HYUH/dojPW7qs7cYuT0mk8N6exQ5KZWYZxz
	 0Xba9NK9owyGC/HgUBnLPI0owhVeu/wQOuERjxJy8ncJzK8yougtbV0x8ZsrYyCJ+v
	 7TeKDJCsDcVKDImCLmGw3L9/D3lawTS6l1Ssje9B3RppecCv1C2rRUWykknP41Uf0t
	 vbSTTBozrcTGVHujSSNz4VZwD+iMDzjWNnITVA1rEiXdS2MAUj1isnKXhoTQg9/IQ4
	 t0eV+wbgHnZY9ojh6JgmrL6hMc8B0i1oOrg2dheyLVV8BR+a9SI/fPZ4kl1nWiV2wb
	 KiCqU+UDTUkQw==
Date: Tue, 31 Dec 2024 15:39:22 -0800
Subject: [PATCH 02/16] xfs: create hooks for monitoring health updates
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754776.2704911.12756800959617456131.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Create hooks for monitoring health events.

Signed-off-by: "Darrick J. Wong"
---
 fs/xfs/libxfs/xfs_health.h |   47 ++++++++++
 fs/xfs/xfs_health.c        |  202 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_mount.h         |    3 +
 fs/xfs/xfs_super.c         |    1
 4 files changed, 252 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index b31000f7190ce5..39fef33dedc6a8 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -289,4 +289,51 @@ void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bulkstat *bs);
 #define xfs_metadata_is_sick(error) \
 	(unlikely((error) == -EFSCORRUPTED || (error) == -EFSBADCRC))
 
+/*
+ * Parameters for tracking health updates.  The enum below is passed as the
+ * hook function argument.
+ */
+enum xfs_health_update_type {
+	XFS_HEALTHUP_SICK = 1,	/* runtime corruption observed */
+	XFS_HEALTHUP_CORRUPT,	/* fsck reported corruption */
+	XFS_HEALTHUP_HEALTHY,	/* fsck reported healthy structure */
+	XFS_HEALTHUP_UNMOUNT,	/* filesystem is unmounting */
+};
+
+/* Where in the filesystem was the event observed? */
+enum xfs_health_update_domain {
+	XFS_HEALTHUP_FS = 1,	/* main filesystem */
+	XFS_HEALTHUP_AG,	/* allocation group */
+	XFS_HEALTHUP_INODE,	/* inode */
+	XFS_HEALTHUP_RTGROUP,	/* realtime group */
+};
+
+struct xfs_health_update_params {
+	/* XFS_HEALTHUP_INODE */
+	xfs_ino_t			ino;
+	uint32_t			gen;
+
+	/* XFS_HEALTHUP_AG/RTGROUP */
+	uint32_t			group;
+
+	/* XFS_SICK_* flags */
+	unsigned int			old_mask;
+	unsigned int			new_mask;
+
+	enum xfs_health_update_domain	domain;
+};
+
+#ifdef CONFIG_XFS_LIVE_HOOKS
+struct xfs_health_hook {
+	struct xfs_hook			health_hook;
+};
+
+void xfs_health_hook_disable(void);
+void xfs_health_hook_enable(void);
+
+int xfs_health_hook_add(struct xfs_mount *mp, struct xfs_health_hook *hook);
+void xfs_health_hook_del(struct xfs_mount *mp, struct xfs_health_hook *hook);
+void xfs_health_hook_setup(struct xfs_health_hook *hook, notifier_fn_t mod_fn);
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 #endif	/* __XFS_HEALTH_H__ */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 7c541fb373d5b2..abf9460ae79953 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -20,6 +20,157 @@
 #include "xfs_quota_defs.h"
 #include "xfs_rtgroup.h"
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+/*
+ * Use a static key here to reduce the overhead of health updates.  If
+ * the compiler supports jump labels, the static branch will be replaced by a
+ * nop sled when there are no hook users.  Online fsck is currently the only
+ * caller, so this is a reasonable tradeoff.
+ *
+ * Note: Patching the kernel code requires taking the cpu hotplug lock.  Other
+ * parts of the kernel allocate memory with that lock held, which means that
+ * XFS callers cannot hold any locks that might be used by memory reclaim or
+ * writeback when calling the static_branch_{inc,dec} functions.
+ */
+DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_health_hooks_switch);
+
+void
+xfs_health_hook_disable(void)
+{
+	xfs_hooks_switch_off(&xfs_health_hooks_switch);
+}
+
+void
+xfs_health_hook_enable(void)
+{
+	xfs_hooks_switch_on(&xfs_health_hooks_switch);
+}
+
+/* Call downstream hooks for a filesystem unmount health update. */
+static inline void
+xfs_health_unmount_hook(
+	struct xfs_mount		*mp)
+{
+	if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) {
+		struct xfs_health_update_params	p = {
+			.domain		= XFS_HEALTHUP_FS,
+		};
+
+		xfs_hooks_call(&mp->m_health_update_hooks,
+				XFS_HEALTHUP_UNMOUNT, &p);
+	}
+}
+
+/* Call downstream hooks for a filesystem health update. */
+static inline void
+xfs_fs_health_update_hook(
+	struct xfs_mount		*mp,
+	enum xfs_health_update_type	op,
+	unsigned int			old_mask,
+	unsigned int			new_mask)
+{
+	if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) {
+		struct xfs_health_update_params	p = {
+			.domain		= XFS_HEALTHUP_FS,
+			.old_mask	= old_mask,
+			.new_mask	= new_mask,
+		};
+
+		if (new_mask)
+			xfs_hooks_call(&mp->m_health_update_hooks, op, &p);
+	}
+}
+
+/* Call downstream hooks for a group health update. */
+static inline void
+xfs_group_health_update_hook(
+	struct xfs_group		*xg,
+	enum xfs_health_update_type	op,
+	unsigned int			old_mask,
+	unsigned int			new_mask)
+{
+	if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) {
+		struct xfs_health_update_params	p = {
+			.old_mask	= old_mask,
+			.new_mask	= new_mask,
+			.group		= xg->xg_gno,
+		};
+		struct xfs_mount	*mp = xg->xg_mount;
+
+		switch (xg->xg_type) {
+		case XG_TYPE_AG:
+			p.domain = XFS_HEALTHUP_AG;
+			break;
+		case XG_TYPE_RTG:
+			p.domain = XFS_HEALTHUP_RTGROUP;
+			break;
+		default:
+			ASSERT(0);
+			return;
+		}
+
+		if (new_mask)
+			xfs_hooks_call(&mp->m_health_update_hooks, op, &p);
+	}
+}
+
+/* Call downstream hooks for an inode health update. */
+static inline void
+xfs_inode_health_update_hook(
+	struct xfs_inode		*ip,
+	enum xfs_health_update_type	op,
+	unsigned int			old_mask,
+	unsigned int			new_mask)
+{
+	if (xfs_hooks_switched_on(&xfs_health_hooks_switch)) {
+		struct xfs_health_update_params	p = {
+			.domain		= XFS_HEALTHUP_INODE,
+			.old_mask	= old_mask,
+			.new_mask	= new_mask,
+			.ino		= ip->i_ino,
+			.gen		= VFS_I(ip)->i_generation,
+		};
+		struct xfs_mount	*mp = ip->i_mount;
+
+		if (new_mask)
+			xfs_hooks_call(&mp->m_health_update_hooks, op, &p);
+	}
+}
+
+/* Call the specified function during a health update. */
+int
+xfs_health_hook_add(
+	struct xfs_mount	*mp,
+	struct xfs_health_hook	*hook)
+{
+	return xfs_hooks_add(&mp->m_health_update_hooks, &hook->health_hook);
+}
+
+/* Stop calling the specified function during a health update. */
+void
+xfs_health_hook_del(
+	struct xfs_mount	*mp,
+	struct xfs_health_hook	*hook)
+{
+	xfs_hooks_del(&mp->m_health_update_hooks, &hook->health_hook);
+}
+
+/* Configure health update hook functions. */
+void
+xfs_health_hook_setup(
+	struct xfs_health_hook	*hook,
+	notifier_fn_t		mod_fn)
+{
+	xfs_hook_setup(&hook->health_hook, mod_fn);
+}
+#else
+# define xfs_health_unmount_hook(...)		((void)0)
+# define xfs_fs_health_update_hook(a,b,o,n)	do {o = o;} while(0)
+# define xfs_rt_health_update_hook(a,b,o,n)	do {o = o;} while(0)
+# define xfs_group_health_update_hook(a,b,o,n)	do {o = o;} while(0)
+# define xfs_inode_health_update_hook(a,b,o,n)	do {o = o;} while(0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 static void
 xfs_health_unmount_group(
 	struct xfs_group	*xg,
@@ -50,8 +201,10 @@ xfs_health_unmount(
 	unsigned int		checked = 0;
 	bool			warn = false;
 
-	if (xfs_is_shutdown(mp))
+	if (xfs_is_shutdown(mp)) {
+		xfs_health_unmount_hook(mp);
 		return;
+	}
 
 	/* Measure AG corruption levels. */
 	while ((pag = xfs_perag_next(mp, pag)))
@@ -97,6 +250,8 @@ xfs_health_unmount(
 		if (sick & XFS_SICK_FS_COUNTERS)
 			xfs_fs_mark_healthy(mp, XFS_SICK_FS_COUNTERS);
 	}
+
+	xfs_health_unmount_hook(mp);
 }
 
 /* Mark unhealthy per-fs metadata. */
@@ -105,12 +260,17 @@ xfs_fs_mark_sick(
 	struct xfs_mount	*mp,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	ASSERT(!(mask & ~XFS_SICK_FS_ALL));
 	trace_xfs_fs_mark_sick(mp, mask);
 
 	spin_lock(&mp->m_sb_lock);
+	old_mask = mp->m_fs_sick;
 	mp->m_fs_sick |= mask;
 	spin_unlock(&mp->m_sb_lock);
+
+	xfs_fs_health_update_hook(mp, XFS_HEALTHUP_SICK, old_mask, mask);
 }
 
 /* Mark per-fs metadata as having been checked and found unhealthy by fsck. */
@@ -119,13 +279,18 @@ xfs_fs_mark_corrupt(
 	struct xfs_mount	*mp,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	ASSERT(!(mask & ~XFS_SICK_FS_ALL));
 	trace_xfs_fs_mark_corrupt(mp, mask);
 
 	spin_lock(&mp->m_sb_lock);
+	old_mask = mp->m_fs_sick;
 	mp->m_fs_sick |= mask;
 	mp->m_fs_checked |= mask;
 	spin_unlock(&mp->m_sb_lock);
+
+	xfs_fs_health_update_hook(mp, XFS_HEALTHUP_CORRUPT, old_mask, mask);
 }
 
 /* Mark a per-fs metadata healed. */
@@ -134,15 +299,20 @@ xfs_fs_mark_healthy(
 	struct xfs_mount	*mp,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	ASSERT(!(mask & ~XFS_SICK_FS_ALL));
 	trace_xfs_fs_mark_healthy(mp, mask);
 
 	spin_lock(&mp->m_sb_lock);
+	old_mask = mp->m_fs_sick;
 	mp->m_fs_sick &= ~mask;
 	if (!(mp->m_fs_sick & XFS_SICK_FS_PRIMARY))
 		mp->m_fs_sick &= ~XFS_SICK_FS_SECONDARY;
 	mp->m_fs_checked |= mask;
 	spin_unlock(&mp->m_sb_lock);
+
+	xfs_fs_health_update_hook(mp, XFS_HEALTHUP_HEALTHY, old_mask, mask);
 }
 
 /* Sample which per-fs metadata are unhealthy. */
@@ -192,12 +362,17 @@ xfs_group_mark_sick(
 	struct xfs_group	*xg,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	xfs_group_check_mask(xg, mask);
 	trace_xfs_group_mark_sick(xg, mask);
 
 	spin_lock(&xg->xg_state_lock);
+	old_mask = xg->xg_sick;
 	xg->xg_sick |= mask;
 	spin_unlock(&xg->xg_state_lock);
+
+	xfs_group_health_update_hook(xg, XFS_HEALTHUP_SICK, old_mask, mask);
 }
 
 /*
@@ -208,13 +383,18 @@ xfs_group_mark_corrupt(
 	struct xfs_group	*xg,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	xfs_group_check_mask(xg, mask);
 	trace_xfs_group_mark_corrupt(xg, mask);
 
 	spin_lock(&xg->xg_state_lock);
+	old_mask = xg->xg_sick;
 	xg->xg_sick |= mask;
 	xg->xg_checked |= mask;
 	spin_unlock(&xg->xg_state_lock);
+
+	xfs_group_health_update_hook(xg, XFS_HEALTHUP_CORRUPT, old_mask, mask);
 }
 
 /*
@@ -225,15 +405,20 @@ xfs_group_mark_healthy(
 	struct xfs_group	*xg,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	xfs_group_check_mask(xg, mask);
 	trace_xfs_group_mark_healthy(xg, mask);
 
 	spin_lock(&xg->xg_state_lock);
+	old_mask = xg->xg_sick;
 	xg->xg_sick &= ~mask;
 	if (!(xg->xg_sick & XFS_SICK_AG_PRIMARY))
 		xg->xg_sick &= ~XFS_SICK_AG_SECONDARY;
 	xg->xg_checked |= mask;
 	spin_unlock(&xg->xg_state_lock);
+
+	xfs_group_health_update_hook(xg, XFS_HEALTHUP_HEALTHY, old_mask, mask);
 }
 
 /* Sample which per-ag metadata are unhealthy. */
@@ -272,10 +457,13 @@ xfs_inode_mark_sick(
 	struct xfs_inode	*ip,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	ASSERT(!(mask & ~XFS_SICK_INO_ALL));
 	trace_xfs_inode_mark_sick(ip, mask);
 
 	spin_lock(&ip->i_flags_lock);
+	old_mask = ip->i_sick;
 	ip->i_sick |= mask;
 	spin_unlock(&ip->i_flags_lock);
 
@@ -287,6 +475,8 @@ xfs_inode_mark_sick(
 	spin_lock(&VFS_I(ip)->i_lock);
 	VFS_I(ip)->i_state &= ~I_DONTCACHE;
 	spin_unlock(&VFS_I(ip)->i_lock);
+
+	xfs_inode_health_update_hook(ip, XFS_HEALTHUP_SICK, old_mask, mask);
 }
 
 /* Mark inode metadata as having been checked and found unhealthy by fsck. */
@@ -295,10 +485,13 @@ xfs_inode_mark_corrupt(
 	struct xfs_inode	*ip,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	ASSERT(!(mask & ~XFS_SICK_INO_ALL));
 	trace_xfs_inode_mark_corrupt(ip, mask);
 
 	spin_lock(&ip->i_flags_lock);
+	old_mask = ip->i_sick;
 	ip->i_sick |= mask;
 	ip->i_checked |= mask;
 	spin_unlock(&ip->i_flags_lock);
@@ -311,6 +504,8 @@ xfs_inode_mark_corrupt(
 	spin_lock(&VFS_I(ip)->i_lock);
 	VFS_I(ip)->i_state &= ~I_DONTCACHE;
 	spin_unlock(&VFS_I(ip)->i_lock);
+
+	xfs_inode_health_update_hook(ip, XFS_HEALTHUP_CORRUPT, old_mask, mask);
 }
 
 /* Mark parts of an inode healed. */
@@ -319,15 +514,20 @@ xfs_inode_mark_healthy(
 	struct xfs_inode	*ip,
 	unsigned int		mask)
 {
+	unsigned int		old_mask;
+
 	ASSERT(!(mask & ~XFS_SICK_INO_ALL));
 	trace_xfs_inode_mark_healthy(ip, mask);
 
 	spin_lock(&ip->i_flags_lock);
+	old_mask = ip->i_sick;
 	ip->i_sick &= ~mask;
 	if (!(ip->i_sick & XFS_SICK_INO_PRIMARY))
 		ip->i_sick &= ~XFS_SICK_INO_SECONDARY;
 	ip->i_checked |= mask;
 	spin_unlock(&ip->i_flags_lock);
+
+	xfs_inode_health_update_hook(ip, XFS_HEALTHUP_HEALTHY, old_mask, mask);
 }
 
 /* Sample which parts of an inode are unhealthy. */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index d73e76e36bfc10..df5e4a48af72b7 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -340,6 +340,9 @@ typedef struct xfs_mount {
 
 	/* Hook to feed dirent updates to an active online repair. */
 	struct xfs_hooks	m_dir_update_hooks;
+
+	/* Hook to feed health events to a daemon. */
+	struct xfs_hooks	m_health_update_hooks;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index fd641853fe3595..e4789dfe1a369e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2182,6 +2182,7 @@ xfs_init_fs_context(
 		mp->m_allocsize_log = 16; /* 64k */
 
 	xfs_hooks_init(&mp->m_dir_update_hooks);
+	xfs_hooks_init(&mp->m_health_update_hooks);
 
 	fc->s_fs_info = mp;
 	fc->ops = &xfs_context_ops;

From patchwork Tue Dec 31 23:39:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924038
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D497B1B0418
	for ; Tue, 31 Dec 2024 23:39:38 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688378; cv=none;
	b=HQdivox6ESSR0GsOhtHgbB/60vZ2fnHmxcfSnAUJD78N2smYiY9Q9FbDFOvA8N1u1ZO+09tTjHkbnocpMASAtBLb7Eb+Q85vGWM7RrfvsL7exq+MrQuAfYkqjSmNNo9ZUKME+2tnDIH3x8IE9nxXqPXeRBZV6gvT3w4oDaj+B1k=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688378; c=relaxed/simple;
	bh=PoBuSyub+8zLJpF8DRtWEE/GFVI3KxKjUyEoqjLLl0Y=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=IoYbVhnMYQTHalYQGqagahu2UIOdrcYsVlIx/rRq6hQUWcOBA5kcXkIKBxUBsi3UgMxAmUIxDlXkSmqlLhtqv743R4A4xzY/hdi7AikoV7Ocrx8oi1U+FSvKXJNW4re78eAZrTx4NeVPBAXpEBvBXGddJCB+J0XEVRBeUQBlk8Y=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=cTCBCmHa;
	arc=none
	smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="cTCBCmHa"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id B2081C4CED2;
	Tue, 31 Dec 2024 23:39:38 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1735688378;
	bh=PoBuSyub+8zLJpF8DRtWEE/GFVI3KxKjUyEoqjLLl0Y=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=cTCBCmHaeoEdiu7rExLDnFHmkRx/knL1HpBjXOXnoWk8OIzjzDRn6kzdONs7se56g
	 r7gMfUARpMZ+27cgikUduLQaNzYrUGuJzXb7VZpOwmJLnr9jGg+pP2dtHAHlUNd2oR
	 7weAAnE52D1IVbP61t8z6ll/joyWE0cga34arPQtrv9ED8xC8NidZrgmuCiws3/Kbk
	 5JnXIbyUbbhagsxiweRiqA7LJIIJXenco59jquIDEbycZ0zfHYuUBEiUSvfzoxEI2/
	 8y3JvFnrh8nI0wUJ6e9m7GX1rlmgYR4Y68AwILb1D5SYuy0ct2yJ/9eLRk4sZFs19H
	 rHtSpAjuPyVtQ==
Date: Tue, 31 Dec 2024 15:39:38 -0800
Subject: [PATCH 03/16] xfs: create a filesystem shutdown hook
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754794.2704911.7718279790539710344.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Create a hook so that health monitoring can report filesystem shutdown
events to userspace.

Signed-off-by: "Darrick J. Wong"
---
 fs/xfs/xfs_fsops.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_fsops.h |   14 +++++++++++++
 fs/xfs/xfs_mount.h |    3 +++
 fs/xfs/xfs_super.c |    1 +
 4 files changed, 75 insertions(+)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 150979c8333530..439e76f38ed42e 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -480,6 +480,61 @@ xfs_fs_goingdown(
 	return 0;
 }
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_shutdown_hooks_switch);
+
+void
+xfs_shutdown_hook_disable(void)
+{
+	xfs_hooks_switch_off(&xfs_shutdown_hooks_switch);
+}
+
+void
+xfs_shutdown_hook_enable(void)
+{
+	xfs_hooks_switch_on(&xfs_shutdown_hooks_switch);
+}
+
+/* Call downstream hooks for a filesystem shutdown. */
+static inline void
+xfs_shutdown_hook(
+	struct xfs_mount		*mp,
+	uint32_t			flags)
+{
+	if (xfs_hooks_switched_on(&xfs_shutdown_hooks_switch))
+		xfs_hooks_call(&mp->m_shutdown_hooks, flags, NULL);
+}
+
+/* Call the specified function during a shutdown update. */
+int
+xfs_shutdown_hook_add(
+	struct xfs_mount		*mp,
+	struct xfs_shutdown_hook	*hook)
+{
+	return xfs_hooks_add(&mp->m_shutdown_hooks, &hook->shutdown_hook);
+}
+
+/* Stop calling the specified function during a shutdown update. */
+void
+xfs_shutdown_hook_del(
+	struct xfs_mount		*mp,
+	struct xfs_shutdown_hook	*hook)
+{
+	xfs_hooks_del(&mp->m_shutdown_hooks, &hook->shutdown_hook);
+}
+
+/* Configure shutdown update hook functions. */
+void
+xfs_shutdown_hook_setup(
+	struct xfs_shutdown_hook	*hook,
+	notifier_fn_t			mod_fn)
+{
+	xfs_hook_setup(&hook->shutdown_hook, mod_fn);
+}
+#else
+# define xfs_shutdown_hook(...)		((void)0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 /*
  * Force a shutdown of the filesystem instantly while keeping the filesystem
  * consistent.  We don't do an unmount here; just shutdown the shop, make sure
@@ -538,6 +593,8 @@ xfs_do_force_shutdown(
 		"Please unmount the filesystem and rectify the problem(s)");
 	if (xfs_error_level >= XFS_ERRLEVEL_HIGH)
 		xfs_stack_trace();
+
+	xfs_shutdown_hook(mp, flags);
 }
 
 /*
diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h
index 9d23c361ef56e4..7f6f876de072b1 100644
--- a/fs/xfs/xfs_fsops.h
+++ b/fs/xfs/xfs_fsops.h
@@ -15,4 +15,18 @@ int xfs_fs_goingdown(struct xfs_mount *mp, uint32_t inflags);
 int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp);
 void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp);
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+struct xfs_shutdown_hook {
+	struct xfs_hook			shutdown_hook;
+};
+
+void xfs_shutdown_hook_disable(void);
+void xfs_shutdown_hook_enable(void);
+
+int xfs_shutdown_hook_add(struct xfs_mount *mp, struct xfs_shutdown_hook *hook);
+void xfs_shutdown_hook_del(struct xfs_mount *mp, struct xfs_shutdown_hook *hook);
+void xfs_shutdown_hook_setup(struct xfs_shutdown_hook *hook,
+		notifier_fn_t mod_fn);
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 #endif	/* __XFS_FSOPS_H__ */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index df5e4a48af72b7..a8c81c4ccb2000 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -343,6 +343,9 @@ typedef struct xfs_mount {
 
 	/* Hook to feed health events to a daemon. */
 	struct xfs_hooks	m_health_update_hooks;
+
+	/* Hook to feed shutdown events to a daemon. */
+	struct xfs_hooks	m_shutdown_hooks;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e4789dfe1a369e..71aa97a5d1dcaa 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2182,6 +2182,7 @@ xfs_init_fs_context(
 		mp->m_allocsize_log = 16; /* 64k */
 
 	xfs_hooks_init(&mp->m_dir_update_hooks);
+	xfs_hooks_init(&mp->m_shutdown_hooks);
 	xfs_hooks_init(&mp->m_health_update_hooks);
 
 	fc->s_fs_info = mp;

From patchwork Tue Dec 31 23:39:53 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924039
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79AD813FD72
	for ; Tue, 31 Dec 2024 23:39:54 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688394; cv=none;
	b=i7JRZvO0JyZeqMEVlZUj+MUt0yfZgyrVDdcuhnW/sSsYr2NFXmCk2IQfLDWlqTiRFtGSKbF5MlGojUEdT85ZC1H2k/dIjB10Sj5yvtJg/qeyDh+L9M7NbR24T6YiYBFy4U+00ozMJVnMUmFAvtFvbCxHOTotPdWCaHWbzBT009Y=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688394; c=relaxed/simple;
	bh=Mu/xVmYiOZu6FP9uxHXeUZ3vNuoUv5PvE5WxF9nzIEQ=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=b5BgMZUwmcVz26CqJRx/C4XBRl8t/Fnwnehlu0clD0eX0X8rna9Wd+p3FNg5MqmT+BReiszNrr0LtAE70djVB02Fzb6YUVYxP2UuZlybJb3uA9eqQTfblDSMDrWM0e6stcGCWeQA9hS9S+RIccfGqcuqQHvPFsTiBxs2lxZCKVg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=WW76uK3o;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass
	(2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="WW76uK3o"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 51D26C4CED2;
	Tue, 31 Dec 2024 23:39:54 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1735688394;
	bh=Mu/xVmYiOZu6FP9uxHXeUZ3vNuoUv5PvE5WxF9nzIEQ=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=WW76uK3oYwwFNxgvUjow1ylEluZZRRO55zfIBFxr1WeIZgzNNSW2RLtifc5PMlHFd
	 lVM4VpxhC3BRXtKxtaOpWjqo8gzP6Sn2iBugM5780f5c4HlijSfH/c/EuI3/1b4+Zm
	 tRzISgJqNUuK2rn4WqWfoFbb2GpLbTIbk48f1GiuYuYVMfmg3LvO0SiZ0bg+Bh+NOQ
	 XC2EB/lkb5I4d/BXhXsGx/jSOCxaMufyzLgycObSD8A3lcYbb3Y5jtAEXQLHbLDB7e
	 mT3xBDn4HxUUm+pSd3om67MVAZC8UFQKl0REXukXJWks3B/ONPjFun7+riOTjbAmvr
	 xR55ukbfYMGcQ==
Date: Tue, 31 Dec 2024 15:39:53 -0800
Subject: [PATCH 04/16] xfs: create hooks for media errors
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754811.2704911.2065068508446949767.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Set up a media error event hook so that we can send events to userspace.

Signed-off-by: "Darrick J. Wong"
---
 fs/xfs/xfs_mount.h          |    3 ++
 fs/xfs/xfs_notify_failure.c |   86 ++++++++++++++++++++++++++++++++++++++++---
 fs/xfs/xfs_notify_failure.h |   38 +++++++++++++++++++
 fs/xfs/xfs_super.c          |    1 +
 4 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index a8c81c4ccb2000..3fcfdaaf199315 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -346,6 +346,9 @@ typedef struct xfs_mount {
 
 	/* Hook to feed shutdown events to a daemon. */
 	struct xfs_hooks	m_shutdown_hooks;
+
+	/* Hook to feed media error events to a daemon. */
+	struct xfs_hooks	m_media_error_hooks;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index ed8d8ed42f0a2c..ea68c7e61bb585 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -27,6 +27,73 @@
 #include
 #include
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_media_error_hooks_switch);
+
+void
+xfs_media_error_hook_disable(void)
+{
+	xfs_hooks_switch_off(&xfs_media_error_hooks_switch);
+}
+
+void
+xfs_media_error_hook_enable(void)
+{
+	xfs_hooks_switch_on(&xfs_media_error_hooks_switch);
+}
+
+/* Call downstream hooks for a media error. */
+static inline void
+xfs_media_error_hook(
+	struct xfs_mount		*mp,
+	enum xfs_failed_device		fdev,
+	xfs_daddr_t			daddr,
+	uint64_t			bbcount,
+	bool				pre_remove)
+{
+	if (xfs_hooks_switched_on(&xfs_media_error_hooks_switch)) {
+		struct xfs_media_error_params p = {
+			.mp		= mp,
+			.fdev		= fdev,
+			.daddr		= daddr,
+			.bbcount	= bbcount,
+			.pre_remove	= pre_remove,
+		};
+
+		xfs_hooks_call(&mp->m_media_error_hooks, 0, &p);
+	}
+}
+
+/* Call the specified function during a media error. */
+int
+xfs_media_error_hook_add(
+	struct xfs_mount		*mp,
+	struct xfs_media_error_hook	*hook)
+{
+	return xfs_hooks_add(&mp->m_media_error_hooks, &hook->error_hook);
+}
+
+/* Stop calling the specified function during a media error. */
+void
+xfs_media_error_hook_del(
+	struct xfs_mount		*mp,
+	struct xfs_media_error_hook	*hook)
+{
+	xfs_hooks_del(&mp->m_media_error_hooks, &hook->error_hook);
+}
+
+/* Configure media error hook functions. */
+void
+xfs_media_error_hook_setup(
+	struct xfs_media_error_hook	*hook,
+	notifier_fn_t			mod_fn)
+{
+	xfs_hook_setup(&hook->error_hook, mod_fn);
+}
+#else
+# define xfs_media_error_hook(...)		((void)0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 struct xfs_failure_info {
 	xfs_agblock_t		startblock;
 	xfs_extlen_t		blockcount;
@@ -215,6 +282,9 @@ xfs_dax_notify_logdev_failure(
 	if (error)
 		return error;
 
+	xfs_media_error_hook(mp, XFS_FAILED_LOGDEV, daddr, bblen,
+			mf_flags & MF_MEM_PRE_REMOVE);
+
 	/*
 	 * In the pre-remove case the failure notification is attempting to
 	 * trigger a force unmount.  The expectation is that the device is
@@ -248,17 +318,21 @@ xfs_dax_notify_dev_failure(
 	uint64_t		bblen;
 	struct xfs_group	*xg = NULL;
 
+	error = xfs_dax_translate_range(type == XG_TYPE_RTG ?
+			mp->m_rtdev_targp : mp->m_ddev_targp,
+			offset, len, &daddr, &bblen);
+	if (error)
+		return error;
+
+	xfs_media_error_hook(mp, type == XG_TYPE_RTG ?
+			XFS_FAILED_RTDEV : XFS_FAILED_DATADEV,
+			daddr, bblen, mf_flags & MF_MEM_PRE_REMOVE);
+
 	if (!xfs_has_rmapbt(mp)) {
 		xfs_debug(mp, "notify_failure() needs rmapbt enabled!");
 		return -EOPNOTSUPP;
 	}
 
-	error = xfs_dax_translate_range(type == XG_TYPE_RTG ?
-			mp->m_rtdev_targp : mp->m_ddev_targp,
-			offset, len, &daddr, &bblen);
-	if (error)
-		return error;
-
 	if (type == XG_TYPE_RTG) {
 		start_bno = xfs_daddr_to_rtb(mp, daddr);
 		end_bno = xfs_daddr_to_rtb(mp, daddr + bblen - 1);
diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h
index 41108044d35d47..835d4af504d832 100644
--- a/fs/xfs/xfs_notify_failure.h
+++ b/fs/xfs/xfs_notify_failure.h
@@ -8,4 +8,42 @@
 
 extern const struct dax_holder_operations xfs_dax_holder_operations;
 
+enum xfs_failed_device {
+	XFS_FAILED_DATADEV,
+	XFS_FAILED_LOGDEV,
+	XFS_FAILED_RTDEV,
+};
+
+#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX)
+struct xfs_media_error_params {
+	struct xfs_mount		*mp;
+	enum xfs_failed_device		fdev;
+	xfs_daddr_t			daddr;
+	uint64_t			bbcount;
+	bool				pre_remove;
+};
+
+struct xfs_media_error_hook {
+	struct xfs_hook			error_hook;
+};
+
+void xfs_media_error_hook_disable(void);
+void xfs_media_error_hook_enable(void);
+
+int xfs_media_error_hook_add(struct xfs_mount *mp,
+		struct xfs_media_error_hook *hook);
+void xfs_media_error_hook_del(struct xfs_mount *mp,
+		struct xfs_media_error_hook *hook);
+void xfs_media_error_hook_setup(struct xfs_media_error_hook *hook,
+		notifier_fn_t mod_fn);
+#else
+struct xfs_media_error_params { };
+struct xfs_media_error_hook { };
+# define xfs_media_error_hook_disable()		((void)0)
+# define xfs_media_error_hook_enable()		((void)0)
+# define xfs_media_error_hook_add(...)		(0)
+# define xfs_media_error_hook_del(...)		((void)0)
+# define xfs_media_error_hook_setup(...)	((void)0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 #endif  /* __XFS_NOTIFY_FAILURE_H__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 71aa97a5d1dcaa..a49082159faae8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2184,6 +2184,7 @@ xfs_init_fs_context(
 	xfs_hooks_init(&mp->m_dir_update_hooks);
 	xfs_hooks_init(&mp->m_shutdown_hooks);
 	xfs_hooks_init(&mp->m_health_update_hooks);
+	xfs_hooks_init(&mp->m_media_error_hooks);
 
 	fc->s_fs_info = mp;
 	fc->ops = &xfs_context_ops;

From patchwork Tue Dec 31 23:40:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924040
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2949613FD72
	for ; Tue, 31 Dec 2024 23:40:10 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688410; cv=none;
	b=vDnThFQin2vvx2hli+guA1zFzW9Pun+Q9iFkCfUfH+yZRBSd4tAPRcFvkDl0zGjO3iFhdVzWZ3ADYMkYgZVTpCy4ZF+phTW6oBqBsFpVwcWB7LkikXGFJBCJ9boIWrwrEkS8RU+Nzlpw/eOL7BdcsqxgzFzIHfOUEJkdaXyNmp4=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1735688410; c=relaxed/simple;
	bh=I6r4whCp2cVDkTqRdsyiCBt0h8QCjktMKGrZQ/Y5qGY=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=gMtoW6Qs5XVVzvMEUhKOWK35XeoeLuXUDSjAklH4KfJP/q0itO49kyc2yfMJpY/xyzGhH/f4rtLQ0RS8twfS77Wk4RDESESQREutgUC3aj3PZd8SnooEKacBiTkzs/j5XRO0/RvisRi/8Ti4kp0aC4fdX4EmQ2lfOvTKGx8vdEg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=D2DR0qGD;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="D2DR0qGD"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id EE745C4CED2;
	Tue, 31 Dec 2024 23:40:09 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1735688410;
	bh=I6r4whCp2cVDkTqRdsyiCBt0h8QCjktMKGrZQ/Y5qGY=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=D2DR0qGDdthHU8PwiwAKjCG0lqiZ5VSEhWOF1Hcrl35J0Xp59UWAMhgE7HU1ZfJVD
	 evgRq8ooLq5399rWv4yu4Ohs5mm8yf/uoGpKYEbeEkUe02BSJ9XH325CS+U5Ikg6cG
	 UsKz7yTi1KhwhphIyrCC1dC0XXO/vXkl9PzgCH2wlmfeKf+S+Nbt6ADrgX9nEnEIjG
3Apc8WbEmVFwwcFxKso2KnkYD1NO416Zf+JlWABjL5TlErNE4LRD/MLtJFajMMEA+G Y7A/KbnNu4J/i2T5aLBsPcmdufSDicadrdFP4OJYh8WTr4bP3RkYW0do65aue+yiIw DlIN1X1HAm0Og== Date: Tue, 31 Dec 2024 15:40:09 -0800 Subject: [PATCH 05/16] iomap, filemap: report buffered read and write io errors to the filesystem From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568754829.2704911.5583911059846056720.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Provide a callback so that iomap can report read and write IO errors to the caller filesystem. For now this is only wired up for iomap as a testbed for XFS. Signed-off-by: "Darrick J. Wong" --- Documentation/filesystems/vfs.rst | 7 +++++++ fs/iomap/buffered-io.c | 26 +++++++++++++++++++++++++- include/linux/fs.h | 4 ++++ 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst index 0b18af3f954eb7..2f0ef4e1a8d340 100644 --- a/Documentation/filesystems/vfs.rst +++ b/Documentation/filesystems/vfs.rst @@ -827,6 +827,8 @@ cache in your filesystem. The following members are defined: int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span) int (*swap_deactivate)(struct file *); int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); + void (*ioerror)(struct address_space *mapping, int direction, + loff_t pos, u64 len, int error); }; ``writepage`` @@ -1056,6 +1058,11 @@ cache in your filesystem. The following members are defined: ``swap_rw`` Called to read or write swap pages when SWP_FS_OPS is set. +``ioerror`` + Called to deal with IO errors during readahead or writeback. 
+	This may be called from interrupt context, and without any
+	locks necessarily being held.
+
 The File Object
 ===============
 
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 86e30b56e8d41b..39782376895306 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -284,6 +284,14 @@ static void iomap_adjust_read_range(struct inode *inode, struct folio *folio,
 	*lenp = plen;
 }
 
+static inline void iomap_mapping_ioerror(struct address_space *mapping,
+		int direction, loff_t pos, u64 len, int error)
+{
+	if (mapping && mapping->a_ops->ioerror)
+		mapping->a_ops->ioerror(mapping, direction, pos, len,
+				error);
+}
+
 static void iomap_finish_folio_read(struct folio *folio, size_t off,
 		size_t len, int error)
 {
@@ -302,6 +310,10 @@ static void iomap_finish_folio_read(struct folio *folio, size_t off,
 		spin_unlock_irqrestore(&ifs->state_lock, flags);
 	}
 
+	if (error)
+		iomap_mapping_ioerror(folio->mapping, READ,
+				folio_pos(folio) + off, len, error);
+
 	if (finished)
 		folio_end_read(folio, uptodate);
 }
@@ -670,11 +682,16 @@ static int iomap_read_folio_sync(loff_t block_start, struct folio *folio,
 {
 	struct bio_vec bvec;
 	struct bio bio;
+	int ret;
 
 	bio_init(&bio, iomap->bdev, &bvec, 1, REQ_OP_READ);
 	bio.bi_iter.bi_sector = iomap_sector(iomap, block_start);
 	bio_add_folio_nofail(&bio, folio, plen, poff);
-	return submit_bio_wait(&bio);
+	ret = submit_bio_wait(&bio);
+	if (ret)
+		iomap_mapping_ioerror(folio->mapping, READ,
+				folio_pos(folio) + poff, plen, ret);
+	return ret;
 }
 
 static int __iomap_write_begin(const struct iomap_iter *iter, loff_t pos,
@@ -1573,6 +1590,11 @@ u32 iomap_finish_ioend_buffered(struct iomap_ioend *ioend)
 
 	/* walk all folios in bio, ending page IO on them */
 	bio_for_each_folio_all(fi, bio) {
+		if (ioend->io_error)
+			iomap_mapping_ioerror(inode->i_mapping, WRITE,
+					folio_pos(fi.folio) + fi.offset,
+					fi.length, ioend->io_error);
+
 		iomap_finish_folio_write(inode, fi.folio, fi.length);
 		folio_count++;
 	}
@@ -1881,6 +1903,8 @@ static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
 
 	if (count)
 		wpc->nr_folios++;
+	if (error && !count)
+		iomap_mapping_ioerror(inode->i_mapping, WRITE, pos, 0, error);
 
 	/*
 	 * We can have dirty bits set past end of file in page_mkwrite path
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b638fb1bcbc96f..9375753577025d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -438,6 +438,10 @@ struct address_space_operations {
 				sector_t *span);
 	void (*swap_deactivate)(struct file *file);
 	int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
+
+	/* Callback for dealing with IO errors during readahead or writeback */
+	void (*ioerror)(struct address_space *mapping, int direction,
+			loff_t pos, u64 len, int error);
 };
 
 extern const struct address_space_operations empty_aops;

From patchwork Tue Dec 31 23:40:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924041
Date: Tue, 31 Dec 2024 15:40:25 -0800
Subject: [PATCH 06/16] iomap: report directio read and write errors to callers
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754846.2704911.5576678697570752742.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Add more hooks to report directio IO errors to the filesystem.

Signed-off-by: "Darrick J. Wong"
---
 fs/iomap/direct-io.c  |    4 ++++
 include/linux/iomap.h |    2 ++
 2 files changed, 6 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index dd521f4edf55ac..f572be18490b0a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -100,6 +100,10 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 
 	if (dops && dops->end_io)
 		ret = dops->end_io(iocb, dio->size, ret, dio->flags);
+	if (dio->error && dops && dops->ioerror)
+		dops->ioerror(file_inode(iocb->ki_filp),
+				(dio->flags & IOMAP_DIO_WRITE) ? WRITE : READ,
+				offset, dio->size, dio->error);
 
 	if (likely(!ret)) {
 		ret = dio->size;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index afa0917cf43705..69c8b45bd9b935 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -439,6 +439,8 @@ struct iomap_dio_ops {
 		      unsigned flags);
 	void (*submit_io)(const struct iomap_iter *iter, struct bio *bio,
 		          loff_t file_offset);
+	void (*ioerror)(struct inode *inode, int direction, loff_t pos,
+			u64 len, int error);
 
 	/*
 	 * Filesystems wishing to attach private information to a direct io bio

From patchwork Tue Dec 31 23:40:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924042
Date: Tue, 31 Dec 2024 15:40:40 -0800
Subject: [PATCH 07/16] xfs: create file io error hooks
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754863.2704911.11267943332732791949.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Create hooks within XFS to deliver IO errors to callers.

Signed-off-by: "Darrick J. Wong"
---
 fs/xfs/xfs_aops.c  |    2 +
 fs/xfs/xfs_file.c  |  167 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_file.h  |   36 +++++++++++
 fs/xfs/xfs_mount.h |    3 +
 fs/xfs/xfs_super.c |    1 
 5 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 4319d0488f2146..7892b794085251 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -21,6 +21,7 @@
 #include "xfs_error.h"
 #include "xfs_zone_alloc.h"
 #include "xfs_rtgroup.h"
+#include "xfs_file.h"
 
 struct xfs_writepage_ctx {
 	struct iomap_writepage_ctx ctx;
@@ -722,6 +723,7 @@ const struct address_space_operations xfs_address_space_operations = {
 	.is_partially_uptodate = iomap_is_partially_uptodate,
 	.error_remove_folio = generic_error_remove_folio,
 	.swap_activate = xfs_iomap_swapfile_activate,
+	.ioerror = xfs_vm_ioerror,
 };
 
 const struct address_space_operations xfs_dax_aops = {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index ceb7936e5fd9a3..cbeb60582cb15f 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -230,6 +230,169 @@ xfs_ilock_iocb_for_write(
 	return 0;
 }
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_file_ioerror_hooks_switch);
+
+void
+xfs_file_ioerror_hook_disable(void)
+{
+	xfs_hooks_switch_off(&xfs_file_ioerror_hooks_switch);
+}
+
+void
+xfs_file_ioerror_hook_enable(void)
+{
+	xfs_hooks_switch_on(&xfs_file_ioerror_hooks_switch);
+}
+
+struct xfs_file_ioerror {
+	struct work_struct work;
+	struct xfs_mount *mp;
+	xfs_ino_t ino;
+	loff_t pos;
+	u64 len;
+	u32 gen;
+	int error;
+	enum xfs_file_ioerror_type type;
+};
+
+/* Call downstream hooks for a file io error update. */
+STATIC void
+xfs_file_report_ioerror(
+	struct work_struct *work)
+{
+	struct xfs_file_ioerror *ioerr;
+
+	ioerr = container_of(work, struct xfs_file_ioerror, work);
+
+	if (xfs_hooks_switched_on(&xfs_file_ioerror_hooks_switch)) {
+		struct xfs_file_ioerror_params p = {
+			.ino = ioerr->ino,
+			.gen = ioerr->gen,
+			.pos = ioerr->pos,
+			.len = ioerr->len,
+		};
+		struct xfs_mount *mp = ioerr->mp;
+
+		xfs_hooks_call(&mp->m_file_ioerror_hooks, ioerr->type, &p);
+	}
+
+	kfree(ioerr);
+}
+
+/* Queue a directio io error notification. */
+STATIC void
+xfs_dio_ioerror(
+	struct inode *inode,
+	int direction,
+	loff_t pos,
+	u64 len,
+	int error)
+{
+	struct xfs_inode *ip = XFS_I(inode);
+	struct xfs_mount *mp = ip->i_mount;
+	struct xfs_file_ioerror *ioerr;
+
+	if (xfs_hooks_switched_on(&xfs_file_ioerror_hooks_switch)) {
+		ioerr = kzalloc(sizeof(*ioerr), GFP_ATOMIC);
+		if (!ioerr) {
+			xfs_err(mp,
+ "lost ioerror report for ino 0x%llx %s pos 0x%llx len 0x%llx error %d",
+					ip->i_ino,
+					direction == WRITE ? "WRITE" : "READ",
+					pos, len, error);
+			return;
+		}
+
+		INIT_WORK(&ioerr->work, xfs_file_report_ioerror);
+		ioerr->mp = mp;
+		ioerr->ino = ip->i_ino;
+		ioerr->gen = VFS_I(ip)->i_generation;
+		ioerr->pos = pos;
+		ioerr->len = len;
+		if (direction == WRITE)
+			ioerr->type = XFS_FILE_IOERROR_DIRECT_WRITE;
+		else
+			ioerr->type = XFS_FILE_IOERROR_DIRECT_READ;
+		ioerr->error = error;
+		queue_work(mp->m_unwritten_workqueue, &ioerr->work);
+	}
+}
+
+/* Queue a buffered io error notification. */
+void
+xfs_vm_ioerror(
+	struct address_space *mapping,
+	int direction,
+	loff_t pos,
+	u64 len,
+	int error)
+{
+	struct inode *inode = mapping->host;
+	struct xfs_inode *ip = XFS_I(inode);
+	struct xfs_mount *mp = ip->i_mount;
+	struct xfs_file_ioerror *ioerr;
+
+	if (xfs_hooks_switched_on(&xfs_file_ioerror_hooks_switch)) {
+		ioerr = kzalloc(sizeof(*ioerr), GFP_ATOMIC);
+		if (!ioerr) {
+			xfs_err(mp,
+ "lost ioerror report for ino 0x%llx %s pos 0x%llx len 0x%llx error %d",
+					ip->i_ino,
+					direction == WRITE ? "WRITE" : "READ",
+					pos, len, error);
+			return;
+		}
+
+		INIT_WORK(&ioerr->work, xfs_file_report_ioerror);
+		ioerr->mp = mp;
+		ioerr->ino = ip->i_ino;
+		ioerr->gen = VFS_I(ip)->i_generation;
+		ioerr->pos = pos;
+		ioerr->len = len;
+		if (direction == WRITE)
+			ioerr->type = XFS_FILE_IOERROR_BUFFERED_WRITE;
+		else
+			ioerr->type = XFS_FILE_IOERROR_BUFFERED_READ;
+		ioerr->error = error;
+		queue_work(mp->m_unwritten_workqueue, &ioerr->work);
+	}
+}
+
+/* Call the specified function after a file io error. */
+int
+xfs_file_ioerror_hook_add(
+	struct xfs_mount *mp,
+	struct xfs_file_ioerror_hook *hook)
+{
+	return xfs_hooks_add(&mp->m_file_ioerror_hooks, &hook->ioerror_hook);
+}
+
+/* Stop calling the specified function after a file io error. */
+void
+xfs_file_ioerror_hook_del(
+	struct xfs_mount *mp,
+	struct xfs_file_ioerror_hook *hook)
+{
+	xfs_hooks_del(&mp->m_file_ioerror_hooks, &hook->ioerror_hook);
+}
+
+/* Configure file io error update hook functions. */
+void
+xfs_file_ioerror_hook_setup(
+	struct xfs_file_ioerror_hook *hook,
+	notifier_fn_t mod_fn)
+{
+	xfs_hook_setup(&hook->ioerror_hook, mod_fn);
+}
+#else
+# define xfs_dio_ioerror NULL
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
+static const struct iomap_dio_ops xfs_dio_read_ops = {
+	.ioerror = xfs_dio_ioerror,
+};
+
 STATIC ssize_t
 xfs_file_dio_read(
 	struct kiocb *iocb,
@@ -248,7 +411,8 @@ xfs_file_dio_read(
 	ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_SHARED);
 	if (ret)
 		return ret;
-	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, 0, NULL, 0);
+	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, &xfs_dio_read_ops,
+			0, NULL, 0);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
 	return ret;
@@ -769,6 +933,7 @@ xfs_dio_write_end_io(
 
 static const struct iomap_dio_ops xfs_dio_write_ops = {
 	.end_io = xfs_dio_write_end_io,
+	.ioerror = xfs_dio_ioerror,
 };
 
 static void
diff --git a/fs/xfs/xfs_file.h b/fs/xfs/xfs_file.h
index c9d50699baba85..38c546cd498a52 100644
--- a/fs/xfs/xfs_file.h
+++ b/fs/xfs/xfs_file.h
@@ -17,4 +17,40 @@ int xfs_file_unshare_at(struct xfs_inode *ip, loff_t pos);
 long xfs_ioc_map_freesp(struct file *file,
 		struct xfs_map_freesp __user *argp);
 
+enum xfs_file_ioerror_type {
+	XFS_FILE_IOERROR_BUFFERED_READ,
+	XFS_FILE_IOERROR_BUFFERED_WRITE,
+	XFS_FILE_IOERROR_DIRECT_READ,
+	XFS_FILE_IOERROR_DIRECT_WRITE,
+};
+
+struct xfs_file_ioerror_params {
+	xfs_ino_t ino;
+	loff_t pos;
+	u64 len;
+	u32 gen;
+	int error;
+};
+
+#ifdef CONFIG_XFS_LIVE_HOOKS
+struct xfs_file_ioerror_hook {
+	struct xfs_hook ioerror_hook;
+};
+
+void xfs_file_ioerror_hook_disable(void);
+void xfs_file_ioerror_hook_enable(void);
+
+int xfs_file_ioerror_hook_add(struct xfs_mount *mp,
+		struct xfs_file_ioerror_hook *hook);
+void xfs_file_ioerror_hook_del(struct xfs_mount *mp,
+		struct xfs_file_ioerror_hook *hook);
+void xfs_file_ioerror_hook_setup(struct xfs_file_ioerror_hook *hook,
+		notifier_fn_t mod_fn);
+
+void xfs_vm_ioerror(struct address_space *mapping, int direction, loff_t pos,
+		u64 len, int error);
+#else
+# define xfs_vm_ioerror NULL
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 #endif /* __XFS_FILE_H__ */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 3fcfdaaf199315..10b4ff3548601e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -349,6 +349,9 @@ typedef struct xfs_mount {
 
 	/* Hook to feed media error events to a daemon. */
 	struct xfs_hooks m_media_error_hooks;
+
+	/* Hook to feed file io error events to a daemon. */
+	struct xfs_hooks m_file_ioerror_hooks;
 } xfs_mount_t;
 
 #define M_IGEO(mp) (&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index a49082159faae8..df6afcf8840948 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -2185,6 +2185,7 @@ xfs_init_fs_context(
 	xfs_hooks_init(&mp->m_shutdown_hooks);
 	xfs_hooks_init(&mp->m_health_update_hooks);
 	xfs_hooks_init(&mp->m_media_error_hooks);
+	xfs_hooks_init(&mp->m_file_ioerror_hooks);
 
 	fc->s_fs_info = mp;
 	fc->ops = &xfs_context_ops;

From patchwork Tue Dec 31 23:40:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924043
Date: Tue, 31 Dec 2024 15:40:56 -0800
Subject: [PATCH 08/16] xfs: create a special file to pass filesystem health to userspace
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754880.2704911.15158852399328244529.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Create an ioctl that installs a file descriptor backed by an anon_inode
file that will convey filesystem health events to userspace.

Signed-off-by: "Darrick J. Wong"
---
 fs/xfs/Kconfig         |    8 +++
 fs/xfs/Makefile        |    1 
 fs/xfs/libxfs/xfs_fs.h |    8 +++
 fs/xfs/xfs_healthmon.c |  145 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_healthmon.h |   16 +++++
 fs/xfs/xfs_ioctl.c     |    4 +
 6 files changed, 182 insertions(+)
 create mode 100644 fs/xfs/xfs_healthmon.c
 create mode 100644 fs/xfs/xfs_healthmon.h

diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 5700bc671a0e92..9d061a8c2786fe 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -120,6 +120,14 @@ config XFS_RT
 
 	  If unsure, say N.
 
+config XFS_HEALTH_MONITOR
+	bool "Report filesystem health events to userspace"
+	depends on XFS_FS
+	select XFS_LIVE_HOOKS
+	default y
+	help
+	  Report health events to userspace programs.
+
 config XFS_DRAIN_INTENTS
 	bool
 	select JUMP_LABEL if HAVE_ARCH_JUMP_LABEL
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4c59d43c77089e..94a9dc7aa7a1d5 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -158,6 +158,7 @@ xfs-$(CONFIG_XFS_DRAIN_INTENTS) += xfs_drain.o
 xfs-$(CONFIG_XFS_LIVE_HOOKS) += xfs_hooks.o
 xfs-$(CONFIG_XFS_MEMORY_BUFS) += xfs_buf_mem.o
 xfs-$(CONFIG_XFS_BTREE_IN_MEM) += libxfs/xfs_btree_mem.o
+xfs-$(CONFIG_XFS_HEALTH_MONITOR) += xfs_healthmon.o
 
 # online scrub/repair
 ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f4128dbdf3b9a2..d1a81b02a1a3f3 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -1100,6 +1100,13 @@ struct xfs_map_freesp {
 	__u64 pad; /* must be zero */
 };
 
+struct xfs_health_monitor {
+	__u64 flags; /* flags */
+	__u8 format; /* output format */
+	__u8 pad1[7]; /* zeroes */
+	__u64 pad2[2]; /* zeroes */
+};
+
 /*
  * ioctl commands that are used by Linux filesystems
  */
@@ -1141,6 +1148,7 @@ struct xfs_map_freesp {
 #define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 65, struct xfs_rtgroup_geometry)
 #define XFS_IOC_GETFSREFCOUNTS _IOWR('X', 66, struct xfs_getfsrefs_head)
 #define XFS_IOC_MAP_FREESP _IOW ('X', 67, struct xfs_map_freesp)
+#define XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c
new file mode 100644
index 00000000000000..c5ce5699373c63
--- /dev/null
+++ b/fs/xfs/xfs_healthmon.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2024-2025 Oracle. All Rights Reserved.
+ * Author: Darrick J. Wong
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trace.h"
+#include "xfs_ag.h"
+#include "xfs_btree.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_quota_defs.h"
+#include "xfs_rtgroup.h"
+#include "xfs_healthmon.h"
+
+#include
+#include
+#include
+
+/*
+ * Live Health Monitoring
+ * ======================
+ *
+ * Autonomous self-healing of XFS filesystems requires a means for the kernel
+ * to send filesystem health events to a monitoring daemon in userspace. To
+ * accomplish this, we establish a thread_with_file kthread object to handle
+ * translating internal events about filesystem health into a format that can
+ * be parsed easily by userspace. Then we hook various parts of the filesystem
+ * to supply those internal events to the kthread. Userspace reads events
+ * from the file descriptor returned by the ioctl.
+ *
+ * The healthmon abstraction has a weak reference to the host filesystem mount
+ * so that the queueing and processing of the events do not pin the mount and
+ * cannot slow down the main filesystem. The healthmon object can exist past
+ * the end of the filesystem mount.
+ */
+
+struct xfs_healthmon {
+	struct xfs_mount *mp;
+};
+
+/*
+ * Convey queued event data to userspace. First copy any remaining bytes in
+ * the outbuf, then format the oldest event into the outbuf and copy that too.
+ */
+STATIC ssize_t
+xfs_healthmon_read_iter(
+	struct kiocb *iocb,
+	struct iov_iter *to)
+{
+	return -EIO;
+}
+
+/* Free the health monitoring information. */
+STATIC int
+xfs_healthmon_release(
+	struct inode *inode,
+	struct file *file)
+{
+	struct xfs_healthmon *hm = file->private_data;
+
+	kfree(hm);
+
+	return 0;
+}
+
+/* Validate ioctl parameters. */
+static inline bool
+xfs_healthmon_validate(
+	const struct xfs_health_monitor *hmo)
+{
+	if (hmo->flags)
+		return false;
+	if (hmo->format)
+		return false;
+	if (memchr_inv(&hmo->pad1, 0, sizeof(hmo->pad1)))
+		return false;
+	if (memchr_inv(&hmo->pad2, 0, sizeof(hmo->pad2)))
+		return false;
+	return true;
+}
+
+static const struct file_operations xfs_healthmon_fops = {
+	.owner = THIS_MODULE,
+	.read_iter = xfs_healthmon_read_iter,
+	.release = xfs_healthmon_release,
+};
+
+/*
+ * Create a health monitoring file. Returns an index to the fd table or a
+ * negative errno.
+ */
+long
+xfs_ioc_health_monitor(
+	struct xfs_mount *mp,
+	struct xfs_health_monitor __user *arg)
+{
+	struct xfs_health_monitor hmo;
+	struct xfs_healthmon *hm;
+	char *name;
+	int fd;
+	int ret;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&hmo, arg, sizeof(hmo)))
+		return -EFAULT;
+
+	if (!xfs_healthmon_validate(&hmo))
+		return -EINVAL;
+
+	hm = kzalloc(sizeof(*hm), GFP_KERNEL);
+	if (!hm)
+		return -ENOMEM;
+	hm->mp = mp;
+
+	/* Set up VFS file and file descriptor. */
+	name = kasprintf(GFP_KERNEL, "XFS (%s): healthmon", mp->m_super->s_id);
+	if (!name) {
+		ret = -ENOMEM;
+		goto out_hm;
+	}
+
+	fd = anon_inode_getfd(name, &xfs_healthmon_fops, hm,
+			O_CLOEXEC | O_RDONLY);
+	kvfree(name);
+	if (fd < 0) {
+		ret = fd;
+		goto out_hm;
+	}
+
+	return fd;
+
+out_hm:
+	kfree(hm);
+	return ret;
+}
diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h
new file mode 100644
index 00000000000000..07126e39281a0c
--- /dev/null
+++ b/fs/xfs/xfs_healthmon.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2024-2025 Oracle. All Rights Reserved.
+ * Author: Darrick J. Wong
+ */
+#ifndef __XFS_HEALTHMON_H__
+#define __XFS_HEALTHMON_H__
+
+#ifdef CONFIG_XFS_HEALTH_MONITOR
+long xfs_ioc_health_monitor(struct xfs_mount *mp,
+		struct xfs_health_monitor __user *arg);
+#else
+# define xfs_ioc_health_monitor(mp, hmo) (-ENOTTY)
+#endif /* CONFIG_XFS_HEALTH_MONITOR */
+
+#endif /* __XFS_HEALTHMON_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 092a3699ff9e75..6c7a30128c7bf6 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -42,6 +42,7 @@
 #include "xfs_exchrange.h"
 #include "xfs_handle.h"
 #include "xfs_rtgroup.h"
+#include "xfs_healthmon.h"
 
 #include
 #include
@@ -1434,6 +1435,9 @@ xfs_file_ioctl(
 	case XFS_IOC_MAP_FREESP:
 		return xfs_ioc_map_freesp(filp, arg);
 
+	case XFS_IOC_HEALTH_MONITOR:
+		return xfs_ioc_health_monitor(mp, arg);
+
 	default:
 		return -ENOTTY;
 	}

From patchwork Tue Dec 31 23:41:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924044
Date: Tue, 31 Dec 2024 15:41:11 -0800
Subject: [PATCH 09/16] xfs: create event queuing, formatting, and discovery infrastructure
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754898.2704911.17397880399097729677.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

Create the basic infrastructure that we need to report health events to
userspace.
We need a compact form for recording critical information about events and queueing them; a means to notice that we've lost some events; and a means to format the events into something that userspace can handle. Here, we've chosen json to export information to userspace. The structured key-value nature of json gives us enormous flexibility to modify the schema of what we'll send to userspace because we can add new keys at any time. Userspace can use whatever json parsers are available to consume the events and will not be confused by keys they don't recognize. Note that we do NOT allow sending json back to the kernel, nor is there any intent to do that. Signed-off-by: "Darrick J. Wong" --- fs/xfs/libxfs/xfs_fs.h | 8 fs/xfs/libxfs/xfs_healthmon.schema.json | 63 ++++ fs/xfs/xfs_healthmon.c | 542 +++++++++++++++++++++++++++++++ fs/xfs/xfs_healthmon.h | 24 + fs/xfs/xfs_linux.h | 3 fs/xfs/xfs_trace.c | 2 fs/xfs/xfs_trace.h | 152 +++++++++ 7 files changed, 788 insertions(+), 6 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_healthmon.schema.json diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index d1a81b02a1a3f3..d7404e6efd866d 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -1107,6 +1107,14 @@ struct xfs_health_monitor { __u64 pad2[2]; /* zeroes */ }; +/* Return all health status events, not just deltas */ +#define XFS_HEALTH_MONITOR_VERBOSE (1ULL << 0) + +#define XFS_HEALTH_MONITOR_ALL (XFS_HEALTH_MONITOR_VERBOSE) + +/* Return events in JSON format */ +#define XFS_HEALTH_MONITOR_FMT_JSON (1) + /* * ioctl commands that are used by Linux filesystems */ diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json new file mode 100644 index 00000000000000..9772efe25f193d --- /dev/null +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -0,0 +1,63 @@ +{ + "$comment": [ + "SPDX-License-Identifier: GPL-2.0-or-later", + "Copyright (c) 2024-2025 Oracle. All Rights Reserved.", + "Author: Darrick J. 
Wong ", + "", + "This schema file describes the format of the json objects", + "readable from the fd returned by the XFS_IOC_HEALTH_MONITOR", + "ioctl." + ], + + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/fs/xfs/libxfs/xfs_healthmon.schema.json", + + "title": "XFS Health Monitoring Events", + + "$comment": "Events must be one of the following types:", + "oneOf": [ + { + "$ref": "#/$events/lost" + } + ], + + "$comment": "Simple data types are defined here.", + "$defs": { + "time_ns": { + "title": "Time of Event", + "description": "Timestamp of the event, in nanoseconds since the Unix epoch.", + "type": "integer" + } + }, + + "$comment": "Event types are defined here.", + "$events": { + "lost": { + "title": "Health Monitoring Events Lost", + "$comment": [ + "Previous health monitoring events were", + "dropped due to memory allocation failures", + "or queue limits." + ], + "type": "object", + + "properties": { + "type": { + "const": "lost" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + } + }, + + "required": [ + "type", + "time_ns", + "domain" + ] + } + } +} diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index c5ce5699373c63..499f6aab9bdbf3 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -40,12 +40,417 @@ * so that the queueing and processing of the events do not pin the mount and * cannot slow down the main filesystem. The healthmon object can exist past * the end of the filesystem mount. + * + * Please see the xfs_healthmon.schema.json file for a description of the + * format of the json events that are conveyed to userspace. */ +/* Allow this many events to build up in memory per healthmon fd. 
*/ +#define XFS_HEALTHMON_MAX_EVENTS \ + (32768 / sizeof(struct xfs_healthmon_event)) + +struct flag_string { + unsigned int mask; + const char *str; +}; + struct xfs_healthmon { + /* lock for mp and eventlist */ + struct mutex lock; + + /* waiter for signalling the arrival of events */ + struct wait_queue_head wait; + + /* list of event objects */ + struct xfs_healthmon_event *first_event; + struct xfs_healthmon_event *last_event; + struct xfs_mount *mp; + + /* number of events */ + unsigned int events; + + /* + * Buffer for formatting events. New buffer data are appended to the + * end of the seqbuf, and outpos is used to determine where to start + * a copy_iter. Both are protected by inode_lock. + */ + struct seq_buf outbuf; + size_t outpos; + + /* do we want all events? */ + bool verbose; + + /* did we lose an event? */ + bool lost_prev_event; }; +/* Remove an event from the head of the list. */ +static inline void +xfs_healthmon_free_head( + struct xfs_healthmon *hm, + struct xfs_healthmon_event *event) +{ + struct xfs_healthmon_event *head; + + mutex_lock(&hm->lock); + head = hm->first_event; + if (head != event) { + ASSERT(hm->first_event == event); + mutex_unlock(&hm->lock); + return; + } + + if (hm->last_event == head) + hm->last_event = NULL; + hm->first_event = head->next; + hm->events--; + mutex_unlock(&hm->lock); + + trace_xfs_healthmon_pop(hm->mp, head); + kfree(event); +} + +/* Push an event onto the end of the list. */ +static inline int +xfs_healthmon_push( + struct xfs_healthmon *hm, + struct xfs_healthmon_event *event) +{ + /* + * If the queue is already full, remember the fact that we lost events. + * This doesn't apply to "event lost" events; those always go through + * because there should only be one at the very end of the queue. 
+ */ + if (hm->events >= XFS_HEALTHMON_MAX_EVENTS && + event->type != XFS_HEALTHMON_LOST) { + trace_xfs_healthmon_lost_event(hm->mp); + hm->lost_prev_event = true; + return -ENOMEM; + } + + if (!hm->first_event) + hm->first_event = event; + if (hm->last_event) + hm->last_event->next = event; + hm->last_event = event; + event->next = NULL; + hm->events++; + wake_up(&hm->wait); + + trace_xfs_healthmon_push(hm->mp, event); + + return 0; +} + +/* Create a new event or record that we failed. */ +static struct xfs_healthmon_event * +xfs_healthmon_alloc( + struct xfs_healthmon *hm, + enum xfs_healthmon_type type, + enum xfs_healthmon_domain domain) +{ + struct timespec64 now; + struct xfs_healthmon_event *event; + + event = kzalloc(sizeof(*event), GFP_NOFS); + if (!event) { + trace_xfs_healthmon_lost_event(hm->mp); + hm->lost_prev_event = true; + return NULL; + } + + event->type = type; + event->domain = domain; + ktime_get_coarse_real_ts64(&now); + event->time_ns = (now.tv_sec * NSEC_PER_SEC) + now.tv_nsec; + + return event; +} + +/* + * Before we accept an event notification from a live update hook, we need to + * clear out any previously lost events. + */ +static inline int +xfs_healthmon_start_live_update( + struct xfs_healthmon *hm) +{ + struct xfs_healthmon_event *event; + + /* + * If we previously lost an event or the queue is full, try to queue + * a notification about lost events. + */ + if (!hm->lost_prev_event && hm->events != XFS_HEALTHMON_MAX_EVENTS) + return 0; + + /* + * A previous invocation of the live update hook could not allocate + * any memory at all. If the last event on the list is already a + * notification of lost events, we're done. + */ + if (hm->last_event && hm->last_event->type == XFS_HEALTHMON_LOST) + return 0; + + /* + * There are no events or the last one wasn't about lost events. Try + * to allocate a new one to note the lost events. 
+ */ + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_LOST, + XFS_HEALTHMON_MOUNT); + if (!event) + return -ENOMEM; + + hm->lost_prev_event = false; + xfs_healthmon_push(hm, event); + return 0; +} + +/* Render the health update type as a string. */ +STATIC const char * +xfs_healthmon_typestring( + const struct xfs_healthmon_event *event) +{ + static const char *type_strings[] = { + [XFS_HEALTHMON_LOST] = "lost", + }; + + if (event->type >= ARRAY_SIZE(type_strings)) + return "?"; + + return type_strings[event->type]; +} + +/* Render the health domain as a string. */ +STATIC const char * +xfs_healthmon_domstring( + const struct xfs_healthmon_event *event) +{ + static const char *dom_strings[] = { + [XFS_HEALTHMON_MOUNT] = "mount", + }; + + if (event->domain >= ARRAY_SIZE(dom_strings)) + return "?"; + + return dom_strings[event->domain]; +} + +/* Convert a flags bitmap into a jsonable string. */ +static inline int +xfs_healthmon_format_flags( + struct seq_buf *outbuf, + const struct flag_string *strings, + size_t nr_strings, + unsigned int flags) +{ + const struct flag_string *p; + ssize_t ret; + unsigned int i; + bool first = true; + + for (i = 0, p = strings; i < nr_strings; i++, p++) { + if (!(p->mask & flags)) + continue; + + ret = seq_buf_printf(outbuf, "%s\"%s\"", + first ? "" : ", ", p->str); + if (ret < 0) + return ret; + + first = false; + flags &= ~p->mask; + } + + for (i = 0; flags != 0 && i < sizeof(flags) * NBBY; i++) { + if (!(flags & (1U << i))) + continue; + + /* json doesn't support hexadecimal notation */ + ret = seq_buf_printf(outbuf, "%s%u", + first ? "" : ", ", (1U << i)); + if (ret < 0) + return ret; + + first = false; + } + + return 0; +} + +/* Convert the event mask into a jsonable string. 
*/ +static inline int +__xfs_healthmon_format_mask( + struct seq_buf *outbuf, + const char *descr, + const struct flag_string *strings, + size_t nr_strings, + unsigned int mask) +{ + ssize_t ret; + + ret = seq_buf_printf(outbuf, " \"%s\": [", descr); + if (ret < 0) + return ret; + + ret = xfs_healthmon_format_flags(outbuf, strings, nr_strings, mask); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, "],\n"); +} + +#define xfs_healthmon_format_mask(o, d, s, m) \ + __xfs_healthmon_format_mask((o), (d), (s), ARRAY_SIZE(s), (m)) + +static inline void +xfs_healthmon_reset_outbuf( + struct xfs_healthmon *hm) +{ + hm->outpos = 0; + seq_buf_clear(&hm->outbuf); +} + +/* + * Format an event into json. Returns 0 if we formatted the event. If + * formatting the event overflows the buffer, returns -1 with the seqbuf len + * unchanged. + */ +STATIC int +xfs_healthmon_format( + struct xfs_healthmon *hm, + const struct xfs_healthmon_event *event) +{ + struct seq_buf *outbuf = &hm->outbuf; + size_t old_seqlen = outbuf->len; + int ret; + + trace_xfs_healthmon_format(hm->mp, event); + + ret = seq_buf_printf(outbuf, "{\n"); + if (ret < 0) + goto overrun; + + ret = seq_buf_printf(outbuf, " \"type\": \"%s\",\n", + xfs_healthmon_typestring(event)); + if (ret < 0) + goto overrun; + + ret = seq_buf_printf(outbuf, " \"domain\": \"%s\",\n", + xfs_healthmon_domstring(event)); + if (ret < 0) + goto overrun; + + switch (event->type) { + case XFS_HEALTHMON_LOST: + /* empty */ + break; + default: + break; + } + + switch (event->domain) { + case XFS_HEALTHMON_MOUNT: + /* empty */ + break; + } + if (ret < 0) + goto overrun; + + /* The last element in the json must not have a trailing comma. 
*/ + ret = seq_buf_printf(outbuf, " \"time_ns\": %llu\n", + event->time_ns); + if (ret < 0) + goto overrun; + + ret = seq_buf_printf(outbuf, "}\n"); + if (ret < 0) + goto overrun; + + ASSERT(!seq_buf_has_overflowed(outbuf)); + return 0; +overrun: + /* + * We overflowed the buffer and could not format the event. Reset the + * seqbuf and tell the caller not to delete the event. + */ + trace_xfs_healthmon_format_overflow(hm->mp, event); + outbuf->len = old_seqlen; + return -1; +} + +/* How many bytes are waiting in the outbuf to be copied? */ +static inline size_t +xfs_healthmon_outbuf_bytes( + struct xfs_healthmon *hm) +{ + unsigned int used = seq_buf_used(&hm->outbuf); + + if (used > hm->outpos) + return used - hm->outpos; + return 0; +} + +/* + * Do we have something for userspace to do? This can mean unmount events, + * events pending in the queue, or pending bytes in the outbuf. + */ +static inline bool +xfs_healthmon_has_eventdata( + struct xfs_healthmon *hm) +{ + return hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0; +} + +/* Try to copy the rest of the outbuf to the iov iter. */ +STATIC ssize_t +xfs_healthmon_copybuf( + struct xfs_healthmon *hm, + struct iov_iter *to) +{ + size_t to_copy; + size_t w = 0; + + trace_xfs_healthmon_copybuf(hm->mp, to, &hm->outbuf, hm->outpos); + + to_copy = xfs_healthmon_outbuf_bytes(hm); + if (to_copy) { + w = copy_to_iter(hm->outbuf.buffer + hm->outpos, to_copy, to); + if (!w) + return -EFAULT; + + hm->outpos += w; + } + + /* + * Nothing left to copy? Reset the seqbuf pointers and outbuf to the + * start since there's no live data in the buffer. + */ + if (xfs_healthmon_outbuf_bytes(hm) == 0) + xfs_healthmon_reset_outbuf(hm); + return w; +} + +/* + * See if there's an event waiting for us. If the fs is no longer mounted, + * don't bother sending any more events. 
+ */ +static inline struct xfs_healthmon_event * +xfs_healthmon_peek( + struct xfs_healthmon *hm) +{ + struct xfs_healthmon_event *event; + + mutex_lock(&hm->lock); + if (hm->mp) + event = hm->first_event; + else + event = NULL; + mutex_unlock(&hm->lock); + return event; +} + /* * Convey queued event data to userspace. First copy any remaining bytes in * the outbuf, then format the oldest event into the outbuf and copy that too. @@ -55,7 +460,112 @@ xfs_healthmon_read_iter( struct kiocb *iocb, struct iov_iter *to) { - return -EIO; + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + struct xfs_healthmon *hm = file->private_data; + struct xfs_healthmon_event *event; + size_t copied = 0; + ssize_t ret = 0; + + /* Wait for data to become available */ + if (!(file->f_flags & O_NONBLOCK)) { + ret = wait_event_interruptible(hm->wait, + xfs_healthmon_has_eventdata(hm)); + if (ret) + return ret; + } else if (!xfs_healthmon_has_eventdata(hm)) { + return -EAGAIN; + } + + /* Allocate formatting buffer up to 64k if necessary */ + if (hm->outbuf.size == 0) { + void *outbuf; + size_t bufsize = min(65536, max(PAGE_SIZE, + iov_iter_count(to))); + + outbuf = kzalloc(bufsize, GFP_KERNEL); + if (!outbuf) { + bufsize = PAGE_SIZE; + outbuf = kzalloc(bufsize, GFP_KERNEL); + if (!outbuf) + return -ENOMEM; + } + + inode_lock(inode); + if (hm->outbuf.size == 0) { + seq_buf_init(&hm->outbuf, outbuf, bufsize); + hm->outpos = 0; + } else { + kfree(outbuf); + } + } else { + inode_lock(inode); + } + + trace_xfs_healthmon_read_start(hm->mp, hm->events, hm->lost_prev_event); + + /* + * If there's anything left in the seqbuf, copy that before formatting + * more events. + */ + ret = xfs_healthmon_copybuf(hm, to); + if (ret < 0) + goto out_unlock; + copied += ret; + + while (iov_iter_count(to) > 0) { + /* Format the next events into the outbuf until it's full. 
*/ + while ((event = xfs_healthmon_peek(hm)) != NULL) { + ret = xfs_healthmon_format(hm, event); + if (ret < 0) + break; + xfs_healthmon_free_head(hm, event); + } + /* Copy it to userspace */ + ret = xfs_healthmon_copybuf(hm, to); + if (ret <= 0) + break; + + copied += ret; + } + +out_unlock: + trace_xfs_healthmon_read_finish(hm->mp, hm->events, hm->lost_prev_event); + inode_unlock(inode); + return copied ?: ret; +} + +/* Poll for available events. */ +STATIC __poll_t +xfs_healthmon_poll( + struct file *file, + struct poll_table_struct *wait) +{ + struct xfs_healthmon *hm = file->private_data; + __poll_t mask = 0; + + poll_wait(file, &hm->wait, wait); + + if (xfs_healthmon_has_eventdata(hm)) + mask |= EPOLLIN; + return mask; +} + +/* Free all events */ +STATIC void +xfs_healthmon_free_events( + struct xfs_healthmon *hm) +{ + struct xfs_healthmon_event *event, *next; + + event = hm->first_event; + while (event != NULL) { + trace_xfs_healthmon_drop(hm->mp, event); + next = event->next; + kfree(event); + event = next; + } + hm->first_event = hm->last_event = NULL; } /* Free the health monitoring information. 
*/ @@ -66,6 +576,14 @@ xfs_healthmon_release( { struct xfs_healthmon *hm = file->private_data; + trace_xfs_healthmon_release(hm->mp, hm->events, hm->lost_prev_event); + + wake_up_all(&hm->wait); + + mutex_destroy(&hm->lock); + xfs_healthmon_free_events(hm); + if (hm->outbuf.size) + kfree(hm->outbuf.buffer); kfree(hm); return 0; @@ -76,9 +594,9 @@ static inline bool xfs_healthmon_validate( const struct xfs_health_monitor *hmo) { - if (hmo->flags) + if (hmo->flags & ~XFS_HEALTH_MONITOR_ALL) return false; - if (hmo->format) + if (hmo->format != XFS_HEALTH_MONITOR_FMT_JSON) return false; if (memchr_inv(&hmo->pad1, 0, sizeof(hmo->pad1))) return false; @@ -90,6 +608,7 @@ xfs_healthmon_validate( static const struct file_operations xfs_healthmon_fops = { .owner = THIS_MODULE, .read_iter = xfs_healthmon_read_iter, + .poll = xfs_healthmon_poll, .release = xfs_healthmon_release, }; @@ -122,11 +641,18 @@ xfs_ioc_health_monitor( return -ENOMEM; hm->mp = mp; + seq_buf_init(&hm->outbuf, NULL, 0); + mutex_init(&hm->lock); + init_waitqueue_head(&hm->wait); + + if (hmo.flags & XFS_HEALTH_MONITOR_VERBOSE) + hm->verbose = true; + /* Set up VFS file and file descriptor. 
*/ name = kasprintf(GFP_KERNEL, "XFS (%s): healthmon", mp->m_super->s_id); if (!name) { ret = -ENOMEM; - goto out_hm; + goto out_mutex; } fd = anon_inode_getfd(name, &xfs_healthmon_fops, hm, @@ -134,12 +660,16 @@ xfs_ioc_health_monitor( kvfree(name); if (fd < 0) { ret = fd; - goto out_hm; + goto out_mutex; } + trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); + return fd; -out_hm: +out_mutex: + mutex_destroy(&hm->lock); + xfs_healthmon_free_events(hm); kfree(hm); return ret; } diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h index 07126e39281a0c..606f205074495c 100644 --- a/fs/xfs/xfs_healthmon.h +++ b/fs/xfs/xfs_healthmon.h @@ -6,6 +6,30 @@ #ifndef __XFS_HEALTHMON_H__ #define __XFS_HEALTHMON_H__ +enum xfs_healthmon_type { + XFS_HEALTHMON_LOST, /* message lost */ +}; + +enum xfs_healthmon_domain { + XFS_HEALTHMON_MOUNT, /* affects the whole fs */ +}; + +struct xfs_healthmon_event { + struct xfs_healthmon_event *next; + + enum xfs_healthmon_type type; + enum xfs_healthmon_domain domain; + + uint64_t time_ns; + + union { + /* mount */ + struct { + unsigned int flags; + }; + }; +}; + #ifdef CONFIG_XFS_HEALTH_MONITOR long xfs_ioc_health_monitor(struct xfs_mount *mp, struct xfs_health_monitor __user *arg); diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 9a2221b4aa21ed..d13a5fa2d652ff 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -63,6 +63,9 @@ typedef __u32 xfs_nlink_t; #include #include #include +#ifdef CONFIG_XFS_HEALTH_MONITOR +# include +#endif #include #include diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 555fe76b4d853c..41a2ac85dc5fdf 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -52,6 +52,8 @@ #include "xfs_zone_alloc.h" #include "xfs_zone_priv.h" #include "xfs_fsrefs.h" +#include "xfs_health.h" +#include "xfs_healthmon.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 76f5d78b6a6e09..bd3b007d213fc6 100644 
--- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -106,6 +106,8 @@ struct xfs_open_zone; struct xfs_fsrefs; struct xfs_fsrefs_irec; struct xfs_rtgroup; +struct xfs_healthmon_event; +struct xfs_health_update_params; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -6077,6 +6079,156 @@ TRACE_EVENT(xfs_growfs_check_rtgeom, ); #endif /* CONFIG_XFS_RT */ +#ifdef CONFIG_XFS_HEALTH_MONITOR +TRACE_EVENT(xfs_healthmon_lost_event, + TP_PROTO(const struct xfs_mount *mp), + TP_ARGS(mp), + TP_STRUCT__entry( + __field(dev_t, dev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + ), + TP_printk("dev %d:%d", + MAJOR(__entry->dev), MINOR(__entry->dev)) +); + +#define XFS_HEALTHMON_FLAGS_STRINGS \ + { XFS_HEALTH_MONITOR_VERBOSE, "verbose" } +#define XFS_HEALTHMON_FMT_STRINGS \ + { XFS_HEALTH_MONITOR_FMT_JSON, "json" } + +TRACE_EVENT(xfs_healthmon_create, + TP_PROTO(const struct xfs_mount *mp, u64 flags, u8 format), + TP_ARGS(mp, flags, format), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(u64, flags) + __field(u8, format) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->flags = flags; + __entry->format = format; + ), + TP_printk("dev %d:%d flags %s format %s", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_flags(__entry->flags, "|", XFS_HEALTHMON_FLAGS_STRINGS), + __print_symbolic(__entry->format, XFS_HEALTHMON_FMT_STRINGS)) +); + +TRACE_EVENT(xfs_healthmon_copybuf, + TP_PROTO(const struct xfs_mount *mp, const struct iov_iter *iov, + const struct seq_buf *seqbuf, size_t outpos), + TP_ARGS(mp, iov, seqbuf, outpos), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(size_t, seqbuf_size) + __field(size_t, seqbuf_len) + __field(size_t, outpos) + __field(size_t, to_copy) + __field(size_t, iter_count) + ), + TP_fast_assign( + __entry->dev = mp ? 
mp->m_super->s_dev : 0; + __entry->seqbuf_size = seqbuf->size; + __entry->seqbuf_len = seqbuf->len; + __entry->outpos = outpos; + __entry->to_copy = seqbuf->len - outpos; + __entry->iter_count = iov_iter_count(iov); + ), + TP_printk("dev %d:%d seqsize %zu seqlen %zu out_pos %zu to_copy %zu iter_count %zu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->seqbuf_size, + __entry->seqbuf_len, + __entry->outpos, + __entry->to_copy, + __entry->iter_count) +); + +DECLARE_EVENT_CLASS(xfs_healthmon_class, + TP_PROTO(const struct xfs_mount *mp, unsigned int events, bool lost_prev), + TP_ARGS(mp, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, events) + __field(bool, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d events %u lost_prev? %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->events, + __entry->lost_prev) +); +#define DEFINE_HEALTHMON_EVENT(name) \ +DEFINE_EVENT(xfs_healthmon_class, name, \ + TP_PROTO(const struct xfs_mount *mp, unsigned int events, bool lost_prev), \ + TP_ARGS(mp, events, lost_prev)) +DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_start); +DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_finish); +DEFINE_HEALTHMON_EVENT(xfs_healthmon_release); +DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount); + +#define XFS_HEALTHMON_TYPE_STRINGS \ + { XFS_HEALTHMON_LOST, "lost" } + +#define XFS_HEALTHMON_DOMAIN_STRINGS \ + { XFS_HEALTHMON_MOUNT, "mount" } + +TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST); + +TRACE_DEFINE_ENUM(XFS_HEALTHMON_MOUNT); + +DECLARE_EVENT_CLASS(xfs_healthmon_event_class, + TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), + TP_ARGS(mp, event), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, type) + __field(unsigned int, domain) + __field(unsigned int, mask) + __field(unsigned long long, ino) + __field(unsigned int, gen) + __field(unsigned int, 
group) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->type = event->type; + __entry->domain = event->domain; + __entry->mask = 0; + __entry->group = 0; + __entry->ino = 0; + __entry->gen = 0; + switch (__entry->domain) { + case XFS_HEALTHMON_MOUNT: + __entry->mask = event->flags; + break; + } + ), + TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x group 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_symbolic(__entry->type, XFS_HEALTHMON_TYPE_STRINGS), + __print_symbolic(__entry->domain, XFS_HEALTHMON_DOMAIN_STRINGS), + __entry->mask, + __entry->ino, + __entry->gen, + __entry->group) +); +#define DEFINE_HEALTHMONEVENT_EVENT(name) \ +DEFINE_EVENT(xfs_healthmon_event_class, name, \ + TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), \ + TP_ARGS(mp, event)) +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_push); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_pop); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format_overflow); +DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_drop); +#endif /* CONFIG_XFS_HEALTH_MONITOR */ + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH
From patchwork Tue Dec 31 23:41:27 2024
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13924045
Date: Tue, 31 Dec 2024 15:41:27 -0800
Subject: [PATCH 10/16] xfs: report metadata health events through healthmon
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <173568754916.2704911.15467242100626942628.stgit@frogsfrogsfrogs>
In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>
References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs>

From: Darrick J. Wong

Set up a metadata health event hook so that we can send events to userspace as we collect information. The unmount hook severs the weak reference between the health monitor and the filesystem it's monitoring; when this happens, we stop reporting events because there's no longer any point.

Signed-off-by: "Darrick J. 
Wong" --- fs/xfs/libxfs/xfs_healthmon.schema.json | 328 ++++++++++++++++++++++++++ fs/xfs/xfs_healthmon.c | 397 +++++++++++++++++++++++++++++++ fs/xfs/xfs_healthmon.h | 30 ++ fs/xfs/xfs_trace.h | 97 +++++++- 4 files changed, 846 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json index 9772efe25f193d..154ea0228a3615 100644 --- a/fs/xfs/libxfs/xfs_healthmon.schema.json +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -18,6 +18,18 @@ "oneOf": [ { "$ref": "#/$events/lost" + }, + { + "$ref": "#/$events/fs_metadata" + }, + { + "$ref": "#/$events/rtgroup_metadata" + }, + { + "$ref": "#/$events/perag_metadata" + }, + { + "$ref": "#/$events/inode_metadata" } ], @@ -27,6 +39,169 @@ "title": "Time of Event", "description": "Timestamp of the event, in nanoseconds since the Unix epoch.", "type": "integer" + }, + "xfs_agnumber_t": { + "description": "Allocation group number", + "type": "integer", + "minimum": 0, + "maximum": 2147483647 + }, + "xfs_rgnumber_t": { + "description": "Realtime allocation group number", + "type": "integer", + "minimum": 0, + "maximum": 2147483647 + }, + "xfs_ino_t": { + "description": "Inode number", + "type": "integer", + "minimum": 1 + }, + "i_generation": { + "description": "Inode generation number", + "type": "integer" + } + }, + + "$comment": "Filesystem metadata event data are defined here.", + "$metadata": { + "status": { + "description": "Metadata health status", + "$comment": [ + "One of:", + "", + " * sick: metadata corruption discovered", + " during a runtime operation.", + " * corrupt: corruption discovered during", + " an xfs_scrub run.", + " * healthy: metadata object was found to be", + " ok by xfs_scrub." + ], + "enum": [ + "sick", + "corrupt", + "healthy" + ] + }, + "fs": { + "description": [ + "Metadata structures that affect the entire", + "filesystem. 
Options include:", + "", + " * fscounters: summary counters", + " * usrquota: user quota records", + " * grpquota: group quota records", + " * prjquota: project quota records", + " * quotacheck: quota counters", + " * nlinks: file link counts", + " * metadir: metadata directory", + " * metapath: metadata inode paths" + ], + "enum": [ + "fscounters", + "grpquota", + "metadir", + "metapath", + "nlinks", + "prjquota", + "quotacheck", + "usrquota" + ] + }, + "perag": { + "description": [ + "Metadata structures owned by allocation", + "groups on the data device. Options include:", + "", + " * agf: group space header", + " * agfl: per-group free block list", + " * agi: group inode header", + " * bnobt: free space by position btree", + " * cntbt: free space by length btree", + " * finobt: free inode btree", + " * inobt: inode btree", + " * rmapbt: reverse mapping btree", + " * refcountbt: reference count btree", + " * inodes: problems were recorded for", + " this group's inodes, but the", + " inodes themselves had to be", + " reclaimed.", + " * super: superblock" + ], + "enum": [ + "agf", + "agfl", + "agi", + "bnobt", + "cntbt", + "finobt", + "inobt", + "inodes", + "refcountbt", + "rmapbt", + "super" + ] + }, + "rtgroup": { + "description": [ + "Metadata structures owned by allocation", + "groups on the realtime volume. 
Options", + "include:", + "", + " * bitmap: free space bitmap contents", + " for this group", + " * summary: realtime free space summary file", + " * rmapbt: reverse mapping btree", + " * refcountbt: reference count btree", + " * super: group superblock" + ], + "enum": [ + "bitmap", + "summary", + "refcountbt", + "rmapbt", + "super" + ] + }, + "inode": { + "description": [ + "Metadata structures owned by file inodes.", + "Options include:", + "", + " * bmapbta: attr fork", + " * bmapbtc: cow fork", + " * bmapbtd: data fork", + " * core: inode record", + " * directory: directory entries", + " * dirtree: directory tree problems detected", + " * parent: directory parent pointer", + " * symlink: symbolic link target", + " * xattr: extended attributes", + "", + "These are set when an inode record repair had", + "to drop the corresponding data structure to", + "get the inode back to a consistent state.", + "", + " * bmapbtd_zapped", + " * bmapbta_zapped", + " * directory_zapped", + " * symlink_zapped" + ], + "enum": [ + "bmapbta", + "bmapbta_zapped", + "bmapbtc", + "bmapbtd", + "bmapbtd_zapped", + "core", + "directory", + "directory_zapped", + "dirtree", + "parent", + "symlink", + "symlink_zapped", + "xattr" + ] } }, @@ -58,6 +233,159 @@ "time_ns", "domain" ] + }, + "fs_metadata": { + "title": "Filesystem-wide metadata event", + "description": [ + "Health status updates for filesystem-wide", + "metadata objects." + ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "fs" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/fs" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "structures" + ] + }, + "perag_metadata": { + "title": "Data device allocation group metadata event", + "description": [ + "Health status updates for data device ", + "allocation group metadata." 
+ ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "perag" + }, + "group": { + "$ref": "#/$defs/xfs_agnumber_t" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/perag" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "group", + "structures" + ] + }, + "rtgroup_metadata": { + "title": "Realtime allocation group metadata event", + "description": [ + "Health status updates for realtime allocation", + "group metadata." + ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "rtgroup" + }, + "group": { + "$ref": "#/$defs/xfs_rgnumber_t" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/rtgroup" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "group", + "structures" + ] + }, + "inode_metadata": { + "title": "Inode metadata event", + "description": [ + "Health status updates for inode metadata.", + "The inode and generation number describe the", + "file that is affected by the change." 
+ ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$metadata/status" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "inode" + }, + "inumber": { + "$ref": "#/$defs/xfs_ino_t" + }, + "generation": { + "$ref": "#/$defs/i_generation" + }, + "structures": { + "type": "array", + "items": { + "$ref": "#/$metadata/inode" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "inumber", + "generation", + "structures" + ] } } } diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 499f6aab9bdbf3..9d34a826726e3e 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -18,6 +18,7 @@ #include "xfs_da_btree.h" #include "xfs_quota_defs.h" #include "xfs_rtgroup.h" +#include "xfs_health.h" #include "xfs_healthmon.h" #include @@ -65,8 +66,15 @@ struct xfs_healthmon { struct xfs_healthmon_event *first_event; struct xfs_healthmon_event *last_event; + /* live update hooks */ + struct xfs_health_hook hhook; + + /* filesystem mount, or NULL if we've unmounted */ struct xfs_mount *mp; + /* filesystem type for safe cleanup of hooks; requires module_get */ + struct file_system_type *fstyp; + /* number of events */ unsigned int events; @@ -178,6 +186,10 @@ xfs_healthmon_start_live_update( { struct xfs_healthmon_event *event; + /* Already unmounted filesystem, do nothing. */ + if (!hm->mp) + return -ESHUTDOWN; + /* * If we previously lost an event or the queue is full, try to queue * a notification about lost events. @@ -207,6 +219,171 @@ xfs_healthmon_start_live_update( return 0; } +/* Compute the reporting mask. */ +static inline bool +xfs_healthmon_event_mask( + struct xfs_healthmon *hm, + enum xfs_health_update_type type, + const struct xfs_health_update_params *hup, + unsigned int *mask) +{ + /* Always report unmounts. */ + if (type == XFS_HEALTHUP_UNMOUNT) + return true; + + /* If we want all events, return all events. 
*/ + if (hm->verbose) { + *mask = hup->new_mask; + return true; + } + + switch (type) { + case XFS_HEALTHUP_SICK: + /* Always report runtime corruptions */ + *mask = hup->new_mask; + break; + case XFS_HEALTHUP_CORRUPT: + /* Only report new fsck errors */ + *mask = hup->new_mask & ~hup->old_mask; + break; + case XFS_HEALTHUP_HEALTHY: + /* Only report healthy metadata that got fixed */ + *mask = hup->new_mask & hup->old_mask; + break; + case XFS_HEALTHUP_UNMOUNT: + /* This is here for static enum checking */ + break; + } + + /* If not in verbose mode, mask state has to change. */ + return *mask != 0; +} + +static inline enum xfs_healthmon_type +health_update_to_type( + enum xfs_health_update_type type) +{ + switch (type) { + case XFS_HEALTHUP_SICK: + return XFS_HEALTHMON_SICK; + case XFS_HEALTHUP_CORRUPT: + return XFS_HEALTHMON_CORRUPT; + case XFS_HEALTHUP_HEALTHY: + return XFS_HEALTHMON_HEALTHY; + case XFS_HEALTHUP_UNMOUNT: + /* static checking */ + break; + } + return XFS_HEALTHMON_UNMOUNT; +} + +static inline enum xfs_healthmon_domain +health_update_to_domain( + enum xfs_health_update_domain domain) +{ + switch (domain) { + case XFS_HEALTHUP_FS: + return XFS_HEALTHMON_FS; + case XFS_HEALTHUP_AG: + return XFS_HEALTHMON_AG; + case XFS_HEALTHUP_RTGROUP: + return XFS_HEALTHMON_RTGROUP; + case XFS_HEALTHUP_INODE: + /* static checking */ + break; + } + return XFS_HEALTHMON_INODE; +} + +/* Add a health event to the reporting queue. */ +STATIC int +xfs_healthmon_metadata_hook( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_health_update_params *hup = data; + struct xfs_healthmon *hm; + struct xfs_healthmon_event *event; + enum xfs_health_update_type type = action; + unsigned int mask = 0; + int error; + + hm = container_of(nb, struct xfs_healthmon, hhook.health_hook.nb); + + /* Decode event mask and skip events we don't care about. 
*/ + if (!xfs_healthmon_event_mask(hm, type, hup, &mask)) + return NOTIFY_DONE; + + mutex_lock(&hm->lock); + + trace_xfs_healthmon_metadata_hook(hm->mp, action, hup, hm->events, + hm->lost_prev_event); + + error = xfs_healthmon_start_live_update(hm); + if (error) + goto out_unlock; + + if (type == XFS_HEALTHUP_UNMOUNT) { + /* + * The filesystem is unmounting, so we must detach from the + * mount. After this point, the healthmon thread has no + * connection to the mounted filesystem. + */ + trace_xfs_healthmon_unmount(hm->mp, hm->events, + hm->lost_prev_event); + hm->mp = NULL; + wake_up(&hm->wait); + goto out_unlock; + } + + event = xfs_healthmon_alloc(hm, health_update_to_type(type), + health_update_to_domain(hup->domain)); + if (!event) + goto out_unlock; + + /* Ignore the event if it's only reporting a secondary health state. */ + switch (event->domain) { + case XFS_HEALTHMON_FS: + event->fsmask = mask & ~XFS_SICK_FS_SECONDARY; + if (!event->fsmask) + goto out_event; + break; + case XFS_HEALTHMON_AG: + event->grpmask = mask & ~XFS_SICK_AG_SECONDARY; + if (!event->grpmask) + goto out_event; + event->group = hup->group; + break; + case XFS_HEALTHMON_RTGROUP: + event->grpmask = mask & ~XFS_SICK_RG_SECONDARY; + if (!event->grpmask) + goto out_event; + event->group = hup->group; + break; + case XFS_HEALTHMON_INODE: + event->imask = mask & ~XFS_SICK_INO_SECONDARY; + if (!event->imask) + goto out_event; + event->ino = hup->ino; + event->gen = hup->gen; + break; + default: + ASSERT(0); + break; + } + error = xfs_healthmon_push(hm, event); + if (error) + goto out_event; + +out_unlock: + mutex_unlock(&hm->lock); + return NOTIFY_DONE; +out_event: + kfree(event); + goto out_unlock; +} + /* Render the health update type as a string. 
*/ STATIC const char * xfs_healthmon_typestring( @@ -214,6 +391,10 @@ xfs_healthmon_typestring( { static const char *type_strings[] = { [XFS_HEALTHMON_LOST] = "lost", + [XFS_HEALTHMON_UNMOUNT] = "unmount", + [XFS_HEALTHMON_SICK] = "sick", + [XFS_HEALTHMON_CORRUPT] = "corrupt", + [XFS_HEALTHMON_HEALTHY] = "healthy", }; if (event->type >= ARRAY_SIZE(type_strings)) @@ -229,6 +410,10 @@ xfs_healthmon_domstring( { static const char *dom_strings[] = { [XFS_HEALTHMON_MOUNT] = "mount", + [XFS_HEALTHMON_FS] = "fs", + [XFS_HEALTHMON_AG] = "perag", + [XFS_HEALTHMON_INODE] = "inode", + [XFS_HEALTHMON_RTGROUP] = "rtgroup", }; if (event->domain >= ARRAY_SIZE(dom_strings)) @@ -254,6 +439,11 @@ xfs_healthmon_format_flags( if (!(p->mask & flags)) continue; + if (!p->str) { + flags &= ~p->mask; + continue; + } + ret = seq_buf_printf(outbuf, "%s\"%s\"", first ? "" : ", ", p->str); if (ret < 0) @@ -304,6 +494,118 @@ __xfs_healthmon_format_mask( #define xfs_healthmon_format_mask(o, d, s, m) \ __xfs_healthmon_format_mask((o), (d), (s), ARRAY_SIZE(s), (m)) +/* Render fs sickness mask as a string set */ +static int +xfs_healthmon_format_fs( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + static const struct flag_string mask_strings[] = { + { XFS_SICK_FS_COUNTERS, "fscounters" }, + { XFS_SICK_FS_UQUOTA, "usrquota" }, + { XFS_SICK_FS_GQUOTA, "grpquota" }, + { XFS_SICK_FS_PQUOTA, "prjquota" }, + { XFS_SICK_FS_QUOTACHECK, "quotacheck" }, + { XFS_SICK_FS_NLINKS, "nlinks" }, + { XFS_SICK_FS_METADIR, "metadir" }, + { XFS_SICK_FS_METAPATH, "metapath" }, + }; + + return xfs_healthmon_format_mask(outbuf, "structures", mask_strings, + event->fsmask); +} + +/* Render rtgroup sickness mask as a string set */ +static int +xfs_healthmon_format_rtgroup( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + static const struct flag_string mask_strings[] = { + { XFS_SICK_RG_SUPER, "super" }, + { XFS_SICK_RG_BITMAP, "bitmap" }, + { XFS_SICK_RG_SUMMARY, "summary" 
}, + { XFS_SICK_RG_RMAPBT, "rmapbt" }, + { XFS_SICK_RG_REFCNTBT, "refcountbt" }, + }; + ssize_t ret; + + ret = xfs_healthmon_format_mask(outbuf, "structures", mask_strings, + event->grpmask); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, " \"group\": %u,\n", + event->group); +} + +/* Render perag sickness mask as a string set */ +static int +xfs_healthmon_format_ag( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + static const struct flag_string mask_strings[] = { + { XFS_SICK_AG_SB, "super" }, + { XFS_SICK_AG_AGF, "agf" }, + { XFS_SICK_AG_AGFL, "agfl" }, + { XFS_SICK_AG_AGI, "agi" }, + { XFS_SICK_AG_BNOBT, "bnobt" }, + { XFS_SICK_AG_CNTBT, "cntbt" }, + { XFS_SICK_AG_INOBT, "inobt" }, + { XFS_SICK_AG_FINOBT, "finobt" }, + { XFS_SICK_AG_RMAPBT, "rmapbt" }, + { XFS_SICK_AG_REFCNTBT, "refcountbt" }, + { XFS_SICK_AG_INODES, "inodes" }, + }; + ssize_t ret; + + ret = xfs_healthmon_format_mask(outbuf, "structures", mask_strings, + event->grpmask); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, " \"group\": %u,\n", + event->group); +} + +/* Render inode sickness mask as a string set */ +static int +xfs_healthmon_format_inode( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + static const struct flag_string mask_strings[] = { + { XFS_SICK_INO_CORE, "core" }, + { XFS_SICK_INO_BMBTD, "bmapbtd" }, + { XFS_SICK_INO_BMBTA, "bmapbta" }, + { XFS_SICK_INO_BMBTC, "bmapbtc" }, + { XFS_SICK_INO_DIR, "directory" }, + { XFS_SICK_INO_XATTR, "xattr" }, + { XFS_SICK_INO_SYMLINK, "symlink" }, + { XFS_SICK_INO_PARENT, "parent" }, + { XFS_SICK_INO_BMBTD_ZAPPED, "bmapbtd_zapped" }, + { XFS_SICK_INO_BMBTA_ZAPPED, "bmapbta_zapped" }, + { XFS_SICK_INO_DIR_ZAPPED, "directory_zapped" }, + { XFS_SICK_INO_SYMLINK_ZAPPED, "symlink_zapped" }, + { XFS_SICK_INO_FORGET, NULL, }, + { XFS_SICK_INO_DIRTREE, "dirtree" }, + }; + ssize_t ret; + + ret = xfs_healthmon_format_mask(outbuf, "structures", mask_strings, + event->imask); 
+ if (ret < 0) + return ret; + + ret = seq_buf_printf(outbuf, " \"inumber\": %llu,\n", + event->ino); + if (ret < 0) + return ret; + return seq_buf_printf(outbuf, " \"generation\": %u,\n", + event->gen); +} + static inline void xfs_healthmon_reset_outbuf( struct xfs_healthmon *hm) @@ -354,6 +656,18 @@ xfs_healthmon_format( case XFS_HEALTHMON_MOUNT: /* empty */ break; + case XFS_HEALTHMON_FS: + ret = xfs_healthmon_format_fs(outbuf, event); + break; + case XFS_HEALTHMON_RTGROUP: + ret = xfs_healthmon_format_rtgroup(outbuf, event); + break; + case XFS_HEALTHMON_AG: + ret = xfs_healthmon_format_ag(outbuf, event); + break; + case XFS_HEALTHMON_INODE: + ret = xfs_healthmon_format_inode(outbuf, event); + break; } if (ret < 0) goto overrun; @@ -400,7 +714,7 @@ static inline bool xfs_healthmon_has_eventdata( struct xfs_healthmon *hm) { - return hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0; + return !hm->mp || hm->events > 0 || xfs_healthmon_outbuf_bytes(hm) > 0; } /* Try to copy the rest of the outbuf to the iov iter. */ @@ -521,6 +835,7 @@ xfs_healthmon_read_iter( break; xfs_healthmon_free_head(hm, event); } + /* Copy it to userspace */ ret = xfs_healthmon_copybuf(hm, to); if (ret <= 0) @@ -568,6 +883,58 @@ xfs_healthmon_free_events( hm->first_event = hm->last_event = NULL; } +/* + * Detach all filesystem hooks that were set up for a health monitor. Only + * call this from iterate_super*. + */ +STATIC void +xfs_healthmon_detach_hooks( + struct super_block *sb, + void *arg) +{ + struct xfs_healthmon *hm = arg; + + mutex_lock(&hm->lock); + + /* + * Because health monitors have a weak reference to the filesystem + * they're monitoring, the hook deletions below must not race against + * that filesystem being unmounted because that could lead to UAF + * errors. + * + * If hm->mp is NULL, the health unmount hook already ran and the hook + * chain head (contained within the xfs_mount structure) is gone. 
Do + * not detach any hooks; just let them get freed when the healthmon + * object is torn down. + */ + if (!hm->mp) + goto out_unlock; + + /* + * Otherwise, the caller gave us a non-dying @sb with s_umount held in + * shared mode, which means that @sb cannot be running through + * deactivate_locked_super and cannot be freed. It's safe to compare + * @sb against the super that we snapshotted when we set up the health + * monitor. + */ + if (hm->mp->m_super != sb) + goto out_unlock; + + mutex_unlock(&hm->lock); + + /* + * Now we know that the filesystem @hm->mp is active and cannot be + * deactivated until this function returns. Unmount events are sent + * through the health monitoring subsystem from xfs_fs_put_super, so + * it is now time to detach the hooks. + */ + xfs_health_hook_del(hm->mp, &hm->hhook); + return; + +out_unlock: + mutex_unlock(&hm->lock); +} + /* Free the health monitoring information. */ STATIC int xfs_healthmon_release( @@ -580,6 +947,9 @@ xfs_healthmon_release( wake_up_all(&hm->wait); + iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm); + xfs_health_hook_disable(); + mutex_destroy(&hm->lock); xfs_healthmon_free_events(hm); if (hm->outbuf.size) @@ -641,6 +1011,13 @@ xfs_ioc_health_monitor( return -ENOMEM; hm->mp = mp; + /* + * Since we already got a ref to the module, take a reference to the + * fstype to make it easier to detach the hooks when we tear things + * down later. + */ + hm->fstyp = mp->m_super->s_type; + seq_buf_init(&hm->outbuf, NULL, 0); mutex_init(&hm->lock); init_waitqueue_head(&hm->wait); @@ -648,11 +1025,20 @@ xfs_ioc_health_monitor( if (hmo.flags & XFS_HEALTH_MONITOR_VERBOSE) hm->verbose = true; + /* Enable hooks to receive events, generally. */ + xfs_health_hook_enable(); + + /* Attach specific event hooks to this monitor. */ + xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook); + ret = xfs_health_hook_add(mp, &hm->hhook); + if (ret) + goto out_hooks; + /* Set up VFS file and file descriptor. 
*/ name = kasprintf(GFP_KERNEL, "XFS (%s): healthmon", mp->m_super->s_id); if (!name) { ret = -ENOMEM; - goto out_mutex; + goto out_healthhook; } fd = anon_inode_getfd(name, &xfs_healthmon_fops, hm, @@ -660,14 +1046,17 @@ xfs_ioc_health_monitor( kvfree(name); if (fd < 0) { ret = fd; - goto out_mutex; + goto out_healthhook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; -out_mutex: +out_healthhook: + xfs_health_hook_del(mp, &hm->hhook); +out_hooks: + xfs_health_hook_disable(); mutex_destroy(&hm->lock); xfs_healthmon_free_events(hm); kfree(hm); diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h index 606f205074495c..3ece61165837b2 100644 --- a/fs/xfs/xfs_healthmon.h +++ b/fs/xfs/xfs_healthmon.h @@ -8,10 +8,22 @@ enum xfs_healthmon_type { XFS_HEALTHMON_LOST, /* message lost */ + + /* metadata health events */ + XFS_HEALTHMON_SICK, /* runtime corruption observed */ + XFS_HEALTHMON_CORRUPT, /* fsck reported corruption */ + XFS_HEALTHMON_HEALTHY, /* fsck reported healthy structure */ + XFS_HEALTHMON_UNMOUNT, /* filesystem is unmounting */ }; enum xfs_healthmon_domain { XFS_HEALTHMON_MOUNT, /* affects the whole fs */ + + /* metadata health events */ + XFS_HEALTHMON_FS, /* main filesystem metadata */ + XFS_HEALTHMON_AG, /* allocation group metadata */ + XFS_HEALTHMON_INODE, /* inode metadata */ + XFS_HEALTHMON_RTGROUP, /* realtime group metadata */ }; struct xfs_healthmon_event { @@ -27,6 +39,24 @@ struct xfs_healthmon_event { struct { unsigned int flags; }; + /* fs/rt metadata */ + struct { + /* XFS_SICK_* flags */ + unsigned int fsmask; + }; + /* ag/rtgroup metadata */ + struct { + /* XFS_SICK_* flags */ + unsigned int grpmask; + unsigned int group; + }; + /* inode metadata */ + struct { + /* XFS_SICK_INO_* flags */ + unsigned int imask; + uint32_t gen; + xfs_ino_t ino; + }; }; }; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index bd3b007d213fc6..4a68d2ec8d0a34 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -6174,14 
+6174,30 @@ DEFINE_HEALTHMON_EVENT(xfs_healthmon_release); DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount); #define XFS_HEALTHMON_TYPE_STRINGS \ - { XFS_HEALTHMON_LOST, "lost" } + { XFS_HEALTHMON_LOST, "lost" }, \ + { XFS_HEALTHMON_UNMOUNT, "unmount" }, \ + { XFS_HEALTHMON_SICK, "sick" }, \ + { XFS_HEALTHMON_CORRUPT, "corrupt" }, \ + { XFS_HEALTHMON_HEALTHY, "healthy" } #define XFS_HEALTHMON_DOMAIN_STRINGS \ - { XFS_HEALTHMON_MOUNT, "mount" } + { XFS_HEALTHMON_MOUNT, "mount" }, \ + { XFS_HEALTHMON_FS, "fs" }, \ + { XFS_HEALTHMON_AG, "ag" }, \ + { XFS_HEALTHMON_INODE, "inode" }, \ + { XFS_HEALTHMON_RTGROUP, "rtgroup" } TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_UNMOUNT); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_SICK); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_CORRUPT); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_HEALTHY); TRACE_DEFINE_ENUM(XFS_HEALTHMON_MOUNT); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_FS); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_AG); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_INODE); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_RTGROUP); DECLARE_EVENT_CLASS(xfs_healthmon_event_class, TP_PROTO(const struct xfs_mount *mp, const struct xfs_healthmon_event *event), @@ -6207,6 +6223,19 @@ DECLARE_EVENT_CLASS(xfs_healthmon_event_class, case XFS_HEALTHMON_MOUNT: __entry->mask = event->flags; break; + case XFS_HEALTHMON_FS: + __entry->mask = event->fsmask; + break; + case XFS_HEALTHMON_AG: + case XFS_HEALTHMON_RTGROUP: + __entry->mask = event->grpmask; + __entry->group = event->group; + break; + case XFS_HEALTHMON_INODE: + __entry->mask = event->imask; + __entry->ino = event->ino; + __entry->gen = event->gen; + break; } ), TP_printk("dev %d:%d type %s domain %s mask 0x%x ino 0x%llx gen 0x%x group 0x%x", @@ -6227,6 +6256,70 @@ DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_pop); DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format); DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_format_overflow); DEFINE_HEALTHMONEVENT_EVENT(xfs_healthmon_drop); + +#define XFS_HEALTHUP_TYPE_STRINGS \ + { XFS_HEALTHUP_UNMOUNT, 
"unmount" }, \ + { XFS_HEALTHUP_SICK, "sick" }, \ + { XFS_HEALTHUP_CORRUPT, "corrupt" }, \ + { XFS_HEALTHUP_HEALTHY, "healthy" } + +#define XFS_HEALTHUP_DOMAIN_STRINGS \ + { XFS_HEALTHUP_FS, "fs" }, \ + { XFS_HEALTHUP_AG, "ag" }, \ + { XFS_HEALTHUP_INODE, "inode" }, \ + { XFS_HEALTHUP_RTGROUP, "rtgroup" } + +TRACE_DEFINE_ENUM(XFS_HEALTHUP_UNMOUNT); +TRACE_DEFINE_ENUM(XFS_HEALTHUP_SICK); +TRACE_DEFINE_ENUM(XFS_HEALTHUP_CORRUPT); +TRACE_DEFINE_ENUM(XFS_HEALTHUP_HEALTHY); + +TRACE_DEFINE_ENUM(XFS_HEALTHUP_FS); +TRACE_DEFINE_ENUM(XFS_HEALTHUP_AG); +TRACE_DEFINE_ENUM(XFS_HEALTHUP_INODE); +TRACE_DEFINE_ENUM(XFS_HEALTHUP_RTGROUP); + +TRACE_EVENT(xfs_healthmon_metadata_hook, + TP_PROTO(const struct xfs_mount *mp, unsigned long type, + const struct xfs_health_update_params *update, + unsigned int events, bool lost_prev), + TP_ARGS(mp, type, update, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned long, type) + __field(unsigned int, domain) + __field(unsigned int, old_mask) + __field(unsigned int, new_mask) + __field(unsigned long long, ino) + __field(unsigned int, gen) + __field(unsigned int, group) + __field(unsigned int, events) + __field(bool, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->type = type; + __entry->domain = update->domain; + __entry->old_mask = update->old_mask; + __entry->new_mask = update->new_mask; + __entry->ino = update->ino; + __entry->gen = update->gen; + __entry->group = update->group; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d type %s domain %s oldmask 0x%x newmask 0x%x ino 0x%llx gen 0x%x group 0x%x events %u lost_prev? 
%d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_symbolic(__entry->type, XFS_HEALTHUP_TYPE_STRINGS), + __print_symbolic(__entry->domain, XFS_HEALTHUP_DOMAIN_STRINGS), + __entry->old_mask, + __entry->new_mask, + __entry->ino, + __entry->gen, + __entry->group, + __entry->events, + __entry->lost_prev) +); #endif /* CONFIG_XFS_HEALTH_MONITOR */ #endif /* _TRACE_XFS_H */ From patchwork Tue Dec 31 23:41:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13924046 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC52D1B0414 for ; Tue, 31 Dec 2024 23:41:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688503; cv=none; b=GFRS5nHFiHm5RC6Quyj834IPBT7NL9iOK9sJfvJF1lWMr+1tGGGaJ95QmUXoJVnp0+jrwt08yqaTa0soTIHYo/AYaW2rOgkBeMI1Qy9PA1esks9KToO/kimuaz50B9itpmj4OvPS+g1KcD65WsDYhx78/Bl7I1mGuphYzMEDTSU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688503; c=relaxed/simple; bh=ELxk2YxRVoSW3pMuU+f3da524OYSN/ks0fgaJ7PN3os=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=XjSoM31bbqWBBpj/xPLZFo+v+ajlFSflRZ7ouUfYkm/oTPqYzoeOQ0IFEnaVfqSONrCnrBC93wni3GK2G11dzUqhHxats8tBPpn/qk3m9DdNMAx0jn1h7OloagQDHcETykRcb1u/Bf7ZCJXVncCH2KAbHKuElsWQqXXJCCodOkA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OKzHyGLn; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OKzHyGLn" Received: by 
smtp.kernel.org (Postfix) with ESMTPSA id B170DC4CED2; Tue, 31 Dec 2024 23:41:43 +0000 (UTC) Date: Tue, 31 Dec 2024 15:41:43 -0800 Subject: [PATCH 11/16] xfs: report shutdown events through healthmon From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568754933.2704911.15047923403601596285.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org MIME-Version: 1.0 From: Darrick J. Wong Set up a shutdown hook so that we can send notifications to userspace. Signed-off-by: "Darrick J. 
Wong" --- fs/xfs/libxfs/xfs_healthmon.schema.json | 62 +++++++++++++++++++++++++ fs/xfs/xfs_healthmon.c | 77 ++++++++++++++++++++++++++++++- fs/xfs/xfs_healthmon.h | 3 + fs/xfs/xfs_trace.h | 25 ++++++++++ 4 files changed, 165 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json index 154ea0228a3615..a8bc75b0b8c4f9 100644 --- a/fs/xfs/libxfs/xfs_healthmon.schema.json +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -30,6 +30,9 @@ }, { "$ref": "#/$events/inode_metadata" + }, + { + "$ref": "#/$events/shutdown" } ], @@ -205,6 +208,31 @@ } }, + "$comment": "Shutdown event data are defined here.", + "$shutdown": { + "reason": { + "description": [ + "Reason for a filesystem to shut down.", + "Options include:", + "", + " * corrupt_incore: in-memory corruption", + " * corrupt_ondisk: on-disk corruption", + " * device_removed: device removed", + " * force_umount: userspace asked for it", + " * log_ioerr: log write IO error", + " * meta_ioerr: metadata writeback IO error" + ], + "enum": [ + "corrupt_incore", + "corrupt_ondisk", + "device_removed", + "force_umount", + "log_ioerr", + "meta_ioerr" + ] + } + }, + "$comment": "Event types are defined here.", "$events": { "lost": { @@ -386,6 +414,40 @@ "generation", "structures" ] + }, + "shutdown": { + "title": "Abnormal Shutdown Event", + "description": [ + "The filesystem went offline due to", + "unrecoverable errors." 
+ ], + "type": "object", + + "properties": { + "type": { + "const": "shutdown" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "mount" + }, + "reasons": { + "type": "array", + "items": { + "$ref": "#/$shutdown/reason" + }, + "minItems": 1 + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "reasons" + ] } } } diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 9d34a826726e3e..c7df6dad5612f8 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -20,6 +20,7 @@ #include "xfs_rtgroup.h" #include "xfs_health.h" #include "xfs_healthmon.h" +#include "xfs_fsops.h" #include #include @@ -67,6 +68,7 @@ struct xfs_healthmon { struct xfs_healthmon_event *last_event; /* live update hooks */ + struct xfs_shutdown_hook shook; struct xfs_health_hook hhook; /* filesystem mount, or NULL if we've unmounted */ @@ -384,6 +386,43 @@ xfs_healthmon_metadata_hook( goto out_unlock; } +/* Add a shutdown event to the reporting queue. */ +STATIC int +xfs_healthmon_shutdown_hook( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_healthmon *hm; + struct xfs_healthmon_event *event; + int error; + + hm = container_of(nb, struct xfs_healthmon, shook.shutdown_hook.nb); + + mutex_lock(&hm->lock); + + trace_xfs_healthmon_shutdown_hook(hm->mp, action, hm->events, + hm->lost_prev_event); + + error = xfs_healthmon_start_live_update(hm); + if (error) + goto out_unlock; + + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_SHUTDOWN, + XFS_HEALTHMON_MOUNT); + if (!event) + goto out_unlock; + + event->flags = action; + error = xfs_healthmon_push(hm, event); + if (error) + kfree(event); + +out_unlock: + mutex_unlock(&hm->lock); + return NOTIFY_DONE; +} + /* Render the health update type as a string. 
*/ STATIC const char * xfs_healthmon_typestring( @@ -391,6 +430,7 @@ xfs_healthmon_typestring( { static const char *type_strings[] = { [XFS_HEALTHMON_LOST] = "lost", + [XFS_HEALTHMON_SHUTDOWN] = "shutdown", [XFS_HEALTHMON_UNMOUNT] = "unmount", [XFS_HEALTHMON_SICK] = "sick", [XFS_HEALTHMON_CORRUPT] = "corrupt", @@ -606,6 +646,25 @@ xfs_healthmon_format_inode( event->gen); } +/* Render shutdown mask as a string set */ +static int +xfs_healthmon_format_shutdown( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + static const struct flag_string mask_strings[] = { + { SHUTDOWN_META_IO_ERROR, "meta_ioerr" }, + { SHUTDOWN_LOG_IO_ERROR, "log_ioerr" }, + { SHUTDOWN_FORCE_UMOUNT, "force_umount" }, + { SHUTDOWN_CORRUPT_INCORE, "corrupt_incore" }, + { SHUTDOWN_CORRUPT_ONDISK, "corrupt_ondisk" }, + { SHUTDOWN_DEVICE_REMOVED, "device_removed" }, + }; + + return xfs_healthmon_format_mask(outbuf, "reasons", mask_strings, + event->flags); +} + static inline void xfs_healthmon_reset_outbuf( struct xfs_healthmon *hm) @@ -645,6 +704,9 @@ xfs_healthmon_format( goto overrun; switch (event->type) { + case XFS_HEALTHMON_SHUTDOWN: + ret = xfs_healthmon_format_shutdown(outbuf, event); + break; case XFS_HEALTHMON_LOST: /* empty */ break; @@ -928,6 +990,7 @@ xfs_healthmon_detach_hooks( * through the health monitoring subsystem from xfs_fs_put_super, so * it is now time to detach the hooks. */ + xfs_shutdown_hook_del(hm->mp, &hm->shook); xfs_health_hook_del(hm->mp, &hm->hhook); return; @@ -948,6 +1011,7 @@ xfs_healthmon_release( wake_up_all(&hm->wait); iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm); + xfs_shutdown_hook_disable(); xfs_health_hook_disable(); mutex_destroy(&hm->lock); @@ -1027,6 +1091,7 @@ xfs_ioc_health_monitor( /* Enable hooks to receive events, generally. */ xfs_health_hook_enable(); + xfs_shutdown_hook_enable(); /* Attach specific event hooks to this monitor. 
*/ xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook); @@ -1034,11 +1099,16 @@ xfs_ioc_health_monitor( if (ret) goto out_hooks; + xfs_shutdown_hook_setup(&hm->shook, xfs_healthmon_shutdown_hook); + ret = xfs_shutdown_hook_add(mp, &hm->shook); + if (ret) + goto out_healthhook; + /* Set up VFS file and file descriptor. */ name = kasprintf(GFP_KERNEL, "XFS (%s): healthmon", mp->m_super->s_id); if (!name) { ret = -ENOMEM; - goto out_healthhook; + goto out_shutdownhook; } fd = anon_inode_getfd(name, &xfs_healthmon_fops, hm, @@ -1046,17 +1116,20 @@ xfs_ioc_health_monitor( kvfree(name); if (fd < 0) { ret = fd; - goto out_healthhook; + goto out_shutdownhook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; +out_shutdownhook: + xfs_shutdown_hook_del(mp, &hm->shook); out_healthhook: xfs_health_hook_del(mp, &hm->hhook); out_hooks: xfs_health_hook_disable(); + xfs_shutdown_hook_disable(); mutex_destroy(&hm->lock); xfs_healthmon_free_events(hm); kfree(hm); diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h index 3ece61165837b2..a7b2eaf3dd64e1 100644 --- a/fs/xfs/xfs_healthmon.h +++ b/fs/xfs/xfs_healthmon.h @@ -9,6 +9,9 @@ enum xfs_healthmon_type { XFS_HEALTHMON_LOST, /* message lost */ + /* filesystem shutdown */ + XFS_HEALTHMON_SHUTDOWN, + /* metadata health events */ XFS_HEALTHMON_SICK, /* runtime corruption observed */ XFS_HEALTHMON_CORRUPT, /* fsck reported corruption */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 4a68d2ec8d0a34..404b857db39d0d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -6173,8 +6173,32 @@ DEFINE_HEALTHMON_EVENT(xfs_healthmon_read_finish); DEFINE_HEALTHMON_EVENT(xfs_healthmon_release); DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount); +TRACE_EVENT(xfs_healthmon_shutdown_hook, + TP_PROTO(const struct xfs_mount *mp, uint32_t shutdown_flags, + unsigned int events, bool lost_prev), + TP_ARGS(mp, shutdown_flags, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(uint32_t, 
shutdown_flags) + __field(unsigned int, events) + __field(bool, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->shutdown_flags = shutdown_flags; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d shutdown_flags %s events %u lost_prev? %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __print_flags(__entry->shutdown_flags, "|", XFS_SHUTDOWN_STRINGS), + __entry->events, + __entry->lost_prev) +); + #define XFS_HEALTHMON_TYPE_STRINGS \ { XFS_HEALTHMON_LOST, "lost" }, \ + { XFS_HEALTHMON_SHUTDOWN, "shutdown" }, \ { XFS_HEALTHMON_UNMOUNT, "unmount" }, \ { XFS_HEALTHMON_SICK, "sick" }, \ { XFS_HEALTHMON_CORRUPT, "corrupt" }, \ @@ -6188,6 +6212,7 @@ DEFINE_HEALTHMON_EVENT(xfs_healthmon_unmount); { XFS_HEALTHMON_RTGROUP, "rtgroup" } TRACE_DEFINE_ENUM(XFS_HEALTHMON_LOST); +TRACE_DEFINE_ENUM(XFS_HEALTHMON_SHUTDOWN); TRACE_DEFINE_ENUM(XFS_HEALTHMON_UNMOUNT); TRACE_DEFINE_ENUM(XFS_HEALTHMON_SICK); TRACE_DEFINE_ENUM(XFS_HEALTHMON_CORRUPT); From patchwork Tue Dec 31 23:41:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13924047 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7CBAC1B0418 for ; Tue, 31 Dec 2024 23:41:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688519; cv=none; b=hFECdn/V1rZ45Ri5kYFuw+B9a4hWRdYDfuBfwL/8NfV0t+pzpZRp7NW2Gcz0X3P/yVelQ8uJTeL3791D7CGjOlJ0l+utzUOmlyjSdzBWB12DeCvdz4/8T4CDbZ7NjJbOeXmA9L2eMmPvrX0TunENRhThdbPaWKLe1nEN/jkb1b8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688519; c=relaxed/simple; bh=EUsI5fZIh3c6v5EqOCiKBObcmuzZco/lZo2b+LSNX4A=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=kN7GXGOHRshGGuWGHfLY//Mim7esd1wzXzVn34+akX1ojRhK6vZmsSwH0/0gbe+hol2j9E6sbUiIb2cGnRwt4UaQxNetaieKSRrosRjPPSZTWCb2i5ixqFZ64W/qhzRrRuk8Nxb2FIhWL5AUZOUGxCNBco5/LukXPGVaP6GJV0Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=TUR3ONvg; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="TUR3ONvg" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5C91CC4CED2; Tue, 31 Dec 2024 23:41:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1735688519; bh=EUsI5fZIh3c6v5EqOCiKBObcmuzZco/lZo2b+LSNX4A=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=TUR3ONvgT/Xi9QMOjzqRsGH4h8pXSZIl/sSq6Wqp5fhpSBQs++r3KzxS//K1SFC3b OkeZZ0da8mItEKhcZ7KFYHi/NfqpihdXELOlnYmOTlwdoqhICOB2L+15qmf8LymCJU /z1i7kWjn52u7HBmcCKg4PaE7VHaXVkdLbR9qvbqWC1TlmMQEMLSA+nceYn4jo2aVH 
dkjH7urkLGrM+KK6FXj+g+ROuH4Xfs5CAYOJ2ioCiIOcuY5TsETVSdHAt8MzrNxeWJ kSCHHPaWfUxvHHNxLNPxX+6hk20d6fpd/56HAPck43K31MRMW6HO2og3xlPAVW4PDN EwxFW4yPnGIGw== Date: Tue, 31 Dec 2024 15:41:58 -0800 Subject: [PATCH 12/16] xfs: report media errors through healthmon From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568754951.2704911.7356371794064990039.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Now that we have hooks to report media errors, connect this to the health monitor as well. Signed-off-by: "Darrick J. Wong" --- fs/xfs/libxfs/xfs_healthmon.schema.json | 65 +++++++++++++++++++++ fs/xfs/xfs_healthmon.c | 96 ++++++++++++++++++++++++++++++- fs/xfs/xfs_healthmon.h | 13 ++++ fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 51 ++++++++++++++++ 5 files changed, 224 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json index a8bc75b0b8c4f9..006f4145faa9f5 100644 --- a/fs/xfs/libxfs/xfs_healthmon.schema.json +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -33,6 +33,9 @@ }, { "$ref": "#/$events/shutdown" + }, + { + "$ref": "#/$events/media_error" } ], @@ -63,6 +66,31 @@ "i_generation": { "description": "Inode generation number", "type": "integer" + }, + "storage_devs": { + "description": "Storage devices in a filesystem", + "_comment": [ + "One of:", + "", + " * datadev: filesystem device", + " * logdev: external log device", + " * rtdev: realtime volume" + ], + "enum": [ + "datadev", + "logdev", + "rtdev" + ] + }, + "xfs_daddr_t": { + "description": "Storage device address, in units of 512-byte blocks", + "type": "integer", + "minimum": 0 + }, + "bbcount": { + "description": "Storage 
space length, in units of 512-byte blocks", + "type": "integer", + "minimum": 1 } }, @@ -448,6 +476,43 @@ "domain", "reasons" ] + }, + "media_error": { + "title": "Media Error", + "description": [ + "A storage device reported a media error.", + "The domain element tells us which storage", + "device reported the media failure. The", + "daddr and bbcount elements tell us where", + "inside that device the failure was observed." + ], + "type": "object", + + "properties": { + "type": { + "const": "media" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "$ref": "#/$defs/storage_devs" + }, + "daddr": { + "$ref": "#/$defs/xfs_daddr_t" + }, + "bbcount": { + "$ref": "#/$defs/bbcount" + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "daddr", + "bbcount" + ] } } } diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index c7df6dad5612f8..c828ea7442e932 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -21,6 +21,7 @@ #include "xfs_health.h" #include "xfs_healthmon.h" #include "xfs_fsops.h" +#include "xfs_notify_failure.h" #include #include @@ -70,6 +71,7 @@ struct xfs_healthmon { /* live update hooks */ struct xfs_shutdown_hook shook; struct xfs_health_hook hhook; + struct xfs_media_error_hook mhook; /* filesystem mount, or NULL if we've unmounted */ struct xfs_mount *mp; @@ -423,6 +425,59 @@ xfs_healthmon_shutdown_hook( return NOTIFY_DONE; } +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) +/* Add a media error event to the reporting queue. 
*/ +STATIC int +xfs_healthmon_media_error_hook( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_healthmon *hm; + struct xfs_healthmon_event *event; + struct xfs_media_error_params *p = data; + enum xfs_healthmon_domain domain = 0; /* shut up gcc */ + int error; + + hm = container_of(nb, struct xfs_healthmon, mhook.error_hook.nb); + + mutex_lock(&hm->lock); + + trace_xfs_healthmon_media_error_hook(p, hm->events, + hm->lost_prev_event); + + error = xfs_healthmon_start_live_update(hm); + if (error) + goto out_unlock; + + switch (p->fdev) { + case XFS_FAILED_LOGDEV: + domain = XFS_HEALTHMON_LOGDEV; + break; + case XFS_FAILED_RTDEV: + domain = XFS_HEALTHMON_RTDEV; + break; + case XFS_FAILED_DATADEV: + domain = XFS_HEALTHMON_DATADEV; + break; + } + + event = xfs_healthmon_alloc(hm, XFS_HEALTHMON_MEDIA_ERROR, domain); + if (!event) + goto out_unlock; + + event->daddr = p->daddr; + event->bbcount = p->bbcount; + error = xfs_healthmon_push(hm, event); + if (error) + kfree(event); + +out_unlock: + mutex_unlock(&hm->lock); + return NOTIFY_DONE; +} +#endif + /* Render the health update type as a string. 
*/ STATIC const char * xfs_healthmon_typestring( @@ -435,6 +490,7 @@ xfs_healthmon_typestring( [XFS_HEALTHMON_SICK] = "sick", [XFS_HEALTHMON_CORRUPT] = "corrupt", [XFS_HEALTHMON_HEALTHY] = "healthy", + [XFS_HEALTHMON_MEDIA_ERROR] = "media", }; if (event->type >= ARRAY_SIZE(type_strings)) @@ -454,6 +510,9 @@ xfs_healthmon_domstring( [XFS_HEALTHMON_AG] = "perag", [XFS_HEALTHMON_INODE] = "inode", [XFS_HEALTHMON_RTGROUP] = "rtgroup", + [XFS_HEALTHMON_DATADEV] = "datadev", + [XFS_HEALTHMON_LOGDEV] = "logdev", + [XFS_HEALTHMON_RTDEV] = "rtdev", }; if (event->domain >= ARRAY_SIZE(dom_strings)) @@ -665,6 +724,23 @@ xfs_healthmon_format_shutdown( event->flags); } +/* Render media error as a string set */ +static int +xfs_healthmon_format_media_error( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + ssize_t ret; + + ret = seq_buf_printf(outbuf, " \"daddr\": %llu,\n", + event->daddr); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, " \"bbcount\": %llu,\n", + event->bbcount); +} + static inline void xfs_healthmon_reset_outbuf( struct xfs_healthmon *hm) @@ -730,6 +806,11 @@ xfs_healthmon_format( case XFS_HEALTHMON_INODE: ret = xfs_healthmon_format_inode(outbuf, event); break; + case XFS_HEALTHMON_DATADEV: + case XFS_HEALTHMON_LOGDEV: + case XFS_HEALTHMON_RTDEV: + ret = xfs_healthmon_format_media_error(outbuf, event); + break; } if (ret < 0) goto overrun; @@ -990,6 +1071,7 @@ xfs_healthmon_detach_hooks( * through the health monitoring subsystem from xfs_fs_put_super, so * it is now time to detach the hooks. 
*/ + xfs_media_error_hook_del(hm->mp, &hm->mhook); xfs_shutdown_hook_del(hm->mp, &hm->shook); xfs_health_hook_del(hm->mp, &hm->hhook); return; @@ -1011,6 +1093,7 @@ xfs_healthmon_release( wake_up_all(&hm->wait); iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm); + xfs_media_error_hook_disable(); xfs_shutdown_hook_disable(); xfs_health_hook_disable(); @@ -1092,6 +1175,7 @@ xfs_ioc_health_monitor( /* Enable hooks to receive events, generally. */ xfs_health_hook_enable(); xfs_shutdown_hook_enable(); + xfs_media_error_hook_enable(); /* Attach specific event hooks to this monitor. */ xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook); @@ -1104,11 +1188,16 @@ xfs_ioc_health_monitor( if (ret) goto out_healthhook; + xfs_media_error_hook_setup(&hm->mhook, xfs_healthmon_media_error_hook); + ret = xfs_media_error_hook_add(mp, &hm->mhook); + if (ret) + goto out_shutdownhook; + /* Set up VFS file and file descriptor. */ name = kasprintf(GFP_KERNEL, "XFS (%s): healthmon", mp->m_super->s_id); if (!name) { ret = -ENOMEM; - goto out_shutdownhook; + goto out_mediahook; } fd = anon_inode_getfd(name, &xfs_healthmon_fops, hm, @@ -1116,18 +1205,21 @@ xfs_ioc_health_monitor( kvfree(name); if (fd < 0) { ret = fd; - goto out_shutdownhook; + goto out_mediahook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; +out_mediahook: + xfs_media_error_hook_del(mp, &hm->mhook); out_shutdownhook: xfs_shutdown_hook_del(mp, &hm->shook); out_healthhook: xfs_health_hook_del(mp, &hm->hhook); out_hooks: + xfs_media_error_hook_disable(); xfs_health_hook_disable(); xfs_shutdown_hook_disable(); mutex_destroy(&hm->lock); diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h index a7b2eaf3dd64e1..23ce320f4b086b 100644 --- a/fs/xfs/xfs_healthmon.h +++ b/fs/xfs/xfs_healthmon.h @@ -17,6 +17,9 @@ enum xfs_healthmon_type { XFS_HEALTHMON_CORRUPT, /* fsck reported corruption */ XFS_HEALTHMON_HEALTHY, /* fsck reported healthy structure */ XFS_HEALTHMON_UNMOUNT, /* 
filesystem is unmounting */ + + /* media errors */ + XFS_HEALTHMON_MEDIA_ERROR, }; enum xfs_healthmon_domain { @@ -27,6 +30,11 @@ enum xfs_healthmon_domain { XFS_HEALTHMON_AG, /* allocation group metadata */ XFS_HEALTHMON_INODE, /* inode metadata */ XFS_HEALTHMON_RTGROUP, /* realtime group metadata */ + + /* media errors */ + XFS_HEALTHMON_DATADEV, + XFS_HEALTHMON_RTDEV, + XFS_HEALTHMON_LOGDEV, }; struct xfs_healthmon_event { @@ -60,6 +68,11 @@ struct xfs_healthmon_event { uint32_t gen; xfs_ino_t ino; }; + /* media errors */ + struct { + xfs_daddr_t daddr; + uint64_t bbcount; + }; }; }; diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 41a2ac85dc5fdf..23741ff36a2e14 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -54,6 +54,7 @@ #include "xfs_fsrefs.h" #include "xfs_health.h" #include "xfs_healthmon.h" +#include "xfs_notify_failure.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 404b857db39d0d..47293206400d6e 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -108,6 +108,7 @@ struct xfs_fsrefs_irec; struct xfs_rtgroup; struct xfs_healthmon_event; struct xfs_health_update_params; +struct xfs_media_error_params; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -6345,6 +6346,56 @@ TRACE_EVENT(xfs_healthmon_metadata_hook, __entry->events, __entry->lost_prev) ); + +#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) +TRACE_EVENT(xfs_healthmon_media_error_hook, + TP_PROTO(const struct xfs_media_error_params *p, + unsigned int events, bool lost_prev), + TP_ARGS(p, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, error_dev) + __field(uint64_t, daddr) + __field(uint64_t, bbcount) + __field(int, pre_remove) + __field(unsigned int, events) + __field(bool, lost_prev) + ), + TP_fast_assign( + struct xfs_mount *mp = p->mp; + struct xfs_buftarg *btp = NULL; + + switch 
(p->fdev) { + case XFS_FAILED_DATADEV: + btp = mp->m_ddev_targp; + break; + case XFS_FAILED_LOGDEV: + btp = mp->m_logdev_targp; + break; + case XFS_FAILED_RTDEV: + btp = mp->m_rtdev_targp; + break; + } + + __entry->dev = mp->m_super->s_dev; + if (btp) + __entry->error_dev = btp->bt_dev; + __entry->daddr = p->daddr; + __entry->bbcount = p->bbcount; + __entry->pre_remove = p->pre_remove; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d error_dev %d:%d daddr 0x%llx bbcount 0x%llx pre_remove? %d events %u lost_prev? %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->error_dev), MINOR(__entry->error_dev), + __entry->daddr, + __entry->bbcount, + __entry->pre_remove, + __entry->events, + __entry->lost_prev) +); +#endif #endif /* CONFIG_XFS_HEALTH_MONITOR */ #endif /* _TRACE_XFS_H */ From patchwork Tue Dec 31 23:42:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13924048 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2734A13FD72 for ; Tue, 31 Dec 2024 23:42:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688535; cv=none; b=LzlyMpbcbta08v47aGVpnVOKLnKF+7CW5Ic3BBxHDz6Q+Zy/fKAkfwbW2rgFaz8MQQiTE0p2EmY82wP75b1Bh0NaeJHuDzB+ZnjcJanlxtBqiIZIj7Pk8sZalKUEahdglaUC3XCIVjLOQTPS1y10rK51JtPrBzcHeZKdb/4+msg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688535; c=relaxed/simple; bh=nM15EzjsAKd5kXUDRM6rnZA9kF/ukzVhVsX08LGnn8s=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=PkwEVogie97/wnrBas2hIJUYIunRK7WP/I3tdOJFMjalJGqWWbojDj40Ed7jkOeK8o3xtFju4PFkQddj8viZyJFtnlrhQh2czCmQCcDY3dg07Gy91Srxpj7YS7ph/4rhEUjFdQT8YaEaq0n91xavkMiY73yJAHgdPfJD7Re131E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=PFZgAfYG; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="PFZgAfYG" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F1490C4CED2; Tue, 31 Dec 2024 23:42:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1735688535; bh=nM15EzjsAKd5kXUDRM6rnZA9kF/ukzVhVsX08LGnn8s=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=PFZgAfYGd/ottiePp4a5lFvNVXttRv9/0n6UyBvbAFaAiAR7zdD1tpGxfOyBM/92H PB003FdlL6Um7D1rng8ySY7u8/vrAImNykj9bnhXWVRHab5soKRjb/5zrEt8S7Vunm oJYGvUVeJdmyDCB4kJejRpvtpuhfCrSn+b+Y4xTdFlNDF/LG5m536/a9z9JqORGMRe 
wmXJNeya5Wnjys+6kwe4JIMNkVhc4H61JnkrLV3udlCWI4t25BGhsMhR9FoIDsCSIz d6PRZT6Uv83lzcK6pkNC2FKCNLfLrXTAt79pkGXLOsfmzGd9/5/CpazVbOXPRN/d9F xypUX4ryHCvYQ== Date: Tue, 31 Dec 2024 15:42:14 -0800 Subject: [PATCH 13/16] xfs: report file io errors through healthmon From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568754968.2704911.4424040488364281164.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Set up a file io error event hook so that we can send events about read errors, writeback errors, and directio errors to userspace. Signed-off-by: "Darrick J. Wong" --- fs/xfs/libxfs/xfs_healthmon.schema.json | 77 ++++++++++++++++++++ fs/xfs/xfs_healthmon.c | 120 ++++++++++++++++++++++++++++++- fs/xfs/xfs_healthmon.h | 16 ++++ fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 50 +++++++++++++ 5 files changed, 262 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_healthmon.schema.json b/fs/xfs/libxfs/xfs_healthmon.schema.json index 006f4145faa9f5..9c1070a629997c 100644 --- a/fs/xfs/libxfs/xfs_healthmon.schema.json +++ b/fs/xfs/libxfs/xfs_healthmon.schema.json @@ -36,6 +36,9 @@ }, { "$ref": "#/$events/media_error" + }, + { + "$ref": "#/$events/file_ioerror" } ], @@ -67,6 +70,16 @@ "description": "Inode generation number", "type": "integer" }, + "off_t": { + "description": "File position, in bytes", + "type": "integer", + "minimum": 0 + }, + "size_t": { + "description": "File operation length, in bytes", + "type": "integer", + "minimum": 1 + }, "storage_devs": { "description": "Storage devices in a filesystem", "_comment": [ @@ -261,6 +274,26 @@ } }, + "$comment": "File IO event data are defined here.", + "$fileio": { + "types": { + "description": [ + 
"File I/O operations. One of:", + "", + " * readahead: reads into the page cache.", + " * writeback: writeback of dirty page cache.", + " * dioread: O_DIRECT reads.", + " * diowrite: O_DIRECT writes." + ], + "enum": [ + "readahead", + "writeback", + "dioread", + "diowrite" + ] + } + }, + "$comment": "Event types are defined here.", "$events": { "lost": { @@ -513,6 +546,50 @@ "daddr", "bbcount" ] + }, + "file_ioerror": { + "title": "File I/O error", + "description": [ + "A read or a write to a file failed. The", + "inode, generation, pos, and len fields", + "describe the range of the file that is", + "affected." + ], + "type": "object", + + "properties": { + "type": { + "$ref": "#/$fileio/types" + }, + "time_ns": { + "$ref": "#/$defs/time_ns" + }, + "domain": { + "const": "filerange" + }, + "inumber": { + "$ref": "#/$defs/xfs_ino_t" + }, + "generation": { + "$ref": "#/$defs/i_generation" + }, + "pos": { + "$ref": "#/$defs/off_t" + }, + "len": { + "$ref": "#/$defs/size_t" + } + }, + + "required": [ + "type", + "time_ns", + "domain", + "inumber", + "generation", + "pos", + "len" + ] } } } diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index c828ea7442e932..9320f12b60ade9 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -22,6 +22,7 @@ #include "xfs_healthmon.h" #include "xfs_fsops.h" #include "xfs_notify_failure.h" +#include "xfs_file.h" #include #include @@ -72,6 +73,7 @@ struct xfs_healthmon { struct xfs_shutdown_hook shook; struct xfs_health_hook hhook; struct xfs_media_error_hook mhook; + struct xfs_file_ioerror_hook fhook; /* filesystem mount, or NULL if we've unmounted */ struct xfs_mount *mp; @@ -478,6 +480,73 @@ xfs_healthmon_media_error_hook( } #endif +/* Add a file io error event to the reporting queue. 
*/ +STATIC int +xfs_healthmon_file_ioerror_hook( + struct notifier_block *nb, + unsigned long action, + void *data) +{ + struct xfs_healthmon *hm; + struct xfs_healthmon_event *event; + struct xfs_file_ioerror_params *p = data; + enum xfs_healthmon_type type = 0; + int error; + + hm = container_of(nb, struct xfs_healthmon, fhook.ioerror_hook.nb); + + switch (action) { + case XFS_FILE_IOERROR_BUFFERED_READ: + case XFS_FILE_IOERROR_BUFFERED_WRITE: + case XFS_FILE_IOERROR_DIRECT_READ: + case XFS_FILE_IOERROR_DIRECT_WRITE: + break; + default: + ASSERT(0); + return NOTIFY_DONE; + } + + mutex_lock(&hm->lock); + + trace_xfs_healthmon_file_ioerror_hook(hm->mp, action, p, hm->events, + hm->lost_prev_event); + + error = xfs_healthmon_start_live_update(hm); + if (error) + goto out_unlock; + + switch (action) { + case XFS_FILE_IOERROR_BUFFERED_READ: + type = XFS_HEALTHMON_BUFREAD; + break; + case XFS_FILE_IOERROR_BUFFERED_WRITE: + type = XFS_HEALTHMON_BUFWRITE; + break; + case XFS_FILE_IOERROR_DIRECT_READ: + type = XFS_HEALTHMON_DIOREAD; + break; + case XFS_FILE_IOERROR_DIRECT_WRITE: + type = XFS_HEALTHMON_DIOWRITE; + break; + } + + event = xfs_healthmon_alloc(hm, type, XFS_HEALTHMON_FILERANGE); + if (!event) + goto out_unlock; + + event->fino = p->ino; + event->fgen = p->gen; + event->fpos = p->pos; + event->flen = p->len; + error = xfs_healthmon_push(hm, event); + if (error) + kfree(event); + +out_unlock: + mutex_unlock(&hm->lock); + return NOTIFY_DONE; +} + /* Render the health update type as a string. 
*/ STATIC const char * xfs_healthmon_typestring( @@ -491,6 +560,10 @@ xfs_healthmon_typestring( [XFS_HEALTHMON_CORRUPT] = "corrupt", [XFS_HEALTHMON_HEALTHY] = "healthy", [XFS_HEALTHMON_MEDIA_ERROR] = "media", + [XFS_HEALTHMON_BUFREAD] = "readahead", + [XFS_HEALTHMON_BUFWRITE] = "writeback", + [XFS_HEALTHMON_DIOREAD] = "dioread", + [XFS_HEALTHMON_DIOWRITE] = "diowrite", }; if (event->type >= ARRAY_SIZE(type_strings)) @@ -513,6 +586,7 @@ xfs_healthmon_domstring( [XFS_HEALTHMON_DATADEV] = "datadev", [XFS_HEALTHMON_LOGDEV] = "logdev", [XFS_HEALTHMON_RTDEV] = "rtdev", + [XFS_HEALTHMON_FILERANGE] = "filerange", }; if (event->domain >= ARRAY_SIZE(dom_strings)) @@ -741,6 +815,33 @@ xfs_healthmon_format_media_error( event->bbcount); } +/* Render file range events as a string set */ +static int +xfs_healthmon_format_filerange( + struct seq_buf *outbuf, + const struct xfs_healthmon_event *event) +{ + ssize_t ret; + + ret = seq_buf_printf(outbuf, " \"inumber\": %llu,\n", + event->fino); + if (ret < 0) + return ret; + + ret = seq_buf_printf(outbuf, " \"generation\": %u,\n", + event->fgen); + if (ret < 0) + return ret; + + ret = seq_buf_printf(outbuf, " \"pos\": %llu,\n", + event->fpos); + if (ret < 0) + return ret; + + return seq_buf_printf(outbuf, " \"len\": %llu,\n", + event->flen); +} + static inline void xfs_healthmon_reset_outbuf( struct xfs_healthmon *hm) @@ -811,6 +912,9 @@ xfs_healthmon_format( case XFS_HEALTHMON_RTDEV: ret = xfs_healthmon_format_media_error(outbuf, event); break; + case XFS_HEALTHMON_FILERANGE: + ret = xfs_healthmon_format_filerange(outbuf, event); + break; } if (ret < 0) goto overrun; @@ -1071,6 +1175,7 @@ xfs_healthmon_detach_hooks( * through the health monitoring subsystem from xfs_fs_put_super, so * it is now time to detach the hooks.
*/ + xfs_file_ioerror_hook_del(hm->mp, &hm->fhook); xfs_media_error_hook_del(hm->mp, &hm->mhook); xfs_shutdown_hook_del(hm->mp, &hm->shook); xfs_health_hook_del(hm->mp, &hm->hhook); @@ -1093,6 +1198,7 @@ xfs_healthmon_release( wake_up_all(&hm->wait); iterate_supers_type(hm->fstyp, xfs_healthmon_detach_hooks, hm); + xfs_file_ioerror_hook_disable(); xfs_media_error_hook_disable(); xfs_shutdown_hook_disable(); xfs_health_hook_disable(); @@ -1176,6 +1282,7 @@ xfs_ioc_health_monitor( xfs_health_hook_enable(); xfs_shutdown_hook_enable(); xfs_media_error_hook_enable(); + xfs_file_ioerror_hook_enable(); /* Attach specific event hooks to this monitor. */ xfs_health_hook_setup(&hm->hhook, xfs_healthmon_metadata_hook); @@ -1193,11 +1300,17 @@ xfs_ioc_health_monitor( if (ret) goto out_shutdownhook; + xfs_file_ioerror_hook_setup(&hm->fhook, + xfs_healthmon_file_ioerror_hook); + ret = xfs_file_ioerror_hook_add(mp, &hm->fhook); + if (ret) + goto out_mediahook; + /* Set up VFS file and file descriptor. 
*/ name = kasprintf(GFP_KERNEL, "XFS (%s): healthmon", mp->m_super->s_id); if (!name) { ret = -ENOMEM; - goto out_mediahook; + goto out_ioerrhook; } fd = anon_inode_getfd(name, &xfs_healthmon_fops, hm, @@ -1205,13 +1318,15 @@ xfs_ioc_health_monitor( kvfree(name); if (fd < 0) { ret = fd; - goto out_mediahook; + goto out_ioerrhook; } trace_xfs_healthmon_create(mp, hmo.flags, hmo.format); return fd; +out_ioerrhook: + xfs_file_ioerror_hook_del(mp, &hm->fhook); out_mediahook: xfs_media_error_hook_del(mp, &hm->mhook); out_shutdownhook: @@ -1219,6 +1334,7 @@ xfs_ioc_health_monitor( out_healthhook: xfs_health_hook_del(mp, &hm->hhook); out_hooks: + xfs_file_ioerror_hook_disable(); xfs_media_error_hook_disable(); xfs_health_hook_disable(); xfs_shutdown_hook_disable(); diff --git a/fs/xfs/xfs_healthmon.h b/fs/xfs/xfs_healthmon.h index 23ce320f4b086b..748173eed79660 100644 --- a/fs/xfs/xfs_healthmon.h +++ b/fs/xfs/xfs_healthmon.h @@ -20,6 +20,12 @@ enum xfs_healthmon_type { /* media errors */ XFS_HEALTHMON_MEDIA_ERROR, + + /* file range events */ + XFS_HEALTHMON_BUFREAD, + XFS_HEALTHMON_BUFWRITE, + XFS_HEALTHMON_DIOREAD, + XFS_HEALTHMON_DIOWRITE, }; enum xfs_healthmon_domain { @@ -35,6 +41,9 @@ enum xfs_healthmon_domain { XFS_HEALTHMON_DATADEV, XFS_HEALTHMON_RTDEV, XFS_HEALTHMON_LOGDEV, + + /* file range events */ + XFS_HEALTHMON_FILERANGE, }; struct xfs_healthmon_event { @@ -73,6 +82,13 @@ struct xfs_healthmon_event { xfs_daddr_t daddr; uint64_t bbcount; }; + /* file range events */ + struct { + xfs_ino_t fino; + loff_t fpos; + uint64_t flen; + uint32_t fgen; + }; }; }; diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 23741ff36a2e14..d8e5d607b0dc6a 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -55,6 +55,7 @@ #include "xfs_health.h" #include "xfs_healthmon.h" #include "xfs_notify_failure.h" +#include "xfs_file.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 
47293206400d6e..aba32f5ccc1a3b 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -109,6 +109,7 @@ struct xfs_rtgroup; struct xfs_healthmon_event; struct xfs_health_update_params; struct xfs_media_error_params; +struct xfs_file_ioerror_params; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -6396,6 +6397,55 @@ TRACE_EVENT(xfs_healthmon_media_error_hook, __entry->lost_prev) ); #endif + +#define XFS_FILE_IOERROR_STRINGS \ + { XFS_FILE_IOERROR_BUFFERED_READ, "readahead" }, \ + { XFS_FILE_IOERROR_BUFFERED_WRITE, "writeback" }, \ + { XFS_FILE_IOERROR_DIRECT_READ, "dioread" }, \ + { XFS_FILE_IOERROR_DIRECT_WRITE, "diowrite" } + +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_BUFFERED_READ); +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_BUFFERED_WRITE); +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_DIRECT_READ); +TRACE_DEFINE_ENUM(XFS_FILE_IOERROR_DIRECT_WRITE); + +TRACE_EVENT(xfs_healthmon_file_ioerror_hook, + TP_PROTO(const struct xfs_mount *mp, + unsigned long action, + const struct xfs_file_ioerror_params *p, + unsigned int events, bool lost_prev), + TP_ARGS(mp, action, p, events, lost_prev), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, error_dev) + __field(unsigned long, action) + __field(unsigned long long, ino) + __field(unsigned int, gen) + __field(long long, pos) + __field(unsigned long long, len) + __field(unsigned int, events) + __field(bool, lost_prev) + ), + TP_fast_assign( + __entry->dev = mp ? mp->m_super->s_dev : 0; + __entry->action = action; + __entry->ino = p->ino; + __entry->gen = p->gen; + __entry->pos = p->pos; + __entry->len = p->len; + __entry->events = events; + __entry->lost_prev = lost_prev; + ), + TP_printk("dev %d:%d ino 0x%llx gen 0x%x op %s pos 0x%llx bytecount 0x%llx events %u lost_prev? 
%d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __entry->gen, + __print_symbolic(__entry->action, XFS_FILE_IOERROR_STRINGS), + __entry->pos, + __entry->len, + __entry->events, + __entry->lost_prev) +); #endif /* CONFIG_XFS_HEALTH_MONITOR */ #endif /* _TRACE_XFS_H */ From patchwork Tue Dec 31 23:42:30 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13924049 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFAD71B0414 for ; Tue, 31 Dec 2024 23:42:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688550; cv=none; b=qd74nbdkERafKmMOQeTrr9Lzs+fhFJiPscuhnkjcOfut605uBnukE7zB706TO6X0+I/BoTOS7PtZcfgL6fqfWj6aqkqROw9dMkDdt6aLibwwp5/emicPlC3wpAMWw2S+lstGzNyouITwN8dnxTv4PHy99sLeELd4A4AFLW+jdNU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688550; c=relaxed/simple; bh=ZbpTg5bFXC06JXqSvyMfOw4yOkpn0Ypr2FREpMFen9U=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=hFL8sL/FuQHYIo4XvQ/HVQ69Vmv+PnM3fWvbmVXrxKkIlsrXy8D8vFNTvPGJzMJJAguI6vh97qweObpFZkDxVAHtXZdEW+8xxtwWQmNOxM7xZQZimTMDRFHGb7GvWk2MJ7we79QdpL6EFeaRdzkRqse8wImWl2+vJ73QDpMW918= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=MH5yxZTt; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MH5yxZTt" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 966E5C4CED2; Tue, 31 Dec 2024 23:42:30 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1735688550; bh=ZbpTg5bFXC06JXqSvyMfOw4yOkpn0Ypr2FREpMFen9U=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=MH5yxZTtqS6Xzt1N3VqFOGnjrSRia6bcTO3fKx0HZ7pKhR5Xj5tGq/MDHLL0KpoIf yOmUvqZkzkUrrUGPwVtCUqUyCSM3Vawz8RSXK9WnnzUPS9LJa2PsPQ9KgCOvsaZIwS qZriq0g4hwIUmMW0/W5Y+EIYeoXQWars7tmW0byhL+xxbRRquT0DpGsNBvBGHB5TfZ KU5N5TDiT8BTEi4pBL/2KKeCFOtLT7UBUsNvl799H8wa7lpr980RdncOyiN79BExxr lZDN6J0W2N0q957XgEAKueVuYIvj1v7fL0hZwQhjHEV0hYQDPRkTGm8tusEj97bScl dPm99u2w3Y7oQ== Date: Tue, 31 Dec 2024 15:42:30 -0800 Subject: [PATCH 14/16] xfs: allow reconfiguration of the health monitoring device From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568754986.2704911.11270955358261059464.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Make it so that we can reconfigure the health monitoring device by calling the XFS_IOC_HEALTH_MONITOR ioctl on it. Signed-off-by: "Darrick J. Wong" --- fs/xfs/xfs_healthmon.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 9320f12b60ade9..67f7d4a8cc7f58 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -23,6 +23,8 @@ #include "xfs_fsops.h" #include "xfs_notify_failure.h" #include "xfs_file.h" +#include "xfs_fs.h" +#include "xfs_ioctl.h" #include #include @@ -1228,11 +1230,38 @@ xfs_healthmon_validate( return true; } +/* Handle ioctls for the health monitoring thread. 
*/ +STATIC long +xfs_healthmon_ioctl( + struct file *file, + unsigned int cmd, + unsigned long p) +{ + struct xfs_health_monitor hmo; + struct xfs_healthmon *hm = file->private_data; + void __user *arg = (void __user *)p; + + if (cmd != XFS_IOC_HEALTH_MONITOR) + return -ENOTTY; + + if (copy_from_user(&hmo, arg, sizeof(hmo))) + return -EFAULT; + + if (!xfs_healthmon_validate(&hmo)) + return -EINVAL; + + mutex_lock(&hm->lock); + hm->verbose = !!(hmo.flags & XFS_HEALTH_MONITOR_VERBOSE); + mutex_unlock(&hm->lock); + return 0; +} + static const struct file_operations xfs_healthmon_fops = { .owner = THIS_MODULE, .read_iter = xfs_healthmon_read_iter, .poll = xfs_healthmon_poll, .release = xfs_healthmon_release, + .unlocked_ioctl = xfs_healthmon_ioctl, }; /* From patchwork Tue Dec 31 23:42:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13924050 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C4C2613FD72 for ; Tue, 31 Dec 2024 23:42:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688566; cv=none; b=M5PxPAycBBcycWWA36h7AWHRXmKPKvO1UPzhcdTkglpQ25RnkwefM9kYjlkEFI0m7c7LfX2ZEA/1Sv+srso7TS+GBwWJqMZQBpKkGwhsFujhDpbJQ4rxFhD2iTi3361GXTDgj6BB3Cp3fBj7PUO66VQdqpTNtSAXPcIvuXNkwwQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688566; c=relaxed/simple; bh=OjrjrXMYMsSZjyJYAkD53PG0S+L3rwTy4p1KqEFeO5k=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; 
b=IW5NVBsizk2HHXjwBCAZLYlr6KWfAZwjD1klniq9Wayu4PVxdmP7WNn+wFqlSPwll6ol3swwrZEjBy+W2oc+ZeFdb0GwhPgXZNOg43yrslgJjvlMbp3BJbss0mI3yK9gXsKaCgXk9Bt+mKCrpDSgzH2HcAx5ZOfyahzoJI2onIc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BVHeXgkK; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BVHeXgkK" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 40500C4CED2; Tue, 31 Dec 2024 23:42:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1735688566; bh=OjrjrXMYMsSZjyJYAkD53PG0S+L3rwTy4p1KqEFeO5k=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=BVHeXgkKVaBe8EFkCug4SMtEItESpimCu1Q3uhEPohZvQdgM3vP+TglAnkfQ9/4vK uOKa39a0EMrDjRlN96+f1z1pANrrxylO/rhe8rCRf+fQ0Tyu7warX43WmqLK5jxKi9 fd0xy3bfolhlsiW5lRaypkkiVFuWtX8QUN4vyzqoxG1ZH1mMr2Y6gAZqkoBmmlojwr lwvjtm2KIlG18vzGRL1R580xGbF4UOMLqwNmAEW/HPkO8BZc4+X0uTU5UG5SQcnZyO YM+nw0k5zvYil+kpdH3VtK/zmxe2p8BjrhfmxghkqufkqT787lgMEaXCWo0Tgicx00 iNtktIKFHgCYg== Date: Tue, 31 Dec 2024 15:42:45 -0800 Subject: [PATCH 15/16] xfs: add media error reporting ioctl From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568755003.2704911.1058228100772058099.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Add a new privileged ioctl so that xfs_scrub can report media errors to the kernel for further processing. Signed-off-by: "Darrick J. 
Wong" --- fs/xfs/Makefile | 6 +---- fs/xfs/libxfs/xfs_fs.h | 15 ++++++++++++ fs/xfs/xfs_healthmon.c | 2 -- fs/xfs/xfs_ioctl.c | 3 ++ fs/xfs/xfs_notify_failure.c | 53 ++++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_notify_failure.h | 8 ++++++ fs/xfs/xfs_trace.h | 2 -- 7 files changed, 78 insertions(+), 11 deletions(-) diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 94a9dc7aa7a1d5..71e6512899da3a 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -99,6 +99,7 @@ xfs-y += xfs_aops.o \ xfs_message.o \ xfs_mount.o \ xfs_mru_cache.o \ + xfs_notify_failure.o \ xfs_pwork.o \ xfs_reflink.o \ xfs_stats.o \ @@ -149,11 +150,6 @@ xfs-$(CONFIG_SYSCTL) += xfs_sysctl.o xfs-$(CONFIG_COMPAT) += xfs_ioctl32.o xfs-$(CONFIG_EXPORTFS_BLOCK_OPS) += xfs_pnfs.o -# notify failure -ifeq ($(CONFIG_MEMORY_FAILURE),y) -xfs-$(CONFIG_FS_DAX) += xfs_notify_failure.o -endif - xfs-$(CONFIG_XFS_DRAIN_INTENTS) += xfs_drain.o xfs-$(CONFIG_XFS_LIVE_HOOKS) += xfs_hooks.o xfs-$(CONFIG_XFS_MEMORY_BUFS) += xfs_buf_mem.o diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index d7404e6efd866d..32e552d40b1bf5 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -1115,6 +1115,20 @@ struct xfs_health_monitor { /* Return events in JSON format */ #define XFS_HEALTH_MONITOR_FMT_JSON (1) +struct xfs_media_error { + __u64 flags; /* flags */ + __u64 daddr; /* disk address of range */ + __u64 bbcount; /* length, in 512b blocks */ + __u64 pad; /* zero */ +}; + +#define XFS_MEDIA_ERROR_DATADEV (1) /* data device */ +#define XFS_MEDIA_ERROR_LOGDEV (2) /* external log device */ +#define XFS_MEDIA_ERROR_RTDEV (3) /* realtime device */ + +/* bottom byte of flags is the device code */ +#define XFS_MEDIA_ERROR_DEVMASK (0xFF) + /* * ioctl commands that are used by Linux filesystems */ @@ -1157,6 +1171,7 @@ struct xfs_health_monitor { #define XFS_IOC_GETFSREFCOUNTS _IOWR('X', 66, struct xfs_getfsrefs_head) #define XFS_IOC_MAP_FREESP _IOW ('X', 67, struct xfs_map_freesp) #define 
XFS_IOC_HEALTH_MONITOR _IOW ('X', 68, struct xfs_health_monitor) +#define XFS_IOC_MEDIA_ERROR _IOW ('X', 69, struct xfs_media_error) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/fs/xfs/xfs_healthmon.c b/fs/xfs/xfs_healthmon.c index 67f7d4a8cc7f58..b6fdad798fae89 100644 --- a/fs/xfs/xfs_healthmon.c +++ b/fs/xfs/xfs_healthmon.c @@ -429,7 +429,6 @@ xfs_healthmon_shutdown_hook( return NOTIFY_DONE; } -#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) /* Add a media error event to the reporting queue. */ STATIC int xfs_healthmon_media_error_hook( @@ -480,7 +479,6 @@ xfs_healthmon_media_error_hook( mutex_unlock(&hm->lock); return NOTIFY_DONE; } -#endif /* Add a file io error event to the reporting queue. */ STATIC int diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 6c7a30128c7bf6..c253538c48f3b3 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -43,6 +43,7 @@ #include "xfs_handle.h" #include "xfs_rtgroup.h" #include "xfs_healthmon.h" +#include "xfs_notify_failure.h" #include #include @@ -1437,6 +1438,8 @@ xfs_file_ioctl( case XFS_IOC_HEALTH_MONITOR: return xfs_ioc_health_monitor(mp, arg); + case XFS_IOC_MEDIA_ERROR: + return xfs_ioc_media_error(mp, arg); default: return -ENOTTY; diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c index ea68c7e61bb585..fcf9f0139d673c 100644 --- a/fs/xfs/xfs_notify_failure.c +++ b/fs/xfs/xfs_notify_failure.c @@ -91,9 +91,19 @@ xfs_media_error_hook_setup( xfs_hook_setup(&hook->error_hook, mod_fn); } #else -# define xfs_media_error_hook(...) 
((void)0) +static inline void +xfs_media_error_hook( + struct xfs_mount *mp, + enum xfs_failed_device fdev, + xfs_daddr_t daddr, + uint64_t bbcount, + bool pre_remove) +{ + /* empty */ +} #endif /* CONFIG_XFS_LIVE_HOOKS */ +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) struct xfs_failure_info { xfs_agblock_t startblock; xfs_extlen_t blockcount; @@ -463,3 +473,44 @@ xfs_dax_notify_failure( const struct dax_holder_operations xfs_dax_holder_operations = { .notify_failure = xfs_dax_notify_failure, }; +#endif /* CONFIG_MEMORY_FAILURE && CONFIG_FS_DAX */ + +#define XFS_VALID_MEDIA_ERROR_FLAGS (XFS_MEDIA_ERROR_DATADEV | \ + XFS_MEDIA_ERROR_LOGDEV | \ + XFS_MEDIA_ERROR_RTDEV) +int +xfs_ioc_media_error( + struct xfs_mount *mp, + struct xfs_media_error __user *arg) +{ + struct xfs_media_error me; + enum xfs_failed_device fdev; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (copy_from_user(&me, arg, sizeof(me))) + return -EFAULT; + + if (me.pad) + return -EINVAL; + if (me.flags & ~XFS_VALID_MEDIA_ERROR_FLAGS) + return -EINVAL; + + switch (me.flags & XFS_MEDIA_ERROR_DEVMASK) { + case XFS_MEDIA_ERROR_DATADEV: + fdev = XFS_FAILED_DATADEV; + break; + case XFS_MEDIA_ERROR_LOGDEV: + fdev = XFS_FAILED_LOGDEV; + break; + case XFS_MEDIA_ERROR_RTDEV: + fdev = XFS_FAILED_RTDEV; + break; + default: + return -EINVAL; + } + + xfs_media_error_hook(mp, fdev, me.daddr, me.bbcount, false); + return 0; +} diff --git a/fs/xfs/xfs_notify_failure.h b/fs/xfs/xfs_notify_failure.h index 835d4af504d832..c23034891d99fd 100644 --- a/fs/xfs/xfs_notify_failure.h +++ b/fs/xfs/xfs_notify_failure.h @@ -6,7 +6,9 @@ #ifndef __XFS_NOTIFY_FAILURE_H__ #define __XFS_NOTIFY_FAILURE_H__ +#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) extern const struct dax_holder_operations xfs_dax_holder_operations; +#endif enum xfs_failed_device { XFS_FAILED_DATADEV, @@ -14,7 +16,7 @@ enum xfs_failed_device { XFS_FAILED_RTDEV, }; -#if defined(CONFIG_XFS_LIVE_HOOKS) && 
defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) +#if defined(CONFIG_XFS_LIVE_HOOKS) struct xfs_media_error_params { struct xfs_mount *mp; enum xfs_failed_device fdev; @@ -46,4 +48,8 @@ struct xfs_media_error_hook { }; # define xfs_media_error_hook_setup(...) ((void)0) #endif /* CONFIG_XFS_LIVE_HOOKS */ +struct xfs_media_error; +int xfs_ioc_media_error(struct xfs_mount *mp, + struct xfs_media_error __user *arg); + #endif /* __XFS_NOTIFY_FAILURE_H__ */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index aba32f5ccc1a3b..3baa39a2b0a8b8 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -6348,7 +6348,6 @@ TRACE_EVENT(xfs_healthmon_metadata_hook, __entry->lost_prev) ); -#if defined(CONFIG_XFS_LIVE_HOOKS) && defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_FS_DAX) TRACE_EVENT(xfs_healthmon_media_error_hook, TP_PROTO(const struct xfs_media_error_params *p, unsigned int events, bool lost_prev), @@ -6396,7 +6395,6 @@ TRACE_EVENT(xfs_healthmon_media_error_hook, __entry->events, __entry->lost_prev) ); -#endif #define XFS_FILE_IOERROR_STRINGS \ { XFS_FILE_IOERROR_BUFFERED_READ, "readahead" }, \ From patchwork Tue Dec 31 23:43:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. 
Wong" X-Patchwork-Id: 13924051 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0FA0213FD72 for ; Tue, 31 Dec 2024 23:43:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688582; cv=none; b=TAdTMhJ6fFoOO2LNmKNe5OT+7zr/zZRYyTlW7XPl6FHV0yF36fZN0tLAOvTWsUMrTa6mdgGIieXSSzfKmTZTd9CxzqoE4Qn4uWgQm71jOCauxSof5nskiLDYvtfrIz0JRzOeZ4fWFxVI7lNMKsUqjN/WnN7R5ohKAqmDd1FqI40= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735688582; c=relaxed/simple; bh=DyQCFuX9wn5lqFA6ydSy3TPmRCcPxsAQWzVXYhdSUe0=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=dkz0HLfL0RkCqTMJveZK/R3QOu9fQRz0E2va7ORkke6ogzyY/J+ZrBO/eo4gTaxc0Qq+6PwXhAE2VPWdeLBZJCj4+DxCPu/t9CXXESgQ8iundQ+en3+T7sDEXwtOZMJYODzMaUqYv6pFd+nr4vUeCi0GWMJLmxW1OCMsW8ljJC0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=lCwLwFJ1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="lCwLwFJ1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CDA63C4CED2; Tue, 31 Dec 2024 23:43:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1735688581; bh=DyQCFuX9wn5lqFA6ydSy3TPmRCcPxsAQWzVXYhdSUe0=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=lCwLwFJ12Uj7h8tIHHqbNhMCosCJE8p/Agh1W47240bu3QfKm1D73WLsM9AnMFh/Q P2jwowPIUhuZuvftfdqLJ3VhsRCIfwczYCVTJZef+EK1LvvnLpwBXHklZq39bxCYhq P3owTGig3RGr+zCCnb6pAHQJp5C53HS4h2ncLD6Q5DWNC/SkNlB2t+Qo1SDFWpSGaZ 
HEg6WdOO66pVkvRlRtdlsh/hIspCVpsd694Pi7K1wTVXoQA9LncjY65Yj6BjzCqO6y +0GUEhsDj9Kx0JtXcMAikLXNmyVuUkfOiEleMXUDyrK2i034bp8L0gXxIxamjpShSm TmpEAlKLFVH/w== Date: Tue, 31 Dec 2024 15:43:01 -0800 Subject: [PATCH 16/16] xfs: send uevents when mounting and unmounting a filesystem From: "Darrick J. Wong" To: djwong@kernel.org, cem@kernel.org Cc: linux-xfs@vger.kernel.org Message-ID: <173568755020.2704911.17739206325953827170.stgit@frogsfrogsfrogs> In-Reply-To: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> References: <173568754700.2704911.10879727466774074251.stgit@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Send uevents when we mount and unmount the filesystem, so that we can trigger systemd services. Signed-off-by: "Darrick J. Wong" --- fs/xfs/xfs_super.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index df6afcf8840948..1d295991e08047 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1197,12 +1197,28 @@ xfs_inodegc_free_percpu( free_percpu(mp->m_inodegc); } +static void +xfs_send_unmount_uevent( + struct xfs_mount *mp) +{ + char sid[256] = ""; + char *env[] = { + "TYPE=mount", + sid, + NULL, + }; + + snprintf(sid, sizeof(sid), "SID=%s", mp->m_super->s_id); + kobject_uevent_env(&mp->m_kobj.kobject, KOBJ_REMOVE, env); +} + static void xfs_fs_put_super( struct super_block *sb) { struct xfs_mount *mp = XFS_M(sb); + xfs_send_unmount_uevent(mp); xfs_notice(mp, "Unmounting Filesystem %pU", &mp->m_sb.sb_uuid); xfs_filestream_unmount(mp); xfs_unmountfs(mp); @@ -1590,6 +1606,29 @@ xfs_debugfs_mkdir( return child; } +/* + * Send a uevent signalling that the mount succeeded so we can use udev rules + * to start background services. 
+ */ +static void +xfs_send_mount_uevent( + struct fs_context *fc, + struct xfs_mount *mp) +{ + char source[256] = ""; + char sid[256] = ""; + char *env[] = { + "TYPE=mount", + source, + sid, + NULL, + }; + + snprintf(source, sizeof(source), "SOURCE=%s", fc->source); + snprintf(sid, sizeof(sid), "SID=%s", mp->m_super->s_id); + kobject_uevent_env(&mp->m_kobj.kobject, KOBJ_ADD, env); +} + static int xfs_fs_fill_super( struct super_block *sb, @@ -1904,6 +1943,7 @@ xfs_fs_fill_super( mp->m_debugfs_uuid = NULL; } + xfs_send_mount_uevent(fc, mp); return 0; out_filestream_unmount: