From patchwork Mon Jun 3 00:32:58 2024
From: Kent Overstreet
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Kent Overstreet, brauner@kernel.org, viro@zeniv.linux.org.uk, Bernd Schubert, linux-mm@kvack.org, Josef Bacik
Subject: [PATCH 1/5] darray: lift from bcachefs
Date: Sun, 2 Jun 2024 20:32:58 -0400
Message-ID: <20240603003306.2030491-2-kent.overstreet@linux.dev>
In-Reply-To: <20240603003306.2030491-1-kent.overstreet@linux.dev>
References: <20240603003306.2030491-1-kent.overstreet@linux.dev>

Dynamic arrays, inspired by CCAN's darray: essentially C++ STL vectors.

Used by thread_with_stdio, which is also being lifted from bcachefs for
xfs.
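For readers new to darrays, the growth policy can be illustrated with a small userspace sketch. This is an illustration under stated assumptions, not the kernel code: it copies the nr/size/data layout of DARRAY() and the round-up-to-a-power-of-two growth of __darray_resize_slowpath(), but allocates with realloc() instead of taking a gfp_t, handles only int elements, and takes indices rather than element pointers. DARRAY_SKETCH, darray_int, roundup_pow2() and the da_* helpers are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace sketch: same nr/size/data layout as DARRAY(),
 * but backed by realloc() and specialised to int elements.
 */
#define DARRAY_SKETCH(_type) struct { size_t nr, size; _type *data; }

typedef DARRAY_SKETCH(int) darray_int;

/* Smallest power of two >= n (n > 0), like roundup_pow_of_two(). */
static size_t roundup_pow2(size_t n)
{
	size_t r = 1;

	while (r < n)
		r <<= 1;
	return r;
}

/* Ensure capacity for new_size elements; returns 0, or -1 on ENOMEM. */
static int da_resize(darray_int *d, size_t new_size)
{
	if (new_size <= d->size)
		return 0;		/* fast path: already big enough */

	new_size = roundup_pow2(new_size);
	int *data = realloc(d->data, new_size * sizeof(*data));
	if (!data)
		return -1;		/* old buffer is left intact */

	d->data = data;
	d->size = new_size;
	return 0;
}

/* Append one element at the end. */
static int da_push(darray_int *d, int item)
{
	if (da_resize(d, d->nr + 1))
		return -1;
	d->data[d->nr++] = item;
	return 0;
}

/* Insert at index pos (pos <= nr), shifting later elements up by one. */
static int da_insert_item(darray_int *d, size_t pos, int item)
{
	if (da_resize(d, d->nr + 1))
		return -1;
	memmove(&d->data[pos + 1], &d->data[pos],
		sizeof(d->data[0]) * (d->nr - pos));
	d->nr++;
	d->data[pos] = item;
	return 0;
}

/* Remove the element at index pos (pos < nr), shifting later ones down. */
static void da_remove_item(darray_int *d, size_t pos)
{
	d->nr--;
	memmove(&d->data[pos], &d->data[pos + 1],
		sizeof(d->data[0]) * (d->nr - pos));
}

/* Walk backwards with an index, so an empty array runs zero iterations. */
static long da_sum_reverse(const darray_int *d)
{
	long sum = 0;

	for (size_t i = d->nr; i-- > 0; )
		sum += d->data[i];
	return sum;
}

/* Free the backing storage and reset to the empty state. */
static void da_exit(darray_int *d)
{
	free(d->data);
	d->data = NULL;
	d->nr = d->size = 0;
}
```

Pushing five elements grows the capacity 1, 2, 4, 8, so most appends hit the fast path in da_resize(). The index-based reverse walk never computes data + nr - 1 on an empty array, which with a NULL data pointer is the case the darray_for_each_reverse() fix later in this series guards against.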
Signed-off-by: Kent Overstreet
---
 MAINTAINERS                             |  7 +++
 fs/bcachefs/Makefile                    |  1 -
 fs/bcachefs/btree_types.h               |  2 +-
 fs/bcachefs/btree_update.c              |  2 +
 fs/bcachefs/btree_write_buffer_types.h  |  2 +-
 fs/bcachefs/fsck.c                      |  2 +-
 fs/bcachefs/journal_io.h                |  2 +-
 fs/bcachefs/journal_sb.c                |  2 +-
 fs/bcachefs/sb-downgrade.c              |  3 +-
 fs/bcachefs/sb-errors_types.h           |  2 +-
 fs/bcachefs/sb-members.h                |  3 +-
 fs/bcachefs/subvolume.h                 |  1 -
 fs/bcachefs/subvolume_types.h           |  2 +-
 fs/bcachefs/thread_with_file_types.h    |  2 +-
 fs/bcachefs/util.h                      | 28 +-----------
 {fs/bcachefs => include/linux}/darray.h | 59 ++++++++++++++++---------
 include/linux/darray_types.h            | 22 +++++++++
 lib/Makefile                            |  2 +-
 {fs/bcachefs => lib}/darray.c           | 12 ++++-
 19 files changed, 95 insertions(+), 61 deletions(-)
 rename {fs/bcachefs => include/linux}/darray.h (66%)
 create mode 100644 include/linux/darray_types.h
 rename {fs/bcachefs => lib}/darray.c (56%)

diff --git a/MAINTAINERS b/MAINTAINERS
index d6c90161c7bf..fafa30715f66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6010,6 +6010,13 @@ F:	net/ax25/ax25_out.c
 F:	net/ax25/ax25_timer.c
 F:	net/ax25/sysctl_net_ax25.c
 
+DARRAY
+M:	Kent Overstreet
+L:	linux-bcachefs@vger.kernel.org
+S:	Maintained
+F:	include/linux/darray.h
+F:	include/linux/darray_types.h
+
 DATA ACCESS MONITOR
 M:	SeongJae Park
 L:	damon@lists.linux.dev
diff --git a/fs/bcachefs/Makefile b/fs/bcachefs/Makefile
index 66ca0bbee639..281e4a7c1f31 100644
--- a/fs/bcachefs/Makefile
+++ b/fs/bcachefs/Makefile
@@ -28,7 +28,6 @@ bcachefs-y		:=	\
 	checksum.o		\
 	clock.o			\
 	compress.o		\
-	darray.o		\
 	debug.o			\
 	dirent.o		\
 	disk_groups.o		\
diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h
index d63db4fefe73..7dcd015619af 100644
--- a/fs/bcachefs/btree_types.h
+++ b/fs/bcachefs/btree_types.h
@@ -2,13 +2,13 @@
 #ifndef _BCACHEFS_BTREE_TYPES_H
 #define _BCACHEFS_BTREE_TYPES_H
 
+#include
 #include
 #include
 
 #include "bbpos_types.h"
 #include "btree_key_cache_types.h"
 #include "buckets_types.h"
-#include "darray.h"
 #include "errcode.h"
 #include "journal_types.h"
 #include "replicas_types.h"
diff --git a/fs/bcachefs/btree_update.c b/fs/bcachefs/btree_update.c
index f3c645a43dcb..23d52129db40 100644
--- a/fs/bcachefs/btree_update.c
+++ b/fs/bcachefs/btree_update.c
@@ -14,6 +14,8 @@
 #include "snapshot.h"
 #include "trace.h"
 
+#include
+
 static inline int btree_insert_entry_cmp(const struct btree_insert_entry *l,
 					 const struct btree_insert_entry *r)
 {
diff --git a/fs/bcachefs/btree_write_buffer_types.h b/fs/bcachefs/btree_write_buffer_types.h
index 9b9433de9c36..5f248873087c 100644
--- a/fs/bcachefs/btree_write_buffer_types.h
+++ b/fs/bcachefs/btree_write_buffer_types.h
@@ -2,7 +2,7 @@
 #ifndef _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H
 #define _BCACHEFS_BTREE_WRITE_BUFFER_TYPES_H
 
-#include "darray.h"
+#include
 #include "journal_types.h"
 
 #define BTREE_WRITE_BUFERED_VAL_U64s_MAX	4
diff --git a/fs/bcachefs/fsck.c b/fs/bcachefs/fsck.c
index c8f57465131c..3ead927285b6 100644
--- a/fs/bcachefs/fsck.c
+++ b/fs/bcachefs/fsck.c
@@ -5,7 +5,6 @@
 #include "btree_cache.h"
 #include "btree_update.h"
 #include "buckets.h"
-#include "darray.h"
 #include "dirent.h"
 #include "error.h"
 #include "fs-common.h"
@@ -18,6 +17,7 @@
 #include "xattr.h"
 
 #include
+#include
 #include /* struct qstr */
 
 /*
diff --git a/fs/bcachefs/journal_io.h b/fs/bcachefs/journal_io.h
index 2ca9cde30ea8..2b8f458cf13c 100644
--- a/fs/bcachefs/journal_io.h
+++ b/fs/bcachefs/journal_io.h
@@ -2,7 +2,7 @@
 #ifndef _BCACHEFS_JOURNAL_IO_H
 #define _BCACHEFS_JOURNAL_IO_H
 
-#include "darray.h"
+#include
 
 void bch2_journal_pos_from_member_info_set(struct bch_fs *);
 void bch2_journal_pos_from_member_info_resume(struct bch_fs *);
diff --git a/fs/bcachefs/journal_sb.c b/fs/bcachefs/journal_sb.c
index db80e506e3ab..9db57f6f1035 100644
--- a/fs/bcachefs/journal_sb.c
+++ b/fs/bcachefs/journal_sb.c
@@ -2,8 +2,8 @@
 
 #include "bcachefs.h"
 #include "journal_sb.h"
-#include "darray.h"
 
+#include
 #include
 
 /* BCH_SB_FIELD_journal: */
diff --git a/fs/bcachefs/sb-downgrade.c b/fs/bcachefs/sb-downgrade.c
index 390a1bbd2567..526e2c26d1b4 100644
--- a/fs/bcachefs/sb-downgrade.c
+++ b/fs/bcachefs/sb-downgrade.c
@@ -6,12 +6,13 @@
  */
 
 #include "bcachefs.h"
-#include "darray.h"
 #include "recovery_passes.h"
 #include "sb-downgrade.h"
 #include "sb-errors.h"
 #include "super-io.h"
 
+#include
+
 #define RECOVERY_PASS_ALL_FSCK		BIT_ULL(63)
 
 /*
diff --git a/fs/bcachefs/sb-errors_types.h b/fs/bcachefs/sb-errors_types.h
index 666599d3fb9d..39cae3a6a024 100644
--- a/fs/bcachefs/sb-errors_types.h
+++ b/fs/bcachefs/sb-errors_types.h
@@ -2,7 +2,7 @@
 #ifndef _BCACHEFS_SB_ERRORS_TYPES_H
 #define _BCACHEFS_SB_ERRORS_TYPES_H
 
-#include "darray.h"
+#include
 
 #define BCH_SB_ERRS()							\
 	x(clean_but_journal_not_empty,				0)	\
diff --git a/fs/bcachefs/sb-members.h b/fs/bcachefs/sb-members.h
index dd93192ec065..338275899b60 100644
--- a/fs/bcachefs/sb-members.h
+++ b/fs/bcachefs/sb-members.h
@@ -2,9 +2,10 @@
 #ifndef _BCACHEFS_SB_MEMBERS_H
 #define _BCACHEFS_SB_MEMBERS_H
 
-#include "darray.h"
 #include "bkey_types.h"
 
+#include
+
 extern char * const bch2_member_error_strs[];
 
 static inline struct bch_member *
diff --git a/fs/bcachefs/subvolume.h b/fs/bcachefs/subvolume.h
index afa5e871efb2..0311b8669c76 100644
--- a/fs/bcachefs/subvolume.h
+++ b/fs/bcachefs/subvolume.h
@@ -2,7 +2,6 @@
 #ifndef _BCACHEFS_SUBVOLUME_H
 #define _BCACHEFS_SUBVOLUME_H
 
-#include "darray.h"
 #include "subvolume_types.h"
 
 enum bch_validate_flags;
diff --git a/fs/bcachefs/subvolume_types.h b/fs/bcachefs/subvolume_types.h
index 9b10c8947828..3a1ee762ad61 100644
--- a/fs/bcachefs/subvolume_types.h
+++ b/fs/bcachefs/subvolume_types.h
@@ -2,7 +2,7 @@
 #ifndef _BCACHEFS_SUBVOLUME_TYPES_H
 #define _BCACHEFS_SUBVOLUME_TYPES_H
 
-#include "darray.h"
+#include
 
 typedef DARRAY(u32) snapshot_id_list;
 
diff --git a/fs/bcachefs/thread_with_file_types.h b/fs/bcachefs/thread_with_file_types.h
index e0daf4eec341..41990756aa26 100644
--- a/fs/bcachefs/thread_with_file_types.h
+++ b/fs/bcachefs/thread_with_file_types.h
@@ -2,7 +2,7 @@
 #ifndef _BCACHEFS_THREAD_WITH_FILE_TYPES_H
 #define _BCACHEFS_THREAD_WITH_FILE_TYPES_H
 
-#include "darray.h"
+#include
 
 struct stdio_buf {
 	spinlock_t		lock;
diff --git a/fs/bcachefs/util.h b/fs/bcachefs/util.h
index 5d2c470a49ac..1da52a8b3914 100644
--- a/fs/bcachefs/util.h
+++ b/fs/bcachefs/util.h
@@ -5,22 +5,22 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
-#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 
 #include "mean_and_variance.h"
 
-#include "darray.h"
 #include "time_stats.h"
 
 struct closure;
@@ -626,30 +626,6 @@ static inline void memset_u64s_tail(void *s, int c, unsigned bytes)
 	memset(s + bytes, c, rem);
 }
 
-/* just the memmove, doesn't update @_nr */
-#define __array_insert_item(_array, _nr, _pos)				\
-	memmove(&(_array)[(_pos) + 1],					\
-		&(_array)[(_pos)],					\
-		sizeof((_array)[0]) * ((_nr) - (_pos)))
-
-#define array_insert_item(_array, _nr, _pos, _new_item)		\
-do {									\
-	__array_insert_item(_array, _nr, _pos);				\
-	(_nr)++;							\
-	(_array)[(_pos)] = (_new_item);					\
-} while (0)
-
-#define array_remove_items(_array, _nr, _pos, _nr_to_remove)		\
-do {									\
-	(_nr) -= (_nr_to_remove);					\
-	memmove(&(_array)[(_pos)],					\
-		&(_array)[(_pos) + (_nr_to_remove)],			\
-		sizeof((_array)[0]) * ((_nr) - (_pos)));		\
-} while (0)
-
-#define array_remove_item(_array, _nr, _pos)				\
-	array_remove_items(_array, _nr, _pos, 1)
-
 static inline void __move_gap(void *array, size_t element_size,
 			      size_t nr, size_t size,
 			      size_t old_gap, size_t new_gap)
diff --git a/fs/bcachefs/darray.h b/include/linux/darray.h
similarity index 66%
rename from fs/bcachefs/darray.h
rename to include/linux/darray.h
index 4b340d13caac..ff167eb795f2 100644
--- a/fs/bcachefs/darray.h
+++ b/include/linux/darray.h
@@ -1,34 +1,26 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _BCACHEFS_DARRAY_H
-#define _BCACHEFS_DARRAY_H
+/*
+ * (C) 2022-2024 Kent Overstreet
+ */
+#ifndef _LINUX_DARRAY_H
+#define _LINUX_DARRAY_H
 
 /*
- * Dynamic arrays:
+ * Dynamic arrays
  *
  * Inspired by CCAN's darray
  */
 
+#include
 #include
 
-#define DARRAY_PREALLOCATED(_type, _nr)					\
-struct {								\
-	size_t nr, size;						\
-	_type *data;							\
-	_type preallocated[_nr];					\
-}
-
-#define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0)
-
-typedef DARRAY(char)	darray_char;
-typedef DARRAY(char *) darray_str;
-
-int __bch2_darray_resize(darray_char *, size_t, size_t, gfp_t);
+int __darray_resize_slowpath(darray_char *, size_t, size_t, gfp_t);
 
 static inline int __darray_resize(darray_char *d, size_t element_size,
 				  size_t new_size, gfp_t gfp)
 {
 	return unlikely(new_size > d->size)
-		? __bch2_darray_resize(d, element_size, new_size, gfp)
+		? __darray_resize_slowpath(d, element_size, new_size, gfp)
 		: 0;
 }
 
@@ -69,6 +61,28 @@ static inline int __darray_make_room(darray_char *d, size_t t_size, size_t more,
 #define darray_first(_d)	((_d).data[0])
 #define darray_last(_d)		((_d).data[(_d).nr - 1])
 
+/* Insert/remove items into the middle of a darray: */
+
+#define array_insert_item(_array, _nr, _pos, _new_item)		\
+do {									\
+	memmove(&(_array)[(_pos) + 1],					\
+		&(_array)[(_pos)],					\
+		sizeof((_array)[0]) * ((_nr) - (_pos)));		\
+	(_nr)++;							\
+	(_array)[(_pos)] = (_new_item);					\
+} while (0)
+
+#define array_remove_items(_array, _nr, _pos, _nr_to_remove)		\
+do {									\
+	(_nr) -= (_nr_to_remove);					\
+	memmove(&(_array)[(_pos)],					\
+		&(_array)[(_pos) + (_nr_to_remove)],			\
+		sizeof((_array)[0]) * ((_nr) - (_pos)));		\
+} while (0)
+
+#define array_remove_item(_array, _nr, _pos)				\
+	array_remove_items(_array, _nr, _pos, 1)
+
 #define darray_insert_item(_d, pos, _item)				\
 ({									\
 	size_t _pos = (pos);						\
@@ -79,10 +93,15 @@ static inline int __darray_make_room(darray_char *d, size_t t_size, size_t more,
 	_ret;								\
 })
 
+#define darray_remove_items(_d, _pos, _nr_to_remove)			\
+	array_remove_items((_d)->data, (_d)->nr, (_pos) - (_d)->data, _nr_to_remove)
+
 #define darray_remove_item(_d, _pos)					\
-	array_remove_item((_d)->data, (_d)->nr, (_pos) - (_d)->data)
+	darray_remove_items(_d, _pos, 1)
+
+/* Iteration: */
 
-#define __darray_for_each(_d, _i)						\
+#define __darray_for_each(_d, _i)					\
 	for ((_i) = (_d).data; _i < (_d).data + (_d).nr; _i++)
 
 #define darray_for_each(_d, _i)						\
@@ -106,4 +125,4 @@ do {									\
 	darray_init(_d);						\
 } while (0)
 
-#endif /* _BCACHEFS_DARRAY_H */
+#endif /* _LINUX_DARRAY_H */
diff --git a/include/linux/darray_types.h b/include/linux/darray_types.h
new file mode 100644
index 000000000000..a400a0c3600d
--- /dev/null
+++ b/include/linux/darray_types.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * (C) 2022-2024 Kent Overstreet
+ */
+#ifndef _LINUX_DARRAY_TYPES_H
+#define _LINUX_DARRAY_TYPES_H
+
+#include
+
+#define DARRAY_PREALLOCATED(_type, _nr)					\
+struct {								\
+	size_t nr, size;						\
+	_type *data;							\
+	_type preallocated[_nr];					\
+}
+
+#define DARRAY(_type) DARRAY_PREALLOCATED(_type, 0)
+
+typedef DARRAY(char)	darray_char;
+typedef DARRAY(char *)	darray_str;
+
+#endif /* _LINUX_DARRAY_TYPES_H */
diff --git a/lib/Makefile b/lib/Makefile
index 3b1769045651..f540c84e8c08 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
 	 percpu-refcount.o rhashtable.o base64.o \
 	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
-	 generic-radix-tree.o bitmap-str.o
+	 generic-radix-tree.o bitmap-str.o darray.o
 obj-$(CONFIG_STRING_KUNIT_TEST) += string_kunit.o
 obj-y += string_helpers.o
 obj-$(CONFIG_STRING_HELPERS_KUNIT_TEST) += string_helpers_kunit.o
diff --git a/fs/bcachefs/darray.c b/lib/darray.c
similarity index 56%
rename from fs/bcachefs/darray.c
rename to lib/darray.c
index ac35b8b705ae..7cb064f14b39 100644
--- a/fs/bcachefs/darray.c
+++ b/lib/darray.c
@@ -1,10 +1,14 @@
 // SPDX-License-Identifier: GPL-2.0
+/*
+ * (C) 2022-2024 Kent Overstreet
+ */
 
+#include
 #include
+#include
 #include
-#include "darray.h"
 
-int __bch2_darray_resize(darray_char *d, size_t element_size, size_t new_size, gfp_t gfp)
+int __darray_resize_slowpath(darray_char *d, size_t element_size, size_t new_size, gfp_t gfp)
 {
 	if (new_size > d->size) {
 		new_size = roundup_pow_of_two(new_size);
@@ -22,3 +26,7 @@ int __bch2_darray_resize(darray_char *d, size_t element_size, size_t new_size, g
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__darray_resize_slowpath);
+
+MODULE_AUTHOR("Kent Overstreet");
+MODULE_LICENSE("GPL");

From patchwork Mon Jun 3 00:32:59 2024
From: Kent Overstreet
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Kent Overstreet, brauner@kernel.org, viro@zeniv.linux.org.uk, Bernd Schubert, linux-mm@kvack.org, Josef Bacik
Subject: [PATCH 2/5] darray: Fix darray_for_each_reverse() when darray is empty
Date: Sun, 2 Jun 2024 20:32:59 -0400
Message-ID: <20240603003306.2030491-3-kent.overstreet@linux.dev>
In-Reply-To: <20240603003306.2030491-1-kent.overstreet@linux.dev>
References: <20240603003306.2030491-1-kent.overstreet@linux.dev>

Signed-off-by: Kent Overstreet
---
 include/linux/darray.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/darray.h b/include/linux/darray.h
index ff167eb795f2..603d6762c29a 100644
--- a/include/linux/darray.h
+++ b/include/linux/darray.h
@@ -108,7 +108,7 @@ do {									\
 	for (typeof(&(_d).data[0]) _i = (_d).data; _i < (_d).data + (_d).nr; _i++)
 
 #define darray_for_each_reverse(_d, _i)					\
-	for (typeof(&(_d).data[0]) _i = (_d).data + (_d).nr - 1; _i >= (_d).data; --_i)
+	for (typeof(&(_d).data[0]) _i = (_d).data + (_d).nr - 1; (_d).data && _i >= (_d).data; --_i)
 
 #define darray_init(_d)							\
 do {									\

From patchwork Mon Jun 3 00:33:00 2024
From: Kent Overstreet
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Kent Overstreet, brauner@kernel.org, viro@zeniv.linux.org.uk, Bernd Schubert, linux-mm@kvack.org, Josef Bacik
Subject: [PATCH 3/5] fs: sys_ringbuffer
Date: Sun, 2 Jun 2024 20:33:00 -0400
Message-ID: <20240603003306.2030491-4-kent.overstreet@linux.dev>
In-Reply-To: <20240603003306.2030491-1-kent.overstreet@linux.dev>
References: <20240603003306.2030491-1-kent.overstreet@linux.dev>

Add new syscalls for generic ringbuffers that can be attached to arbitrary
(supporting) file descriptors.

A ringbuffer consists of:
 - a single page for head/tail pointers, size/mask, and other ancillary
   metadata, described by 'struct ringbuffer_ptrs'
 - a data buffer, consisting of one or more pages mapped at
   'ringbuffer_ptrs.data_offset' above the address of 'ringbuffer_ptrs'

The data buffer is always a power of two in size. Head and tail pointers
are u32 byte offsets, and they are stored unmasked (i.e., they use the full
32 bit range); they must be masked with 'mask' before being used to index
the data buffer.

 - ringbuffer(int fd, int rw, u32 size, ulong *addr)

   Create a ringbuffer, or get the address of an existing one, for either
   reads or writes, of at least size bytes, and attach it to the given file
   descriptor; the address of the ringbuffer is returned via addr.

   Since files can be shared between processes in different address
   spaces, a ringbuffer may be mapped into multiple address spaces via this
   syscall.

 - ringbuffer_wait(int fd, int rw)

   Wait for space to be available (on a ringbuffer for writing), or for
   data to be available (on a ringbuffer for reading).
todo: add parameters for timeout, minimum amount of data/space to wait for - ringbuffer_wakeup(int fd, int rw) Required after writing to a previously empty ringbuffer, or reading from a previously full ringbuffer to notify waiters on the other end todo - investigate integrating with futexes? todo - add extra fields to ringbuffer_ptrs for waiting on a minimum amount of data/space, i.e. to signal when a wakeup is required Kernel interfaces: - To indicate that ringbuffers are supported on a file, set FOP_RINGBUFFER_READ and/or FOP_RINGBUFFER_WRITE in your file_operations. - To read or write to a file's associated ringbuffers (file->f_ringbuffer), use ringbuffer_read() or ringbuffer_write(). Signed-off-by: Kent Overstreet --- arch/x86/entry/syscalls/syscall_32.tbl | 3 + arch/x86/entry/syscalls/syscall_64.tbl | 3 + fs/Makefile | 1 + fs/ringbuffer.c | 474 +++++++++++++++++++++++++ include/linux/fs.h | 2 + include/linux/mm_types.h | 4 + include/linux/ringbuffer_sys.h | 18 + include/uapi/linux/futex.h | 1 + include/uapi/linux/ringbuffer_sys.h | 40 +++ init/Kconfig | 9 + kernel/fork.c | 2 + 11 files changed, 557 insertions(+) create mode 100644 fs/ringbuffer.c create mode 100644 include/linux/ringbuffer_sys.h create mode 100644 include/uapi/linux/ringbuffer_sys.h diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 7fd1f57ad3d3..2385359eaf75 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -467,3 +467,6 @@ 460 i386 lsm_set_self_attr sys_lsm_set_self_attr 461 i386 lsm_list_modules sys_lsm_list_modules 462 i386 mseal sys_mseal +463 i386 ringbuffer sys_ringbuffer +464 i386 ringbuffer_wait sys_ringbuffer_wait +465 i386 ringbuffer_wakeup sys_ringbuffer_wakeup diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index a396f6e6ab5b..942602ece075 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ 
-384,6 +384,9 @@ 460 common lsm_set_self_attr sys_lsm_set_self_attr 461 common lsm_list_modules sys_lsm_list_modules 462 common mseal sys_mseal +463 common ringbuffer sys_ringbuffer +464 common ringbuffer_wait sys_ringbuffer_wait +465 common ringbuffer_wakeup sys_ringbuffer_wakeup # # Due to a historical design error, certain syscalls are numbered differently diff --git a/fs/Makefile b/fs/Makefile index 6ecc9b0a53f2..48e54ac01fb1 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -28,6 +28,7 @@ obj-$(CONFIG_TIMERFD) += timerfd.o obj-$(CONFIG_EVENTFD) += eventfd.o obj-$(CONFIG_USERFAULTFD) += userfaultfd.o obj-$(CONFIG_AIO) += aio.o +obj-$(CONFIG_RINGBUFFER) += ringbuffer.o obj-$(CONFIG_FS_DAX) += dax.o obj-$(CONFIG_FS_ENCRYPTION) += crypto/ obj-$(CONFIG_FS_VERITY) += verity/ diff --git a/fs/ringbuffer.c b/fs/ringbuffer.c new file mode 100644 index 000000000000..82e042c1c89b --- /dev/null +++ b/fs/ringbuffer.c @@ -0,0 +1,474 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "%s() " fmt "\n", __func__ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define RINGBUFFER_FS_MAGIC 0xa10a10a2 + +static DEFINE_MUTEX(ringbuffer_lock); + +static struct vfsmount *ringbuffer_mnt; + +struct ringbuffer_mapping { + ulong addr; + struct mm_struct *mm; +}; + +struct ringbuffer { + u32 size; /* always a power of two */ + u32 mask; /* size - 1 */ + unsigned order; + wait_queue_head_t wait[2]; + struct ringbuffer_ptrs *ptrs; + void *data; + /* hidden internal file for the mmap */ + struct file *rb_file; + DARRAY(struct ringbuffer_mapping) mms; +}; + +static const struct address_space_operations ringbuffer_aops = { + .dirty_folio = noop_dirty_folio, +#if 0 + .migrate_folio = ringbuffer_migrate_folio, +#endif +}; + +#if 0 +static int ringbuffer_mremap(struct vm_area_struct *vma) +{ + struct file *file = vma->vm_file; + struct mm_struct *mm = vma->vm_mm; + struct kioctx_table *table; + int i, 
res = -EINVAL; + + spin_lock(&mm->ioctx_lock); + rcu_read_lock(); + table = rcu_dereference(mm->ioctx_table); + if (!table) + goto out_unlock; + + for (i = 0; i < table->nr; i++) { + struct kioctx *ctx; + + ctx = rcu_dereference(table->table[i]); + if (ctx && ctx->ringbuffer_file == file) { + if (!atomic_read(&ctx->dead)) { + ctx->user_id = ctx->mmap_base = vma->vm_start; + res = 0; + } + break; + } + } + +out_unlock: + rcu_read_unlock(); + spin_unlock(&mm->ioctx_lock); + return res; +} +#endif + +static const struct vm_operations_struct ringbuffer_vm_ops = { +#if 0 + .mremap = ringbuffer_mremap, +#endif +#if IS_ENABLED(CONFIG_MMU) + .fault = filemap_fault, + .map_pages = filemap_map_pages, + .page_mkwrite = filemap_page_mkwrite, +#endif +}; + +static int ringbuffer_mmap(struct file *file, struct vm_area_struct *vma) +{ + vm_flags_set(vma, VM_DONTEXPAND); + vma->vm_ops = &ringbuffer_vm_ops; + return 0; +} + +static const struct file_operations ringbuffer_fops = { + .mmap = ringbuffer_mmap, +}; + +void ringbuffer_free(struct ringbuffer *rb) +{ + pr_debug("%px", rb); + + lockdep_assert_held(&ringbuffer_lock); + + darray_for_each(rb->mms, map) + darray_for_each_reverse(map->mm->ringbuffers, rb2) + if (rb == *rb2) + darray_remove_item(&map->mm->ringbuffers, rb2); + + if (rb->rb_file) { + /* Kills mapping: */ + truncate_setsize(file_inode(rb->rb_file), 0); + + struct address_space *mapping = rb->rb_file->f_mapping; + spin_lock(&mapping->i_private_lock); + mapping->i_private_data = NULL; + spin_unlock(&mapping->i_private_lock); + + fput(rb->rb_file); + } + + free_pages((ulong) rb->data, get_order(rb->size)); + free_page((ulong) rb->ptrs); + kfree(rb); +} + +static int ringbuffer_alloc_inode(struct ringbuffer *rb) +{ + struct inode *inode = alloc_anon_inode(ringbuffer_mnt->mnt_sb); + int ret = PTR_ERR_OR_ZERO(inode); + if (ret) + goto err; + + inode->i_mapping->a_ops = &ringbuffer_aops; + inode->i_mapping->i_private_data = rb; + inode->i_size = rb->size * 2; + 
mapping_set_large_folios(inode->i_mapping); + + rb->rb_file = alloc_file_pseudo(inode, ringbuffer_mnt, "[ringbuffer]", + O_RDWR, &ringbuffer_fops); + ret = PTR_ERR_OR_ZERO(rb->rb_file); + if (ret) + goto err_iput; + + struct folio *f_ptrs = page_folio(virt_to_page(rb->ptrs)); + struct folio *f_data = page_folio(virt_to_page(rb->data)); + + __folio_set_locked(f_ptrs); + __folio_mark_uptodate(f_ptrs); + + void *shadow = NULL; + ret = __filemap_add_folio(rb->rb_file->f_mapping, f_ptrs, + (1U << rb->order) - 1, GFP_KERNEL, &shadow); + if (ret) + goto err; + folio_unlock(f_ptrs); + + __folio_set_locked(f_data); + __folio_mark_uptodate(f_data); + shadow = NULL; + ret = __filemap_add_folio(rb->rb_file->f_mapping, f_data, + 1U << rb->order, GFP_KERNEL, &shadow); + if (ret) + goto err; + folio_unlock(f_data); + return 0; +err_iput: + iput(inode); + return ret; +err: + truncate_setsize(file_inode(rb->rb_file), 0); + fput(rb->rb_file); + return ret; +} + +static int ringbuffer_map(struct ringbuffer *rb, ulong *addr) +{ + struct mm_struct *mm = current->mm; + int ret = 0; + + lockdep_assert_held(&ringbuffer_lock); + + if (!rb->rb_file) { + ret = ringbuffer_alloc_inode(rb); + if (ret) + return ret; + } + + ret = darray_make_room(&rb->mms, 1) ?: + darray_make_room(&mm->ringbuffers, 1); + if (ret) + return ret; + + ret = mmap_write_lock_killable(mm); + if (ret) + return ret; + + ulong unused; + struct ringbuffer_mapping map = { + .addr = do_mmap(rb->rb_file, 0, rb->size + PAGE_SIZE, + PROT_READ|PROT_WRITE, + MAP_SHARED, 0, + (1U << rb->order) - 1, + &unused, NULL), + .mm = mm, + }; + mmap_write_unlock(mm); + + ret = PTR_ERR_OR_ZERO((void *) map.addr); + if (ret) + return ret; + + ret = darray_push(&mm->ringbuffers, rb) ?: + darray_push(&rb->mms, map); + BUG_ON(ret); /* we preallocated */ + + *addr = map.addr; + return 0; +} + +static int ringbuffer_get_addr_or_map(struct ringbuffer *rb, ulong *addr) +{ + lockdep_assert_held(&ringbuffer_lock); + + struct mm_struct *mm = 
current->mm; + + darray_for_each(rb->mms, map) + if (map->mm == mm) { + *addr = map->addr; + return 0; + } + + return ringbuffer_map(rb, addr); +} + +struct ringbuffer *ringbuffer_alloc(u32 size) +{ + unsigned order = get_order(size); + size = PAGE_SIZE << order; + + struct ringbuffer *rb = kzalloc(sizeof(*rb), GFP_KERNEL); + if (!rb) + return ERR_PTR(-ENOMEM); + + rb->size = size; + rb->mask = size - 1; + rb->order = order; + init_waitqueue_head(&rb->wait[READ]); + init_waitqueue_head(&rb->wait[WRITE]); + + rb->ptrs = (void *) __get_free_page(GFP_KERNEL|__GFP_ZERO); + rb->data = (void *) __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_COMP, order); + if (!rb->ptrs || !rb->data) { + ringbuffer_free(rb); + return ERR_PTR(-ENOMEM); + } + + /* todo - implement a fallback when high order allocation fails */ + + rb->ptrs->size = size; + rb->ptrs->mask = size - 1; + rb->ptrs->data_offset = PAGE_SIZE; + return rb; +} + +/* + * XXX: we require synchronization when killing a ringbuffer (because no longer + * mapped anywhere) to a file that is still open (and in use) + */ +static void ringbuffer_mm_drop(struct mm_struct *mm, struct ringbuffer *rb) +{ + darray_for_each_reverse(rb->mms, map) + if (mm == map->mm) { + pr_debug("removing %px from %px", rb, mm); + darray_remove_item(&rb->mms, map); + } +} + +void ringbuffer_mm_exit(struct mm_struct *mm) +{ + mutex_lock(&ringbuffer_lock); + darray_for_each_reverse(mm->ringbuffers, rb) + ringbuffer_mm_drop(mm, *rb); + mutex_unlock(&ringbuffer_lock); + + darray_exit(&mm->ringbuffers); +} + +SYSCALL_DEFINE4(ringbuffer, unsigned, fd, int, rw, u32, size, ulong __user *, ringbufferp) +{ + ulong rb_addr; + + int ret = get_user(rb_addr, ringbufferp); + if (unlikely(ret)) + return ret; + + /* Cast so that a negative rw is rejected too: */ + if (unlikely(rb_addr || !size || (unsigned) rw > WRITE)) + return -EINVAL; + + struct fd f = fdget(fd); + if (!f.file) + return -EBADF; + + /* Most files don't implement ->ringbuffer; don't call through NULL: */ + struct ringbuffer *rb = f.file->f_op->ringbuffer + ? f.file->f_op->ringbuffer(f.file, rw) + : NULL; + if (!rb) { + ret = -EOPNOTSUPP; + goto err; + } + 
mutex_lock(&ringbuffer_lock); + ret = ringbuffer_get_addr_or_map(rb, &rb_addr); + if (ret) + goto err_unlock; + + ret = put_user(rb_addr, ringbufferp); +err_unlock: + mutex_unlock(&ringbuffer_lock); +err: + fdput(f); + return ret; +} + +ssize_t ringbuffer_read_iter(struct ringbuffer *rb, struct iov_iter *iter, bool nonblocking) +{ + u32 tail = rb->ptrs->tail, orig_tail = tail; + u32 head = smp_load_acquire(&rb->ptrs->head); + + if (unlikely(head == tail)) { + if (nonblocking) + return -EAGAIN; + int ret = wait_event_interruptible(rb->wait[READ], + (head = smp_load_acquire(&rb->ptrs->head)) != rb->ptrs->tail); + if (ret) + return ret; + } + + while (iov_iter_count(iter)) { + u32 tail_masked = tail & rb->mask; + u32 len = min(iov_iter_count(iter), + min(head - tail, + rb->size - tail_masked)); + if (!len) + break; + + len = copy_to_iter(rb->data + tail_masked, len, iter); + + tail += len; + } + + smp_store_release(&rb->ptrs->tail, tail); + + smp_mb(); + + if (rb->ptrs->head - orig_tail >= rb->size) + wake_up(&rb->wait[WRITE]); + + return tail - orig_tail; +} +EXPORT_SYMBOL_GPL(ringbuffer_read_iter); + +ssize_t ringbuffer_write_iter(struct ringbuffer *rb, struct iov_iter *iter, bool nonblocking) +{ + u32 head = rb->ptrs->head, orig_head = head; + u32 tail = smp_load_acquire(&rb->ptrs->tail); + + if (unlikely(head - tail >= rb->size)) { + if (nonblocking) + return -EAGAIN; + int ret = wait_event_interruptible(rb->wait[WRITE], + head - (tail = smp_load_acquire(&rb->ptrs->tail)) < rb->size); + if (ret) + return ret; + } + + while (iov_iter_count(iter)) { + u32 head_masked = head & rb->mask; + u32 len = min(iov_iter_count(iter), + min(tail + rb->size - head, + rb->size - head_masked)); + if (!len) + break; + + len = copy_from_iter(rb->data + head_masked, len, iter); + + head += len; + } + + smp_store_release(&rb->ptrs->head, head); + + smp_mb(); + + if ((s32) (rb->ptrs->tail - orig_head) >= 0) + wake_up(&rb->wait[READ]); + + return head - orig_head; +} 
+EXPORT_SYMBOL_GPL(ringbuffer_write_iter); + +SYSCALL_DEFINE2(ringbuffer_wait, unsigned, fd, int, rw) +{ + int ret = 0; + + if ((unsigned) rw > WRITE) + return -EINVAL; + + struct fd f = fdget(fd); + if (!f.file) + return -EBADF; + + struct ringbuffer *rb = f.file->f_op->ringbuffer + ? f.file->f_op->ringbuffer(f.file, rw) + : NULL; + if (!rb) { + ret = -EINVAL; + goto err; + } + + struct ringbuffer_ptrs *rp = rb->ptrs; + /* Interruptible, so a task waiting on an idle ringbuffer stays killable: */ + ret = wait_event_interruptible(rb->wait[rw], rw == READ + ? rp->head != rp->tail + : rp->head - rp->tail < rb->size); +err: + fdput(f); + return ret; +} + +SYSCALL_DEFINE2(ringbuffer_wakeup, unsigned, fd, int, rw) +{ + int ret = 0; + + if ((unsigned) rw > WRITE) + return -EINVAL; + + struct fd f = fdget(fd); + if (!f.file) + return -EBADF; + + struct ringbuffer *rb = f.file->f_op->ringbuffer + ? f.file->f_op->ringbuffer(f.file, rw) + : NULL; + if (!rb) { + ret = -EINVAL; + goto err; + } + + wake_up(&rb->wait[!rw]); +err: + fdput(f); + return ret; +} + +static int ringbuffer_init_fs_context(struct fs_context *fc) +{ + if (!init_pseudo(fc, RINGBUFFER_FS_MAGIC)) + return -ENOMEM; + fc->s_iflags |= SB_I_NOEXEC; + return 0; +} + +static int __init ringbuffer_init(void) +{ + static struct file_system_type ringbuffer_fs = { + .name = "ringbuffer", + .init_fs_context = ringbuffer_init_fs_context, + .kill_sb = kill_anon_super, + }; + ringbuffer_mnt = kern_mount(&ringbuffer_fs); + if (IS_ERR(ringbuffer_mnt)) + panic("Failed to create ringbuffer fs mount."); + return 0; +} +__initcall(ringbuffer_init); diff --git a/include/linux/fs.h b/include/linux/fs.h index 0283cf366c2a..3026f8f92d6f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1996,6 +1996,7 @@ struct offset_ctx; typedef unsigned int __bitwise fop_flags_t; +struct ringbuffer; struct file_operations { struct module *owner; fop_flags_t fop_flags; @@ -2004,6 +2005,7 @@ struct file_operations { ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); ssize_t (*read_iter) (struct kiocb *, struct iov_iter *); ssize_t (*write_iter) (struct kiocb *, struct iov_iter *); + struct ringbuffer
*(*ringbuffer)(struct file *, int); int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *, unsigned int flags); int (*iterate_shared) (struct file *, struct dir_context *); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 24323c7d0bd4..6e412718ce7e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -5,6 +5,7 @@ #include #include +#include #include #include #include @@ -911,6 +912,9 @@ struct mm_struct { spinlock_t ioctx_lock; struct kioctx_table __rcu *ioctx_table; #endif +#ifdef CONFIG_RINGBUFFER + DARRAY(struct ringbuffer *) ringbuffers; +#endif #ifdef CONFIG_MEMCG /* * "owner" points to a task that is regarded as the canonical diff --git a/include/linux/ringbuffer_sys.h b/include/linux/ringbuffer_sys.h new file mode 100644 index 000000000000..843509f72514 --- /dev/null +++ b/include/linux/ringbuffer_sys.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_RINGBUFFER_SYS_H +#define _LINUX_RINGBUFFER_SYS_H + +#include +#include +#include + +struct mm_struct; +void ringbuffer_mm_exit(struct mm_struct *mm); + +void ringbuffer_free(struct ringbuffer *rb); +struct ringbuffer *ringbuffer_alloc(u32 size); + +ssize_t ringbuffer_read_iter(struct ringbuffer *rb, struct iov_iter *iter, bool nonblock); +ssize_t ringbuffer_write_iter(struct ringbuffer *rb, struct iov_iter *iter, bool nonblock); + +#endif /* _LINUX_RINGBUFFER_SYS_H */ diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h index d2ee625ea189..09d94a5cb849 100644 --- a/include/uapi/linux/futex.h +++ b/include/uapi/linux/futex.h @@ -22,6 +22,7 @@ #define FUTEX_WAIT_REQUEUE_PI 11 #define FUTEX_CMP_REQUEUE_PI 12 #define FUTEX_LOCK_PI2 13 +#define FUTEX_WAIT_GE 14 #define FUTEX_PRIVATE_FLAG 128 #define FUTEX_CLOCK_REALTIME 256 diff --git a/include/uapi/linux/ringbuffer_sys.h b/include/uapi/linux/ringbuffer_sys.h new file mode 100644 index 000000000000..a7afe8647cc1 --- /dev/null +++ b/include/uapi/linux/ringbuffer_sys.h @@ -0,0 +1,40 
@@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +#ifndef _UAPI_LINUX_RINGBUFFER_SYS_H +#define _UAPI_LINUX_RINGBUFFER_SYS_H + +#include + +/* + * ringbuffer_ptrs - head and tail pointers for a ringbuffer, mapped to + * userspace: + */ +struct ringbuffer_ptrs { + /* + * We use u32s because this type is shared between the kernel and + * userspace - ulong/size_t won't work here, since we might have 32-bit + * userland and a 64-bit kernel. u64 would be preferable (reduced + * probability of ABA), but not all architectures can atomically + * read/write a u64, and we need to avoid torn reads/writes. + * + * head and tail pointers are incremented and stored without masking; + * this is to avoid ABA and to differentiate between a full and an empty + * buffer - they must be masked with @mask to get an actual offset into + * the data buffer. + * + * All units are in bytes. + * + * Data is emitted at head, consumed from tail. + */ + __u32 head; + __u32 tail; + __u32 size; /* always a power of two */ + __u32 mask; /* size - 1 */ + + /* + * Starting offset of data buffer, from the start of this struct - will + * always be PAGE_SIZE. + */ + __u32 data_offset; +}; + +#endif /* _UAPI_LINUX_RINGBUFFER_SYS_H */ diff --git a/init/Kconfig b/init/Kconfig index 72404c1f2157..c43d536d4898 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1673,6 +1673,15 @@ config IO_URING applications to submit and complete IO through submission and completion rings that are shared between the kernel and application. +config RINGBUFFER + bool "Enable ringbuffer() syscall" if EXPERT + select XARRAY_MULTI + default y + help + This option adds support for generic ringbuffers, which can be + attached to any (supported) file descriptor, allowing for reading and + writing without syscall overhead.
+ config ADVISE_SYSCALLS bool "Enable madvise/fadvise syscalls" if EXPERT default y diff --git a/kernel/fork.c b/kernel/fork.c index 99076dbe27d8..9190a06a6365 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -103,6 +103,7 @@ #include #include #include +#include #include #include @@ -1340,6 +1341,7 @@ static inline void __mmput(struct mm_struct *mm) VM_BUG_ON(atomic_read(&mm->mm_users)); uprobe_clear_state(mm); + ringbuffer_mm_exit(mm); exit_aio(mm); ksm_exit(mm); khugepaged_exit(mm); /* must run before exit_mmap */ From patchwork Mon Jun 3 00:33:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kent Overstreet X-Patchwork-Id: 13683152 Received: from out-170.mta0.migadu.com (out-170.mta0.migadu.com [91.218.175.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A2D56FB2 for ; Mon, 3 Jun 2024 00:33:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.170 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717374805; cv=none; b=D4dIa91QEujNLjK10en6FpuBWG6b0D5Ml6oVX6nyTooAm2busbRayh5I0dOKFg7YypPmwnvqS1hOo+noJ3rgf2NeFmCYApy8laJQvsYpxa3x9PFGGMn5wJPsJPt+3gnW7Ad5GFdrJ+0KW6GqWQLD4prKTNmn1F4FUKGXK20fdPI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717374805; c=relaxed/simple; bh=MbreANTvOFsf+HoMZMv881QbCHhV7LHQF9AJXemvdRM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ofpSJxJZBTZnQKXj4goBmTo96BKYvxATGafvS/J7MTulUReLsHiAoy5P5ZOhA4y/GfK3JiicyzR3STSztULorOFtbMnr3AqAMvfuazdqDH/CledTomaz/8KC1OZzAIso+T4+pHgzKexTX+7CslEp+DCBtOCx0btazFazDe5IdZ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev 
From: Kent Overstreet To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Kent Overstreet , brauner@kernel.org, viro@zeniv.linux.org.uk, Bernd Schubert , linux-mm@kvack.org, Josef Bacik Subject: [PATCH 4/5] ringbuffer: Test device Date: Sun, 2 Jun 2024 20:33:01 -0400 Message-ID: <20240603003306.2030491-5-kent.overstreet@linux.dev> In-Reply-To: <20240603003306.2030491-1-kent.overstreet@linux.dev> References: <20240603003306.2030491-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT This adds /dev/ringbuffer-test, which supports reading and writing a sequence of integers, to test performance and correctness. Signed-off-by: Kent Overstreet --- fs/Makefile | 1 + fs/ringbuffer_test.c | 209 +++++++++++++++++++++++++++++++++++++++++++ lib/Kconfig.debug | 5 ++ 3 files changed, 215 insertions(+) create mode 100644 fs/ringbuffer_test.c diff --git a/fs/Makefile b/fs/Makefile index 48e54ac01fb1..91061f281f0a 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -29,6 +29,7 @@ obj-$(CONFIG_EVENTFD) += eventfd.o obj-$(CONFIG_USERFAULTFD) += userfaultfd.o obj-$(CONFIG_AIO) += aio.o obj-$(CONFIG_RINGBUFFER) += ringbuffer.o +obj-$(CONFIG_RINGBUFFER_TEST) += ringbuffer_test.o obj-$(CONFIG_FS_DAX) += dax.o obj-$(CONFIG_FS_ENCRYPTION) += crypto/ obj-$(CONFIG_FS_VERITY) += verity/ diff --git a/fs/ringbuffer_test.c b/fs/ringbuffer_test.c new file mode 100644 index 000000000000..01aa9c55120d --- /dev/null +++ b/fs/ringbuffer_test.c @@ -0,0 +1,209 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define pr_fmt(fmt) "%s() " fmt "\n", __func__ + +#include +#include +#include +#include +#include +#include + +struct ringbuffer_test_file { + struct ringbuffer_test_rw { + struct mutex lock; + struct ringbuffer *rb; + struct task_struct *thr; + } rw[2]; +}; + +#define BUF_NR 4 + +static int ringbuffer_test_writer(void *p) +{ + struct file *file = p; + 
struct ringbuffer_test_file *f = file->private_data; + struct ringbuffer *rb = f->rw[READ].rb; + u32 idx = 0; + u32 buf[BUF_NR]; + + while (!kthread_should_stop()) { + cond_resched(); + + struct kvec vec = { buf, sizeof(buf) }; + struct iov_iter iter; + iov_iter_kvec(&iter, ITER_SOURCE, &vec, 1, sizeof(buf)); + + for (unsigned i = 0; i < ARRAY_SIZE(buf); i++) + buf[i] = idx + i; + + ssize_t ret = ringbuffer_write_iter(rb, &iter, false); + if (ret < 0) + continue; + idx += ret / sizeof(buf[0]); + } + + return 0; +} + +static int ringbuffer_test_reader(void *p) +{ + struct file *file = p; + struct ringbuffer_test_file *f = file->private_data; + struct ringbuffer *rb = f->rw[WRITE].rb; + u32 idx = 0; + u32 buf[BUF_NR]; + + while (!kthread_should_stop()) { + cond_resched(); + + struct kvec vec = { buf, sizeof(buf) }; + struct iov_iter iter; + iov_iter_kvec(&iter, ITER_DEST, &vec, 1, sizeof(buf)); + + ssize_t ret = ringbuffer_read_iter(rb, &iter, false); + if (ret < 0) + continue; + + unsigned nr = ret / sizeof(buf[0]); + for (unsigned i = 0; i < nr; i++) + if (buf[i] != idx + i) + pr_err("read wrong data"); + idx += ret / sizeof(buf[0]); + } + + return 0; +} + +static void ringbuffer_test_free(struct ringbuffer_test_file *f) +{ + for (unsigned i = 0; i < ARRAY_SIZE(f->rw); i++) + if (!IS_ERR_OR_NULL(f->rw[i].thr)) + kthread_stop_put(f->rw[i].thr); + for (unsigned i = 0; i < ARRAY_SIZE(f->rw); i++) + if (!IS_ERR_OR_NULL(f->rw[i].rb)) + ringbuffer_free(f->rw[i].rb); + kfree(f); +} + +static int ringbuffer_test_open(struct inode *inode, struct file *file) +{ + static const char * const rw_str[] = { "reader", "writer" }; + int ret = 0; + + struct ringbuffer_test_file *f = kzalloc(sizeof(*f), GFP_KERNEL); + if (!f) + return -ENOMEM; + + for (struct ringbuffer_test_rw *i = f->rw; + i < f->rw + ARRAY_SIZE(f->rw); + i++) { + unsigned idx = i - f->rw; + + mutex_init(&i->lock); + + i->rb = ringbuffer_alloc(PAGE_SIZE * 4); + ret = PTR_ERR_OR_ZERO(i->rb); + if (ret) + goto err; + 
+ i->thr = kthread_create(idx == READ + ? ringbuffer_test_reader + : ringbuffer_test_writer, + file, "ringbuffer_%s", rw_str[idx]); + ret = PTR_ERR_OR_ZERO(i->thr); + if (ret) + goto err; + get_task_struct(i->thr); + } + + file->private_data = f; + wake_up_process(f->rw[0].thr); + wake_up_process(f->rw[1].thr); + return 0; +err: + ringbuffer_test_free(f); + return ret; +} + +static int ringbuffer_test_release(struct inode *inode, struct file *file) +{ + ringbuffer_test_free(file->private_data); + return 0; +} + +static ssize_t ringbuffer_test_read_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + struct file *file = iocb->ki_filp; + struct ringbuffer_test_file *f = file->private_data; + struct ringbuffer_test_rw *i = &f->rw[READ]; + + ssize_t ret = mutex_lock_interruptible(&i->lock); + if (ret) + return ret; + + ret = ringbuffer_read_iter(i->rb, iter, file->f_flags & O_NONBLOCK); + mutex_unlock(&i->lock); + return ret; +} + +static ssize_t ringbuffer_test_write_iter(struct kiocb *iocb, struct iov_iter *iter) +{ + struct file *file = iocb->ki_filp; + struct ringbuffer_test_file *f = file->private_data; + struct ringbuffer_test_rw *i = &f->rw[WRITE]; + + ssize_t ret = mutex_lock_interruptible(&i->lock); + if (ret) + return ret; + + ret = ringbuffer_write_iter(i->rb, iter, file->f_flags & O_NONBLOCK); + mutex_unlock(&i->lock); + return ret; +} + +static struct ringbuffer *ringbuffer_test_ringbuffer(struct file *file, int rw) +{ + struct ringbuffer_test_file *i = file->private_data; + + BUG_ON(rw > WRITE); + + return i->rw[rw].rb; +} + +static const struct file_operations ringbuffer_fops = { + .owner = THIS_MODULE, + .read_iter = ringbuffer_test_read_iter, + .write_iter = ringbuffer_test_write_iter, + .ringbuffer = ringbuffer_test_ringbuffer, + .open = ringbuffer_test_open, + .release = ringbuffer_test_release, +}; + +static int __init ringbuffer_test_init(void) +{ + int ringbuffer_major = register_chrdev(0, "ringbuffer-test", &ringbuffer_fops); + if 
(ringbuffer_major < 0) + return ringbuffer_major; + + static const struct class ringbuffer_class = { .name = "ringbuffer_test" }; + int ret = class_register(&ringbuffer_class); + if (ret) + goto major_out; + + struct device *ringbuffer_device = device_create(&ringbuffer_class, NULL, + MKDEV(ringbuffer_major, 0), + NULL, "ringbuffer-test"); + ret = PTR_ERR_OR_ZERO(ringbuffer_device); + if (ret) + goto class_out; + + return 0; + +class_out: + class_unregister(&ringbuffer_class); +major_out: + unregister_chrdev(ringbuffer_major, "ringbuffer-test"); + return ret; +} +__initcall(ringbuffer_test_init); diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 59b6765d86b8..bb16762af575 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2957,6 +2957,11 @@ config TEST_OBJPOOL If unsure, say N. +config RINGBUFFER_TEST + bool "Test driver for sys_ringbuffer" + default n + depends on RINGBUFFER + endif # RUNTIME_TESTING_MENU config ARCH_USE_MEMTEST From patchwork Mon Jun 3 00:33:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kent Overstreet X-Patchwork-Id: 13683153 Received: from out-186.mta0.migadu.com (out-186.mta0.migadu.com [91.218.175.186]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1810AAD5F for ; Mon, 3 Jun 2024 00:33:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.186 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717374806; cv=none; b=XzDX9pjIA3LpZG/ALRSWTrfRNyK8m6QKhZamDHGkdHDq/1hOTOlcnz0SyTZS/EnAarwJKfVdnvKlqrHPVzzKuQiiX6Jmg2nI71V+y1sc/EwShI7vfGCcimhZ7SKhmk197pdKOuCauIvT1ntA9zUg/+eLMjJq/uCBEFZfsn1v2U0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717374806; c=relaxed/simple; bh=kEKiOFdh5hpyo/igNooYbtGOy+7574G9SztbHRdwO4c=; 
From: Kent Overstreet To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jens Axboe , Kent Overstreet , brauner@kernel.org, viro@zeniv.linux.org.uk, Bernd Schubert , linux-mm@kvack.org, Josef Bacik Subject: [PATCH 5/5] ringbuffer: Userspace test helper Date: Sun, 2 Jun 2024 20:33:02 -0400 Message-ID: <20240603003306.2030491-6-kent.overstreet@linux.dev> In-Reply-To: <20240603003306.2030491-1-kent.overstreet@linux.dev> References: <20240603003306.2030491-1-kent.overstreet@linux.dev> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Migadu-Flow: FLOW_OUT This adds a helper for testing the new ringbuffer syscall using /dev/ringbuffer-test; it can do performance testing of both normal reads and writes, and reads and writes via the ringbuffer interface. Signed-off-by: Kent Overstreet --- tools/ringbuffer/Makefile | 3 + tools/ringbuffer/ringbuffer-test.c | 254 +++++++++++++++++++++++++++++ 2 files changed, 257 insertions(+) create mode 100644 tools/ringbuffer/Makefile create mode 100644 tools/ringbuffer/ringbuffer-test.c diff --git a/tools/ringbuffer/Makefile b/tools/ringbuffer/Makefile new file mode 100644 index 000000000000..2fb27a19b43e --- /dev/null +++ b/tools/ringbuffer/Makefile @@ -0,0 +1,3 @@ +CFLAGS=-g -O2 -Wall -Werror -I../../include + +all: ringbuffer-test diff --git a/tools/ringbuffer/ringbuffer-test.c b/tools/ringbuffer/ringbuffer-test.c new file mode 100644 index 000000000000..0fba99e40858 --- /dev/null +++ b/tools/ringbuffer/ringbuffer-test.c @@ -0,0 +1,254 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define READ 0 +#define WRITE 1 + +#define min(a, b) (a < b ? 
a : b) + +#define __EXPORTED_HEADERS__ +#include + +#define BUF_NR 4 + +typedef uint32_t u32; +typedef int32_t s32; +typedef unsigned long ulong; + +static inline struct ringbuffer_ptrs *ringbuffer(int fd, int rw, u32 size) +{ + ulong addr = 0; + /* syscall(2) returns -1 and sets errno itself on failure: */ + if (syscall(463, fd, rw, size, &addr) < 0) + return NULL; + return (void *) addr; +} + +static inline int ringbuffer_wait(int fd, int rw) +{ + return syscall(464, fd, rw); +} + +static inline int ringbuffer_wakeup(int fd, int rw) +{ + return syscall(465, fd, rw); +} + +static ssize_t ringbuffer_read(int fd, struct ringbuffer_ptrs *rb, + void *buf, size_t len) +{ + void *rb_data = (void *) rb + rb->data_offset; + + u32 head, orig_tail = rb->tail, tail = orig_tail; + + while ((head = __atomic_load_n(&rb->head, __ATOMIC_ACQUIRE)) == tail) + ringbuffer_wait(fd, READ); + + while (len && head != tail) { + u32 tail_masked = tail & rb->mask; + unsigned b = min(len, + min(head - tail, + rb->size - tail_masked)); + + memcpy(buf, rb_data + tail_masked, b); + buf += b; + len -= b; + tail += b; + } + + __atomic_store_n(&rb->tail, tail, __ATOMIC_RELEASE); + + __atomic_thread_fence(__ATOMIC_SEQ_CST); + + if (rb->head - orig_tail >= rb->size) + ringbuffer_wakeup(fd, READ); + + return tail - orig_tail; +} + +static ssize_t ringbuffer_write(int fd, struct ringbuffer_ptrs *rb, + void *buf, size_t len) +{ + void *rb_data = (void *) rb + rb->data_offset; + + u32 orig_head = rb->head, head = orig_head, tail; + + while (head - (tail = __atomic_load_n(&rb->tail, __ATOMIC_ACQUIRE)) >= rb->size) + ringbuffer_wait(fd, WRITE); + + while (len && head - tail < rb->size) { + u32 head_masked = head & rb->mask; + unsigned b = min(len, + min(tail - head + rb->size, + rb->size - head_masked)); + + memcpy(rb_data + head_masked, buf, b); + buf += b; + len -= b; + head += b; + } + + __atomic_store_n(&rb->head, head, __ATOMIC_RELEASE); + + __atomic_thread_fence(__ATOMIC_SEQ_CST); + + if ((s32) (rb->tail - orig_head) >= 0) + ringbuffer_wakeup(fd, WRITE); + + return
head - orig_head; +} + +static void usage(void) +{ + puts("ringbuffer-test - test ringbuffer syscall\n" + "Usage: ringbuffer-test [OPTION]...\n" + "\n" + "Options:\n" + " --type=(io|ringbuffer)\n" + " --rw=(read|write)\n" + " -h, --help Display this help and exit\n"); +} + +static inline ssize_t rb_test_read(int fd, struct ringbuffer_ptrs *rb, + void *buf, size_t len) +{ + return rb + ? ringbuffer_read(fd, rb, buf, len) + : read(fd, buf, len); +} + +static inline ssize_t rb_test_write(int fd, struct ringbuffer_ptrs *rb, + void *buf, size_t len) +{ + return rb + ? ringbuffer_write(fd, rb, buf, len) + : write(fd, buf, len); +} + +int main(int argc, char *argv[]) +{ + const struct option longopts[] = { + { "type", required_argument, NULL, 't' }, + { "rw", required_argument, NULL, 'r' }, + { "help", no_argument, NULL, 'h' }, + { NULL } + }; + int use_ringbuffer = false, rw = false; + int opt; + + while ((opt = getopt_long(argc, argv, "h", longopts, NULL)) != -1) { + switch (opt) { + case 't': + if (!strcmp(optarg, "io")) + use_ringbuffer = false; + else if (!strcmp(optarg, "ringbuffer") || + !strcmp(optarg, "rb")) + use_ringbuffer = true; + else { + fprintf(stderr, "Invalid type %s\n", optarg); + exit(EXIT_FAILURE); + } + break; + case 'r': + if (!strcmp(optarg, "read")) + rw = false; + else if (!strcmp(optarg, "write")) + rw = true; + else { + fprintf(stderr, "Invalid rw %s\n", optarg); + exit(EXIT_FAILURE); + } + break; + case '?': + fprintf(stderr, "Invalid option %c\n", opt); + usage(); + exit(EXIT_FAILURE); + case 'h': + usage(); + exit(EXIT_SUCCESS); + } + } + + int fd = open("/dev/ringbuffer-test", O_RDWR); + if (fd < 0) { + fprintf(stderr, "Error opening /dev/ringbuffer-test: %m\n"); + exit(EXIT_FAILURE); + } + + struct ringbuffer_ptrs *rb = NULL; + if (use_ringbuffer) { + rb = ringbuffer(fd, rw, 4096); + if (!rb) { + fprintf(stderr, "Error from sys_ringbuffer: %m\n"); + exit(EXIT_FAILURE); + } + + fprintf(stderr, "got ringbuffer %p\n", rb); + } + + 
printf("Starting test with ringbuffer=%u, rw=%u\n", use_ringbuffer, rw); + static const char * const rw_str[] = { "read", "wrote" }; + + struct timeval start; + gettimeofday(&start, NULL); + size_t nr_prints = 1; + + u32 buf[BUF_NR]; + u32 idx = 0; + + while (true) { + struct timeval now; + gettimeofday(&now, NULL); + + struct timeval next_print = start; + next_print.tv_sec += nr_prints; + + if (timercmp(&now, &next_print, >)) { + printf("%s %u u32s, %lu mb/sec\n", rw_str[rw], idx, + (idx * sizeof(u32) / (now.tv_sec - start.tv_sec)) / (1UL << 20)); + nr_prints++; + if (nr_prints > 20) + break; + } + + if (rw == READ) { + int r = rb_test_read(fd, rb, buf, sizeof(buf)); + if (r <= 0) { + fprintf(stderr, "Read returned %i (%m)\n", r); + exit(EXIT_FAILURE); + } + + unsigned nr = r / sizeof(u32); + for (unsigned i = 0; i < nr; i++) { + if (buf[i] != idx + i) { + fprintf(stderr, "Read returned wrong data at idx %u: got %u instead\n", + idx + i, buf[i]); + exit(EXIT_FAILURE); + } + } + + idx += nr; + } else { + for (unsigned i = 0; i < BUF_NR; i++) + buf[i] = idx + i; + + int r = rb_test_write(fd, rb, buf, sizeof(buf)); + if (r <= 0) { + fprintf(stderr, "Write returned %i (%m)\n", r); + exit(EXIT_FAILURE); + } + + unsigned nr = r / sizeof(u32); + idx += nr; + } + } + + exit(EXIT_SUCCESS); +}
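The index arithmetic shared by ringbuffer_read_iter()/ringbuffer_write_iter() and the userspace ringbuffer_read()/ringbuffer_write() helpers can be exercised without the syscall at all. The following is a single-threaded sketch, not part of the patch: `struct model_rb`, `rb_used()`, `rb_write()` and `rb_read()` are hypothetical names, there are no atomics or wakeups, and the 8-byte buffer is chosen only to make wraparound easy to see. As the uapi comment on struct ringbuffer_ptrs describes, head and tail are free-running u32 byte counters that are masked only when indexing the data array, and each copy is split at the physical end of the buffer.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical single-threaded model of the struct ringbuffer_ptrs index
 * scheme: head and tail are free-running u32 byte counters that are only
 * masked when indexing data[]. No atomics or wakeups; arithmetic only.
 */
#define MODEL_RB_SIZE 8u                  /* must be a power of two */
#define MODEL_RB_MASK (MODEL_RB_SIZE - 1)

struct model_rb {
	uint32_t head;                    /* producer position, unmasked */
	uint32_t tail;                    /* consumer position, unmasked */
	unsigned char data[MODEL_RB_SIZE];
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Bytes currently queued; unsigned subtraction stays correct across wrap. */
static uint32_t rb_used(const struct model_rb *rb)
{
	return rb->head - rb->tail;
}

/* Copy up to len bytes in at head; returns the number of bytes written. */
static uint32_t rb_write(struct model_rb *rb, const void *src, uint32_t len)
{
	const unsigned char *p = src;
	uint32_t done = 0;

	while (len && rb_used(rb) < MODEL_RB_SIZE) {
		uint32_t off = rb->head & MODEL_RB_MASK;
		/* Limited by free space and by the physical end of data[]. */
		uint32_t n = min_u32(len, min_u32(MODEL_RB_SIZE - rb_used(rb),
						  MODEL_RB_SIZE - off));

		memcpy(rb->data + off, p + done, n);
		rb->head += n;
		done += n;
		len -= n;
	}
	return done;
}

/* Copy up to len bytes out from tail; returns the number of bytes read. */
static uint32_t rb_read(struct model_rb *rb, void *dst, uint32_t len)
{
	unsigned char *p = dst;
	uint32_t done = 0;

	while (len && rb_used(rb)) {
		uint32_t off = rb->tail & MODEL_RB_MASK;
		/* Limited by queued data and by the physical end of data[]. */
		uint32_t n = min_u32(len, min_u32(rb_used(rb),
						  MODEL_RB_SIZE - off));

		memcpy(p + done, rb->data + off, n);
		rb->tail += n;
		done += n;
		len -= n;
	}
	return done;
}
```

Starting both counters just below 2^32 shows that `head - tail` still yields the number of queued bytes after the counters wrap, so a full buffer (`head - tail == size`) and an empty one (`head == tail`) remain distinguishable without a separate flag.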
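The wakeup checks at the end of the read and write paths are easy to misread. As we read the patch, ringbuffer_read_iter() wakes writers when `head - orig_tail >= size` (the buffer was full before this read consumed anything), and ringbuffer_write_iter() wakes readers when `(s32)(tail - orig_head) >= 0` (the reader had drained everything up to `orig_head`, so the buffer was empty before this write). The sketch below isolates those two predicates as plain functions. The function names are hypothetical, and the `int32_t` cast assumes the two's-complement conversion that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers mirroring the wakeup conditions in
 * ringbuffer_read_iter() and ringbuffer_write_iter().
 * All values are the free-running (unmasked) u32 byte counters.
 */

/* Called after a read that started at orig_tail; head is re-read afterwards. */
static bool reader_should_wake_writer(uint32_t head, uint32_t orig_tail,
				      uint32_t size)
{
	/* The buffer held a full `size` bytes before we consumed any. */
	return head - orig_tail >= size;
}

/* Called after a write that started at orig_head; tail is re-read afterwards. */
static bool writer_should_wake_reader(uint32_t tail, uint32_t orig_head)
{
	/*
	 * The reader had caught up to orig_head, i.e. the buffer was empty
	 * before this write. The signed cast keeps the comparison correct
	 * when the counters wrap past 2^32.
	 */
	return (int32_t)(tail - orig_head) >= 0;
}
```

Both checks can fire spuriously (for example, if the other side made progress between the copy and the re-read), which only costs an extra wake_up(). What they must never do is miss the full or empty case, since the other side may then be sleeping on the corresponding wait queue; because head and tail only move forward, a buffer that was full (or empty) at the start keeps the predicate true.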