About one million subvols limit

Message ID 010201922582d9b7-d7ef099b-176f-4799-a54c-ff43cda585aa-000000@eu-west-1.amazonses.com (mailing list archive)
State New

Commit Message

Martin Raiber Sept. 24, 2024, 7:29 p.m. UTC
Hi,

A btrfs user ran into a problem where snapshot creation fails with
"Too many open files" (EMFILE).

I did some digging in this mailing list and found an earlier report of
the same issue. It was diagnosed as a limit on the number of anon bdevs,
of which at most 2^20 (about one million) can be allocated. It was fixed
insofar as the limit was increased (3x?) and the file system no longer
remounts read-only when the limit is hit. Thanks for this!
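
To illustrate where the 2^20 figure comes from: anon bdev IDs are
allocated from the range [1, (1 << MINORBITS) - 1], and MINORBITS is 20
in the kernel's kdev_t.h. The following is a minimal userspace sketch of
that arithmetic only; `anon_bdev_capacity` is a hypothetical helper name,
not a kernel function.

```c
#include <assert.h>

/* Same value as the kernel's include/linux/kdev_t.h. */
#define MINORBITS 20

/*
 * Hypothetical helper: number of usable anon bdev IDs when IDs are
 * drawn from [1, (1 << bits) - 1]. ID 0 is reserved because many
 * userspace utilities treat an FSID of 0 as invalid.
 */
static unsigned int anon_bdev_capacity(unsigned int bits)
{
	assert(bits >= 1 && bits < 32);
	return (1U << bits) - 1;
}
```

With bits = MINORBITS this gives 1048575 IDs, shared by every user of
anon bdevs on the system, which is why subvols spread over several file
systems can exhaust it together.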

The user had about one million total subvols (in different file 
systems), so it is probably the same issue.

It is problematic that this limitation exists. I did some further
digging and found
https://lore.kernel.org/linux-bcachefs/20240222154802.GA1219527@perftesting/
Perhaps we can come up with an accelerated plan to increase the
possible number of subvols? E.g., the behaviour could be switched over
via a mount flag or feature bit.

I have also attached a possible patch that would raise the maximum
number of anon bdevs to 2^31, significantly improving the situation, but
I'm not involved enough to tell whether it might cause obvious problems.

I've also noticed that each subvol uses 2 KiB of kernel memory, so 2^31
subvols would use 4 TiB of RAM. That would be the practical limit for
now (it would be great if that could be improved as well, but that is
another topic).
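
The memory estimate above is a plain product, sketched here so the
figures can be checked for other subvol counts. The 2 KiB per-subvol
cost is the number observed above, not a guaranteed constant, and
`subvol_mem_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimated kernel memory in bytes for
 * `count` subvols at `per_subvol` bytes each.
 */
static uint64_t subvol_mem_bytes(uint64_t count, uint64_t per_subvol)
{
	/* Guard against 64-bit overflow of the product. */
	assert(per_subvol == 0 || count <= UINT64_MAX / per_subvol);
	return count * per_subvol;
}
```

At 2 KiB each, 2^20 subvols come to 2^31 bytes (2 GiB) and 2^31 subvols
to 2^42 bytes (4 TiB).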

Regards,

Martin Raiber
From 72afde28a2bf6656d921a3897555568b8e92eb13 Mon Sep 17 00:00:00 2001
From: Martin Raiber <martin@urbackup.org>
Date: Tue, 24 Sep 2024 21:19:00 +0200
Subject: [PATCH 1/1] Increase possible number of anon bdevs

Currently at most 2^20 anon bdevs can be allocated.
Increase this by also using the upper portion of the
major device number for anon bdevs (uppermost bit
set). Since major device numbers currently seem to be
< 512, this shouldn't cause issues.
---
 fs/super.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

Patch

diff --git a/fs/super.c b/fs/super.c
index 2d762ce67f6e..0c030bfc04da 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1257,26 +1257,36 @@  static DEFINE_IDA(unnamed_dev_ida);
 int get_anon_bdev(dev_t *p)
 {
 	int dev;
+	unsigned int dev_maj;
 
 	/*
 	 * Many userspace utilities consider an FSID of 0 invalid.
 	 * Always return at least 1 from get_anon_bdev.
 	 */
-	dev = ida_alloc_range(&unnamed_dev_ida, 1, (1 << MINORBITS) - 1,
+	dev = ida_alloc_range(&unnamed_dev_ida, 1, (1U << (MINORBITS + 11)) - 1,
 			GFP_ATOMIC);
 	if (dev == -ENOSPC)
 		dev = -EMFILE;
 	if (dev < 0)
 		return dev;
 
-	*p = MKDEV(0, dev);
+	dev_maj = MAJOR(dev);
+	if (dev_maj == 0)
+		*p = MKDEV(0, MINOR(dev));
+	else /* flag bit 11 of the 12-bit major, i.e. bit 31 of dev_t */
+		*p = MKDEV((1U << 11) | dev_maj, MINOR(dev));
 	return 0;
 }
 EXPORT_SYMBOL(get_anon_bdev);
 
 void free_anon_bdev(dev_t dev)
 {
-	ida_free(&unnamed_dev_ida, MINOR(dev));
+	if (dev & (1U << 31))
+		dev &= ~(1U << 31);
+	else
+		dev = MINOR(dev);
+
+	ida_free(&unnamed_dev_ida, dev);
 }
 EXPORT_SYMBOL(free_anon_bdev);
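
The intended mapping between IDA index and device number can be checked
in userspace with a sketch like the one below. It assumes a 32-bit dev_t
with a 12-bit major and 20-bit minor, as in the kernel's kdev_t.h; the
macros are local copies, and `ANON_HIGH_FLAG`, `anon_id_to_dev` and
`anon_dev_to_id` are hypothetical names used only for illustration. The
flag is set as bit 11 of the major, which lands on bit 31 of dev_t; a
flag passed to MKDEV() as bit 31 of the major argument would be shifted
out of the 32-bit value and lost.

```c
#include <assert.h>
#include <stdint.h>

/* Local userspace copies of the kernel's kdev_t.h macros. */
#define MINORBITS	20
#define MINORMASK	((1U << MINORBITS) - 1)
#define MAJOR(dev)	((uint32_t)(dev) >> MINORBITS)
#define MINOR(dev)	((uint32_t)(dev) & MINORMASK)
#define MKDEV(ma, mi)	(((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi))

/* Hypothetical name: bit 31 of dev_t, i.e. the top bit of the major. */
#define ANON_HIGH_FLAG	(1U << 31)

/* Map an IDA index in [1, 2^31 - 1] to a 32-bit device number. */
static uint32_t anon_id_to_dev(uint32_t id)
{
	assert(id >= 1 && id < ANON_HIGH_FLAG);
	if (MAJOR(id) == 0)
		return MKDEV(0, MINOR(id));
	/* Bit 11 of the major becomes bit 31 of the device number. */
	return MKDEV((1U << 11) | MAJOR(id), MINOR(id));
}

/* Inverse mapping, mirroring the logic in free_anon_bdev(). */
static uint32_t anon_dev_to_id(uint32_t dev)
{
	if (dev & ANON_HIGH_FLAG)
		return dev & ~ANON_HIGH_FLAG;
	return MINOR(dev);
}
```

IDs below 2^20 keep major 0, so existing device numbers are unchanged,
and every larger ID round-trips through the flagged encoding.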