inotify: Increase default inotify.max_user_watches limit to 1048576

Message ID 20201026204418.23197-1-longman@redhat.com (mailing list archive)
State New, archived

Commit Message

Waiman Long Oct. 26, 2020, 8:44 p.m. UTC
The default value of the inotify.max_user_watches sysctl parameter has
been 8192 since the introduction of the inotify feature in 2005 by
commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too
small for many modern use cases. As a result, users have to explicitly
set it to a larger value to make it work.

After some searching around the web, these are the
inotify.max_user_watches values used by some projects:
 - vscode:  524288
 - dropbox support: 100000
 - users on stackexchange: 12228
 - lsyncd user: 2000000
 - code42 support: 1048576
 - monodevelop: 16384
 - tectonic: 524288
 - openshift origin: 65536

Each watch point adds an inotify_inode_mark structure to the inode
being watched. Following the epoll.max_user_watches behavior of scaling
the default value with the amount of addressable memory available, make
inotify.max_user_watches scale in a similar way, so that watches use no
more than 1% of addressable memory, clamped to the range [8192, 1048576].

For 64-bit archs, inotify_inode_mark should have a size of 80 bytes. That
means a system with 8GB or more memory will have the maximum value of
1048576 for inotify.max_user_watches. This default should be big enough
for most of the use cases.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 fs/notify/inotify/inotify_user.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
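
The sizing rule described in the commit message can be checked outside the kernel. The following is a minimal user-space sketch, not the kernel code: the helper name default_max_user_watches and the MARK_SIZE_BYTES constant are hypothetical, and the 80-byte mark size is simply the figure quoted above for 64-bit archs. The caller passes the number of low-memory pages (totalram minus totalhigh in the patch).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 80 bytes per inotify_inode_mark, as quoted in the commit message. */
#define MARK_SIZE_BYTES 80ULL
#define WATCHES_MIN     8192ULL
#define WATCHES_MAX     1048576ULL

/*
 * Hypothetical helper mirroring the arithmetic in the patch:
 * take 1% of the low-memory pages, convert to bytes, divide by the
 * per-watch cost, then clamp to [WATCHES_MIN, WATCHES_MAX].
 */
static uint64_t default_max_user_watches(uint64_t lowmem_pages,
                                         unsigned int page_shift,
                                         uint64_t cost_bytes)
{
    uint64_t watches;

    assert(cost_bytes > 0);
    watches = ((lowmem_pages / 100) << page_shift) / cost_bytes;
    if (watches < WATCHES_MIN)
        watches = WATCHES_MIN;
    if (watches > WATCHES_MAX)
        watches = WATCHES_MAX;
    return watches;
}
```

Under these assumptions and 4 KiB pages (page_shift of 12), 2097152 pages (8 GiB) hit the upper clamp of 1048576, 1048576 pages (4 GiB) give 536832, and very small systems fall back to 8192.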

Comments

Amir Goldstein Oct. 27, 2020, 8:19 a.m. UTC | #1
On Mon, Oct 26, 2020 at 10:44 PM Waiman Long <longman@redhat.com> wrote:
>
> The default value of the inotify.max_user_watches sysctl parameter has
> been 8192 since the introduction of the inotify feature in 2005 by
> commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too
> small for many modern use cases. As a result, users have to explicitly
> set it to a larger value to make it work.
>
> After some searching around the web, these are the
> inotify.max_user_watches values used by some projects:
>  - vscode:  524288
>  - dropbox support: 100000
>  - users on stackexchange: 12228
>  - lsyncd user: 2000000
>  - code42 support: 1048576
>  - monodevelop: 16384
>  - tectonic: 524288
>  - openshift origin: 65536
>
> Each watch point adds an inotify_inode_mark structure to the inode
> being watched. Following the epoll.max_user_watches behavior of scaling
> the default value with the amount of addressable memory available, make
> inotify.max_user_watches scale in a similar way, so that watches use no
> more than 1% of addressable memory, clamped to the range [8192, 1048576].
>
> For 64-bit archs, inotify_inode_mark should have a size of 80 bytes. That
> means a system with 8GB or more memory will have the maximum value of
> 1048576 for inotify.max_user_watches. This default should be big enough
> for most of the use cases.
>

Alas, the memory usage contributed by inotify watches is dominated by the
directory inodes that they pin to cache.

In effect, this change increases the ability of a given user to use:

1048576 (max_user_watches) * ~1024 bytes (fs inode size) = ~1GB

Surely, inotify watches are not the only way to pin inodes to cache, but
the other ways are also resource controlled, for example:
<nproc hardlimit>*<nofile hardlimit>

I did not survey distros for hard limits of nproc and nofile.
On my Ubuntu it's pretty high (63183*1048576). I suppose other distros
may have a lower hard limit by default.

But in any case, open files resource usage has high visibility (via procfs)
and sysadmins and tools are aware of it.

I am afraid this may not be the case with inotify watches. They are also visible
via the inotify fdinfo procfs files, but fewer people and tools know about them.
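
As an illustration of that visibility, each watch on an inotify file descriptor appears in /proc/<pid>/fdinfo/<fd> as a line beginning with "inotify wd:". The sketch below counts those lines for one process. The function names are hypothetical, and it assumes the caller has permission to read the target's fdinfo directory.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the "inotify wd:" lines in one already-open fdinfo stream. */
static unsigned long count_inotify_marks(FILE *fp)
{
    char line[512];
    unsigned long n = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "inotify wd:", 11) == 0)
            n++;
    }
    return n;
}

/*
 * Sum the watches over every fd of one process, e.g. pid "self" or "1234".
 * Returns -1 if /proc/<pid>/fdinfo cannot be opened.
 */
static long count_watches_for_pid(const char *pid)
{
    char dirpath[64];
    char path[512];
    struct dirent *entry;
    long total = 0;
    DIR *dir;

    snprintf(dirpath, sizeof(dirpath), "/proc/%s/fdinfo", pid);
    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        FILE *fp;

        if (entry->d_name[0] == '.')
            continue; /* skip "." and ".." */
        snprintf(path, sizeof(path), "%s/%s", dirpath, entry->d_name);
        fp = fopen(path, "r");
        if (fp == NULL)
            continue; /* fd may have been closed in the meantime */
        total += (long)count_inotify_marks(fp);
        fclose(fp);
    }
    closedir(dir);
    return total;
}
```

Summing count_watches_for_pid() over all processes owned by a user gives a rough view of how close that user is to max_user_watches.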

In the end, it's a policy decision, but if you want to claim that your change
will not use more than 1% of addressable memory, it might be better to
use 2*sizeof(struct inode) as a closer approximation of the resource usage.

I believe this conservative estimation will result in a default that covers the
needs of most of the common use cases. Also, in general, a system with
a larger filesystem is likely to have more RAM for caching files anyway.
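
To put numbers on this estimate, here is a small user-space sketch. It assumes roughly 1024 bytes pinned per watch, the approximate figure used earlier in this message; the real cost differs between filesystems, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: about 1 KiB of inode memory pinned per watch. */
#define ASSUMED_PINNED_BYTES_PER_WATCH 1024ULL

/* Memory pinned by a given number of watches under the assumption above. */
static uint64_t pinned_bytes_for_watches(uint64_t watches)
{
    return watches * ASSUMED_PINNED_BYTES_PER_WATCH;
}

/* Number of watches that fit into `percent` percent of total_bytes. */
static uint64_t watches_within_budget(uint64_t total_bytes, unsigned int percent)
{
    assert(percent <= 100);
    return (total_bytes / 100) * percent / ASSUMED_PINNED_BYTES_PER_WATCH;
}
```

Under this assumption, 1048576 watches pin 1 GiB, a 1% budget on an 8 GiB system allows 83886 watches, and a system would need about 100 GiB of memory before a 1% budget reaches 1048576 watches.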

An anecdote: I started developing the fanotify filesystem watch as a replacement
for inotify (merged in v5.9) for a system that needs to watch many millions of
directories, where pinning all inodes to cache was not an option.

Thanks,
Amir.

Jan Kara Oct. 27, 2020, 4 p.m. UTC | #2
On Mon 26-10-20 16:44:18, Waiman Long wrote:
> The default value of the inotify.max_user_watches sysctl parameter has
> been 8192 since the introduction of the inotify feature in 2005 by
> commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too
> small for many modern use cases. As a result, users have to explicitly
> set it to a larger value to make it work.
> 
> After some searching around the web, these are the
> inotify.max_user_watches values used by some projects:
>  - vscode:  524288
>  - dropbox support: 100000
>  - users on stackexchange: 12228
>  - lsyncd user: 2000000
>  - code42 support: 1048576
>  - monodevelop: 16384
>  - tectonic: 524288
>  - openshift origin: 65536
> 
> Each watch point adds an inotify_inode_mark structure to the inode
> being watched. Following the epoll.max_user_watches behavior of scaling
> the default value with the amount of addressable memory available, make
> inotify.max_user_watches scale in a similar way, so that watches use no
> more than 1% of addressable memory, clamped to the range [8192, 1048576].
> 
> For 64-bit archs, inotify_inode_mark should have a size of 80 bytes. That
> means a system with 8GB or more memory will have the maximum value of
> 1048576 for inotify.max_user_watches. This default should be big enough
> for most of the use cases.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>

So I agree that 8192 watches seem to be a bit low today but what you
propose seems to be way too much to me. OTOH I agree that having to tune
this manually kind of sucks so I'm for auto-tuning of the default. If the
computation takes into account the fact that a watch pins an inode as Amir
properly notes (that's the main reason why the number of watches is
limited), I think limiting to 1% of pinned memory should be bearable. The
amount of space pinned by an inode is impossible to estimate exactly
(differs for different filesystems) but about 1k for one inode is a sound
estimate IMO.

								Honza

> ---
>  fs/notify/inotify/inotify_user.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index 186722ba3894..2da8b7a84b12 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -801,6 +801,18 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
>   */
>  static int __init inotify_user_setup(void)
>  {
> +	unsigned int watches_max;
> +	struct sysinfo si;
> +
> +	si_meminfo(&si);
> +	/*
> +	 * Allow up to 1% of addressable memory to be allocated for inotify
> +	 * watches (per user) limited to the range [8192, 1048576].
> +	 */
> +	watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) /
> +			sizeof(struct inotify_inode_mark);
> +	watches_max = min(1048576U, max(watches_max, 8192U));
> +
>  	BUILD_BUG_ON(IN_ACCESS != FS_ACCESS);
>  	BUILD_BUG_ON(IN_MODIFY != FS_MODIFY);
>  	BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB);
> @@ -827,7 +839,7 @@ static int __init inotify_user_setup(void)
>  
>  	inotify_max_queued_events = 16384;
>  	init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
> -	init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192;
> +	init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
>  
>  	return 0;
>  }
> -- 
> 2.18.1
>

Waiman Long Oct. 29, 2020, 2:25 p.m. UTC | #3
On 10/27/20 12:00 PM, Jan Kara wrote:
> On Mon 26-10-20 16:44:18, Waiman Long wrote:
>> The default value of the inotify.max_user_watches sysctl parameter has
>> been 8192 since the introduction of the inotify feature in 2005 by
>> commit 0eeca28300df ("[PATCH] inotify"). Today this value is just too
>> small for many modern use cases. As a result, users have to explicitly
>> set it to a larger value to make it work.
>>
>> After some searching around the web, these are the
>> inotify.max_user_watches values used by some projects:
>>   - vscode:  524288
>>   - dropbox support: 100000
>>   - users on stackexchange: 12228
>>   - lsyncd user: 2000000
>>   - code42 support: 1048576
>>   - monodevelop: 16384
>>   - tectonic: 524288
>>   - openshift origin: 65536
>>
>> Each watch point adds an inotify_inode_mark structure to the inode
>> being watched. Following the epoll.max_user_watches behavior of scaling
>> the default value with the amount of addressable memory available, make
>> inotify.max_user_watches scale in a similar way, so that watches use no
>> more than 1% of addressable memory, clamped to the range [8192, 1048576].
>>
>> For 64-bit archs, inotify_inode_mark should have a size of 80 bytes. That
>> means a system with 8GB or more memory will have the maximum value of
>> 1048576 for inotify.max_user_watches. This default should be big enough
>> for most of the use cases.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
> So I agree that 8192 watches seem to be a bit low today but what you
> propose seems to be way too much to me. OTOH I agree that having to tune
> this manually kind of sucks so I'm for auto-tuning of the default. If the
> computation takes into account the fact that a watch pins an inode as Amir
> properly notes (that's the main reason why the number of watches is
> limited), I think limiting to 1% of pinned memory should be bearable. The
> amount of space pinned by an inode is impossible to estimate exactly
> (differs for different filesystems) but about 1k for one inode is a sound
> estimate IMO.
>
> 								Honza

I will certainly do that. Will send out a v2 soon.

Cheers,
Longman

Patch

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 186722ba3894..2da8b7a84b12 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -801,6 +801,18 @@  SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
  */
 static int __init inotify_user_setup(void)
 {
+	unsigned int watches_max;
+	struct sysinfo si;
+
+	si_meminfo(&si);
+	/*
+	 * Allow up to 1% of addressable memory to be allocated for inotify
+	 * watches (per user) limited to the range [8192, 1048576].
+	 */
+	watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) /
+			sizeof(struct inotify_inode_mark);
+	watches_max = min(1048576U, max(watches_max, 8192U));
+
 	BUILD_BUG_ON(IN_ACCESS != FS_ACCESS);
 	BUILD_BUG_ON(IN_MODIFY != FS_MODIFY);
 	BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB);
@@ -827,7 +839,7 @@  static int __init inotify_user_setup(void)
 
 	inotify_max_queued_events = 16384;
 	init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
-	init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192;
+	init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
 
 	return 0;
 }