Message ID | 20241220030830.272429-1-neilb@suse.de |
---|---|
Series | Allow concurrent changes in a directory |
On Dec 19, 2024, at 7:54 PM, NeilBrown <neilb@suse.de> wrote:
>
> A while ago I posted a patchset with a similar goal as this:
>
> https://lore.kernel.org/all/166147828344.25420.13834885828450967910.stgit@noble.brown/
>
> and received useful feedback.  Here is a new version.
>
> This version is not complete.  It does not change rename and does not
> change any filesystem to make use of the new opportunity for
> parallelism.  I'll work on those once the base functionality is agreed
> on.
>
> With this series, instead of a filesystem setting a flag to indicate
> that parallel updates are supported, there is now a new set of inode
> operations with a _shared prefix.  If a directory provides a _shared
> interface it will be used with a shared lock on the inode, else the
> current interface will be used with an exclusive lock.

Hi Neil, thanks for the patch.  One minor nit for the next revision
of the cover letter:

> Another motivation is lustre which
> can use a modified ext4 as the storage backend.  One of the current
> modifications is to allow concurrent updates in a directory as lustre
> uses a flat directory structure to store data.

This isn't really correct.  Lustre uses a directory tree for the
namespace, but directories might become very large in some cases
with 1M+ cores working in a single directory (hey, I don't write
the applications, I just need to deal with them).  The servers will
only have 500-2000 threads working on a single directory, but the
fine-grained locking on the servers is definitely a big win.

Being able to have parallel locking on the client VFS side would
also be a win, given that large nodes commonly have 192 or 256
cores/threads today.
We know parallel directory locking will be a win because mounting the
filesystem multiple times on a single client (which the VFS treats as
multiple separate filesystems) and running a multi-threaded benchmark
in each mount in parallel is considerably faster than running the
same number of threads in a single mountpoint.

Cheers, Andreas
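
For anyone following the thread, the dispatch rule quoted from the cover
letter (use the _shared operation under a shared lock if the directory
provides one, else the existing operation under an exclusive lock) can be
modelled roughly in userspace.  This is only a sketch, not kernel code:
RWLock, Directory, do_mkdir, add_entry and the "mkdir"/"mkdir_shared" keys
are all hypothetical names made up for illustration, and the real series
may be structured quite differently.

```python
import threading


class RWLock:
    """Toy reader-writer lock: many shared holders or one exclusive holder."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class Directory:
    """Hypothetical stand-in for a directory inode and its operations table."""

    def __init__(self, ops):
        self.ops = ops            # operation name -> callable
        self.lock = RWLock()      # models the per-inode lock
        self.entries = set()


def do_mkdir(directory, name):
    """Prefer the _shared variant under a shared lock, else fall back."""
    shared_op = directory.ops.get("mkdir_shared")
    if shared_op is not None:
        directory.lock.acquire_shared()
        try:
            shared_op(directory, name)
        finally:
            directory.lock.release_shared()
        return "shared"
    directory.lock.acquire_exclusive()
    try:
        directory.ops["mkdir"](directory, name)
    finally:
        directory.lock.release_exclusive()
    return "exclusive"


def add_entry(directory, name):
    # A filesystem offering a _shared op would need its own finer-grained
    # locking here; this model only records the name.
    directory.entries.add(name)


legacy_dir = Directory({"mkdir": add_entry})
parallel_dir = Directory({"mkdir": add_entry, "mkdir_shared": add_entry})
```

Calling do_mkdir(legacy_dir, "a") takes the exclusive path, while
do_mkdir(parallel_dir, "b") takes the shared path, so several threads could
be inside the shared operation on parallel_dir at once.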