Message ID | 1561811293-75769-1-git-send-email-zhengbin13@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | fs: change last_ino type to unsigned long | expand |
On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote: > tmpfs use get_next_ino to get inode number, if last_ino wraps, > there will be files share the same inode number. Change last_ino > type to unsigned long. Is this a serious problem? I'd be more convinced by a patch to use the sbitmap data structure than a blind conversion to use atomic64_t.
On 2019/6/29 22:21, Matthew Wilcox wrote: > On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote: >> tmpfs use get_next_ino to get inode number, if last_ino wraps, >> there will be files share the same inode number. Change last_ino >> type to unsigned long. > Is this a serious problem? I'd be more convinced by a patch to use > the sbitmap data structure than a blind conversion to use atomic64_t. Yes, if two files share the same inode number, when application uses dlopen to get file handle, there will be problems. Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode, we need to add it to the hashtable), the questions are: 1. inode creation will be slow down, as the comment of function iunique says: * BUGS: * With a large number of inodes live on the file system this function * currently becomes quite slow. 2. we need to convert all callers of get_next_ino to use iunique (tmpfs, autofs, configfs...), moreover, the more callers are converted, the function of iunique will be more slower. Looking forward to your reply, thanks. > . >
On Mon, Jul 01, 2019 at 09:48:03AM +0800, zhengbin (A) wrote: > > On 2019/6/29 22:21, Matthew Wilcox wrote: > > On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote: > >> tmpfs use get_next_ino to get inode number, if last_ino wraps, > >> there will be files share the same inode number. Change last_ino > >> type to unsigned long. > > Is this a serious problem? > > Yes, if two files share the same inode number, when application uses dlopen to get > > file handle, there will be problems. That wasn't what I meant. Does it happen in practice? Have you observed it, or are you just worrying about a potential problem? > Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode, > > we need to add it to the hashtable), the questions are: > > 1. inode creation will be slow down, as the comment of function iunique says: > > * BUGS: > * With a large number of inodes live on the file system this function > * currently becomes quite slow. > > 2. we need to convert all callers of get_next_ino to use iunique (tmpfs, autofs, configfs...), > > moreover, the more callers are converted, the function of iunique will be more slower. > > > Looking forward to your reply, thanks. > > I'd be more convinced by a patch to use > > the sbitmap data structure than a blind conversion to use atomic64_t. You ignored this sentence.
On 2019/7/1 9:52, Matthew Wilcox wrote: > On Mon, Jul 01, 2019 at 09:48:03AM +0800, zhengbin (A) wrote: >> On 2019/6/29 22:21, Matthew Wilcox wrote: >>> On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote: >>>> tmpfs use get_next_ino to get inode number, if last_ino wraps, >>>> there will be files share the same inode number. Change last_ino >>>> type to unsigned long. >>> Is this a serious problem? >> Yes, if two files share the same inode number, when application uses dlopen to get >> >> file handle, there will be problems. > That wasn't what I meant. Does it happen in practice? Have you observed > it, or are you just worrying about a potential problem? Yes, it happens in practice. we use a script to test it, after 10 days, this problem will happen: while the script are as follows: while(1) { create a file dlopen it ... remove it } >> Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode, >> >> we need to add it to the hashtable), the questions are: >> >> 1. inode creation will be slow down, as the comment of function iunique says: >> >> * BUGS: >> * With a large number of inodes live on the file system this function >> * currently becomes quite slow. >> >> 2. we need to convert all callers of get_next_ino to use iunique (tmpfs, autofs, configfs...), >> >> moreover, the more callers are converted, the function of iunique will be more slower. >> >> >> Looking forward to your reply, thanks. >>> I'd be more convinced by a patch to use >>> the sbitmap data structure than a blind conversion to use atomic64_t. > You ignored this sentence. I will study sbitmap data structure > . >
On 2019/7/1 9:52, Matthew Wilcox wrote: > On Mon, Jul 01, 2019 at 09:48:03AM +0800, zhengbin (A) wrote: >> On 2019/6/29 22:21, Matthew Wilcox wrote: >>> On Sat, Jun 29, 2019 at 08:28:13PM +0800, zhengbin wrote: >>>> tmpfs use get_next_ino to get inode number, if last_ino wraps, >>>> there will be files share the same inode number. Change last_ino >>>> type to unsigned long. >>> Is this a serious problem? >> Yes, if two files share the same inode number, when application uses dlopen to get >> >> file handle, there will be problems. > That wasn't what I meant. Does it happen in practice? Have you observed > it, or are you just worrying about a potential problem? > >> Maybe we could use iunique to try to get a unique i_ino value(when we allocate new inode, >> >> we need to add it to the hashtable), the questions are: >> >> 1. inode creation will be slow down, as the comment of function iunique says: >> >> * BUGS: >> * With a large number of inodes live on the file system this function >> * currently becomes quite slow. >> >> 2. we need to convert all callers of get_next_ino to use iunique (tmpfs, autofs, configfs...), >> >> moreover, the more callers are converted, the function of iunique will be more slower. >> >> >> Looking forward to your reply, thanks. >>> I'd be more convinced by a patch to use >>> the sbitmap data structure than a blind conversion to use atomic64_t. > You ignored this sentence. Hi, maybe sbitmap is not a good solution, max_inodes of tmpfs are controlled by mount options--nrinodes, which can be modified by remountfs(bigger or smaller), as the comment of function sbitmap_resize says: * Doesn't reallocate anything. It's up to the caller to ensure that the new * depth doesn't exceed the depth that the sb was initialized with. We can modify this to meet the growing requirements, there will still be questions as follows: 1. tmpfs is a ram filesystem, we need to allocate sbitmap memory for sbinfo->max_inodes(while this maybe huge) 2.If remountfs changes max_inode, we have to deal with it, while this may take a long time (bigger: we need to free the old sbitmap memory, allocate new memory, copy the old sbitmap to new sbitmap smaller: How do we deal with it?ie: we use sb->map[inode number/8] to find the sbitmap, we need to change the exist inode numbers?while this maybe used by userspace application.) > > . >
diff --git a/fs/inode.c b/fs/inode.c index df6542e..84cf8e3 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -870,22 +870,20 @@ static struct inode *find_inode_fast(struct super_block *sb, * 2^32 range, and is a worst-case. Even a 50% wastage would only increase * overflow rate by 2x, which does not seem too significant. * - * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW - * error if st_ino won't fit in target struct field. Use 32bit counter - * here to attempt to avoid that. */ #define LAST_INO_BATCH 1024 -static DEFINE_PER_CPU(unsigned int, last_ino); +static DEFINE_PER_CPU(unsigned long, last_ino); -unsigned int get_next_ino(void) +unsigned long get_next_ino(void) { - unsigned int *p = &get_cpu_var(last_ino); - unsigned int res = *p; + unsigned long *p = &get_cpu_var(last_ino); + unsigned long res = *p; #ifdef CONFIG_SMP if (unlikely((res & (LAST_INO_BATCH-1)) == 0)) { - static atomic_t shared_last_ino; - int next = atomic_add_return(LAST_INO_BATCH, &shared_last_ino); + static atomic64_t shared_last_ino; + long next = atomic64_add_return(LAST_INO_BATCH, + &shared_last_ino); res = next - LAST_INO_BATCH; } diff --git a/include/linux/fs.h b/include/linux/fs.h index f7fdfe9..51f153d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3005,7 +3005,7 @@ static inline void lockdep_annotate_inode_mutex_key(struct inode *inode) { }; #endif extern void unlock_new_inode(struct inode *); extern void discard_new_inode(struct inode *); -extern unsigned int get_next_ino(void); +extern unsigned long get_next_ino(void); extern void evict_inodes(struct super_block *sb); extern void __iget(struct inode * inode);
tmpfs use get_next_ino to get inode number, if last_ino wraps, there will be files share the same inode number. Change last_ino type to unsigned long. PS: Type of __old_kernel_stat->st_ino(cp_old_stat) is unsigned short in x86 & arm, if we want to avoid an EOVERFLOW error on non LFS stat() call, we need to change last_ino type to unsigned short Signed-off-by: zhengbin <zhengbin13@huawei.com> --- fs/inode.c | 16 +++++++--------- include/linux/fs.h | 2 +- 2 files changed, 8 insertions(+), 10 deletions(-) -- 2.7.4