Message ID | 20211014213646.1139469-1-krisman@collabora.com (mailing list archive) |
---|---|
Headers | show |
Series | file system-wide error monitoring | expand |
On Fri, Oct 15, 2021 at 12:37 AM Gabriel Krisman Bertazi <krisman@collabora.com> wrote: > > Hi, > > This attempts to get the ball rolling again for the FAN_FS_ERROR. This > version is slightly different from the previous approaches, since it uses > mempool for memory allocation, as suggested by Jan. It has the > advantage of simplifying a lot the enqueue/dequeue, which is now much > more similar to other event types, but it also means the guarantee that > an error event will be available is diminished. Makes me very happy not having to worry about new enqueue/dequeue bugs :) > > The way we propagate superblock errors also changed. Now we use > FILEID_ROOT internally, and mangle it prior to copy_to_user. > > I am no longer sure how to guarantee that at least one mempoll slot will > be available for each filesystem. Since we are now tying the poll to > the entire group, a stream of errors in a single file system might > prevent others from emitting an error. The possibility of this is > reduced since we merge errors to the same filesystem, but it is still > possible that they occur during the small window where the event is > dequeued and before it is freed, in which case another filesystem might > not be able to obtain a slot. Double buffering. Each mark/fs should have one slot reserved for equeue and one reserved for copying the event to user. > > I'm also creating a poll of 32 entries initially to avoid spending too > much memory. This means that only 32 filesystems can be watched per > group with the FAN_FS_ERROR mark, before fanotify_mark starts returning > ENOMEM. I don't see a problem to grow the pool dynamically up to a reasonable size, although it is a shame that the pool is not accounted to the group's memcg (I think?). Overall, the series looks very good to me, modulo to above comments about the mempool size/resize and a few minor implementation details. Good job! Thanks, Amir.
Hi! On Thu 14-10-21 18:36:18, Gabriel Krisman Bertazi wrote: > This attempts to get the ball rolling again for the FAN_FS_ERROR. This > version is slightly different from the previous approaches, since it uses > mempool for memory allocation, as suggested by Jan. It has the > advantage of simplifying a lot the enqueue/dequeue, which is now much > more similar to other event types, but it also means the guarantee that > an error event will be available is diminished. > > The way we propagate superblock errors also changed. Now we use > FILEID_ROOT internally, and mangle it prior to copy_to_user. > > I am no longer sure how to guarantee that at least one mempoll slot will > be available for each filesystem. Since we are now tying the poll to > the entire group, a stream of errors in a single file system might > prevent others from emitting an error. The possibility of this is > reduced since we merge errors to the same filesystem, but it is still > possible that they occur during the small window where the event is > dequeued and before it is freed, in which case another filesystem might > not be able to obtain a slot. Yes, but this happening would mean we hit this race on one fs, error on another fs, and ENOMEM with GFP_NOFS allocation to top it. Not very likely IMO. Also in that case we will queue overflow event in fanotify_handle_event() so it will not be silent loss. The listening application will learn that it missed some events. > I'm also creating a poll of 32 entries initially to avoid spending too > much memory. This means that only 32 filesystems can be watched per > group with the FAN_FS_ERROR mark, before fanotify_mark starts returning > ENOMEM. We can consider auto-grow as Amir suggests but I also think you somewhat misunderstand how mempools work. If you call mempool_alloc(), it will first try to allocate memory with kmalloc() (using GFP_NOFS mask which you pass to it). In 99.9% of cases this just succeeds. If kmalloc() fails, only then mempool_alloc() will take one of the preallocated events and return it. So even with mempool of size 32, we will not usually run out of events when we have more than 32 filesystems. But it is true we cannot guarantee reporting error to more than 32 filesystems under ENOMEM conditions. I'm not sure if that matters... Honza