Message ID | 20240606172813.2755930-1-isaacmanjarres@google.com (mailing list archive) |
---|---|
State | New |
Series | [v5] fs: Improve eventpoll logging to stop indicting timerfd |
On Thu, Jun 06, 2024 at 10:28:11AM -0700, Isaac J. Manjarres wrote:
> From: Manish Varma <varmam@google.com>
>
> timerfd doesn't create any wakelocks, but eventpoll can. When it does,
> it names them after the underlying file descriptor, and since all
> timerfd file descriptors are named "[timerfd]" (which saves memory on
> systems like desktops with potentially many timerfd instances), all
> wakesources created as a result of using the eventpoll-on-timerfd idiom
> are called... "[timerfd]".
>
> However, it becomes impossible to tell which "[timerfd]" wakesource is
> affiliated with which process and hence troubleshooting is difficult.
>
> This change addresses this problem by changing the way eventpoll
> wakesources are named:
>
> 1) the top-level per-process eventpoll wakesource is now named
> "epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
> and P is the PID of the creating process.
> 2) individual per-underlying-file descriptor eventpoll wakesources are
> now named "epollitemN:P.F", where N is a unique ID token and P is PID
> of the creating process and F is the name of the underlying file
> descriptor.
>
> Co-developed-by: Kelly Rossmoyer <krossmo@google.com>
> Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
> Signed-off-by: Manish Varma <varmam@google.com>
> Co-developed-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> ---
>  drivers/base/power/wakeup.c | 12 +++++++++---
>  fs/eventpoll.c              | 11 +++++++++--
>  include/linux/pm_wakeup.h   |  8 ++++----
>  3 files changed, 22 insertions(+), 9 deletions(-)
>
> v1 -> v2:
> - Renamed instance count to wakesource_create_id to better describe
>   its purpose.
> - Changed the wakeup source naming convention for wakeup sources
>   created by eventpoll to avoid changing the timerfd names.
> - Used the PID of the process instead of the process name for the
>   sake of uniqueness when creating wakeup sources.
>
> v2 -> v3:
> - Changed wakeup_source_register() to take in a format string
>   and arguments to avoid duplicating code to construct wakeup
>   source names.
> - Moved the definition of wakesource_create_id so that it is
>   always defined to fix a compilation error.
>
> v3 -> v4:
> - Changed the naming convention for the top-level epoll wakeup
>   sources to include an ID for uniqueness. This is needed in
>   cases where a process is using two epoll fds.
> - Edited commit log to reflect new changes and add new tags.
>
> v4 -> v5:
> - Added the format attribute to the wakeup_source_register()
>   function to address a warning from the kernel test robot:
>   https://lore.kernel.org/all/202406050504.UvdlPAQ0-lkp@intel.com/
>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 752b417e8129..04a808607b62 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -209,13 +209,19 @@ EXPORT_SYMBOL_GPL(wakeup_source_remove);
>  /**
>   * wakeup_source_register - Create wakeup source and add it to the list.
>   * @dev: Device this wakeup source is associated with (or NULL if virtual).
> - * @name: Name of the wakeup source to register.
> + * @fmt: format string for the wakeup source name
>   */
> -struct wakeup_source *wakeup_source_register(struct device *dev,
> -					     const char *name)
> +__printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
> +							    const char *fmt, ...)
>  {
>  	struct wakeup_source *ws;
>  	int ret;
> +	char name[128];
> +	va_list args;
> +
> +	va_start(args, fmt);
> +	vsnprintf(name, sizeof(name), fmt, args);
> +	va_end(args);
>
>  	ws = wakeup_source_create(name);
>  	if (ws) {
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index f53ca4f7fced..941df15208a4 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -338,6 +338,7 @@ static void __init epoll_sysctls_init(void)
>  #define epoll_sysctls_init() do { } while (0)
>  #endif /* CONFIG_SYSCTL */
>
> +static atomic_t wakesource_create_id = ATOMIC_INIT(0);
>  static const struct file_operations eventpoll_fops;
>
>  static inline int is_file_epoll(struct file *f)
> @@ -1545,15 +1546,21 @@ static int ep_create_wakeup_source(struct epitem *epi)
>  {
>  	struct name_snapshot n;
>  	struct wakeup_source *ws;
> +	pid_t task_pid;
> +	int id;
> +
> +	task_pid = task_pid_nr(current);
>
>  	if (!epi->ep->ws) {
> -		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
> +		id = atomic_inc_return(&wakesource_create_id);
> +		epi->ep->ws = wakeup_source_register(NULL, "epoll:%d:%d", id, task_pid);
>  		if (!epi->ep->ws)
>  			return -ENOMEM;
>  	}
>
> +	id = atomic_inc_return(&wakesource_create_id);
>  	take_dentry_name_snapshot(&n, epi->ffd.file->f_path.dentry);
> -	ws = wakeup_source_register(NULL, n.name.name);
> +	ws = wakeup_source_register(NULL, "epollitem%d:%d.%s", id, task_pid, n.name.name);
>  	release_dentry_name_snapshot(&n);
>
>  	if (!ws)
> diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
> index 76cd1f9f1365..1fb6dca981c2 100644
> --- a/include/linux/pm_wakeup.h
> +++ b/include/linux/pm_wakeup.h
> @@ -99,8 +99,8 @@ extern struct wakeup_source *wakeup_source_create(const char *name);
>  extern void wakeup_source_destroy(struct wakeup_source *ws);
>  extern void wakeup_source_add(struct wakeup_source *ws);
>  extern void wakeup_source_remove(struct wakeup_source *ws);
> -extern struct wakeup_source *wakeup_source_register(struct device *dev,
> -						    const char *name);
> +extern __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
> +								   const char *fmt, ...);
>  extern void wakeup_source_unregister(struct wakeup_source *ws);
>  extern int wakeup_sources_read_lock(void);
>  extern void wakeup_sources_read_unlock(int idx);
> @@ -140,8 +140,8 @@ static inline void wakeup_source_add(struct wakeup_source *ws) {}
>
>  static inline void wakeup_source_remove(struct wakeup_source *ws) {}
>
> -static inline struct wakeup_source *wakeup_source_register(struct device *dev,
> -							   const char *name)
> +static inline __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
> +									  const char *fmt, ...)
>  {
>  	return NULL;
>  }
> --
> 2.45.2.505.gda0bf45e8d-goog

Hello,

Just following up to see if there are comments or concerns with this patch?

Thanks,
Isaac
On Thu, Jun 6, 2024 at 10:28 AM 'Isaac J. Manjarres' via kernel-team
<kernel-team@android.com> wrote:
>
> From: Manish Varma <varmam@google.com>
>
> timerfd doesn't create any wakelocks, but eventpoll can. When it does,
> it names them after the underlying file descriptor, and since all
> timerfd file descriptors are named "[timerfd]" (which saves memory on
> systems like desktops with potentially many timerfd instances), all
> wakesources created as a result of using the eventpoll-on-timerfd idiom
> are called... "[timerfd]".
>
> However, it becomes impossible to tell which "[timerfd]" wakesource is
> affiliated with which process and hence troubleshooting is difficult.

Thanks for sending this out!

My apologies, as this is really meta-commentary (which I'm sure isn't
what you're looking for), but as you've gotten limited feedback maybe
it might help?

While your explanation above is understandable, I feel like it might
benefit from a more concrete example to show why this is problematic?
It feels like the description gets into the weeds pretty quickly and
makes it hard to understand the importance of the change.

> This change addresses this problem by changing the way eventpoll
> wakesources are named:
>
> 1) the top-level per-process eventpoll wakesource is now named
> "epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
> and P is the PID of the creating process.
> 2) individual per-underlying-file descriptor eventpoll wakesources are
> now named "epollitemN:P.F", where N is a unique ID token and P is PID
> of the creating process and F is the name of the underlying file
> descriptor.

Again the N:P.F mapping is clear, but maybe including a specific
before and after example would help?

Additionally, once you have this better named wakesource, can you
provide a specific example to illustrate a bit on how this
specifically helps the troubleshooting that was difficult before?

thanks
-john
On Mon, Jun 24, 2024 at 11:03:43AM -0700, John Stultz wrote:
> On Thu, Jun 6, 2024 at 10:28 AM 'Isaac J. Manjarres' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > From: Manish Varma <varmam@google.com>
> >
> > timerfd doesn't create any wakelocks, but eventpoll can. When it does,
> > it names them after the underlying file descriptor, and since all
> > timerfd file descriptors are named "[timerfd]" (which saves memory on
> > systems like desktops with potentially many timerfd instances), all
> > wakesources created as a result of using the eventpoll-on-timerfd idiom
> > are called... "[timerfd]".
> >
> > However, it becomes impossible to tell which "[timerfd]" wakesource is
> > affiliated with which process and hence troubleshooting is difficult.
>
> While your explanation above is understandable, I feel like it might
> benefit from a more concrete example to show why this is problematic?
> It feels like the description gets into the weeds pretty quickly and
> makes it hard to understand the importance of the change.
>
> Again the N:P.F mapping is clear, but maybe including a specific
> before and after example would help?
>
> Additionally, once you have this better named wakesource, can you
> provide a specific example to illustrate a bit on how this
> specifically helps the troubleshooting that was difficult before?
>
> thanks
> -john

Hi John,

Thanks for your feedback on this! I'm more than happy to add more
details to the commit text. I'll go ahead and add an example to showcase
a scenario where the proposed changes make debugging easier.

I'll send out v6 of the patch soon.

--Isaac
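As a rough illustration of the before/after naming discussed above, the following userspace sketch builds the names with the two format strings that appear in the posted diff. It is not the example promised for v6. The PIDs (1234, 5678) and ID values are made up, and the function names are hypothetical. Note that the top-level format string in the diff is "epoll:%d:%d", so it yields "epoll:N:P" rather than the "epollN:P" spelled out in the commit text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Old scheme: the item wakeup source is named after the file alone. */
const char *old_item_name(const char *file_name)
{
	return file_name;
}

/* New scheme, using the per-item format string from the posted diff. */
int new_item_name(char *buf, size_t len, int id, int pid, const char *file_name)
{
	return snprintf(buf, len, "epollitem%d:%d.%s", id, pid, file_name);
}

/* Top-level name, using the format string from the posted diff. */
int new_epoll_name(char *buf, size_t len, int id, int pid)
{
	return snprintf(buf, len, "epoll:%d:%d", id, pid);
}

/* Two processes with made-up PIDs, each polling a timerfd. */
void demo(void)
{
	char a[128], b[128];

	/* Before: both wakeup sources carry the same name. */
	assert(strcmp(old_item_name("[timerfd]"), old_item_name("[timerfd]")) == 0);

	/* After: the ID and PID make them distinguishable. */
	new_item_name(a, sizeof(a), 2, 1234, "[timerfd]");
	new_item_name(b, sizeof(b), 4, 5678, "[timerfd]");
	assert(strcmp(a, b) != 0);
	printf("%s\n%s\n", a, b);
}
```

With names of this shape, an entry in the wakeup source statistics can be traced back to a PID instead of to an anonymous "[timerfd]".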
On Thu, Jun 06, 2024 at 10:28:11AM -0700, Isaac J. Manjarres wrote:
> From: Manish Varma <varmam@google.com>
>
> timerfd doesn't create any wakelocks, but eventpoll can. When it does,
> it names them after the underlying file descriptor, and since all
> timerfd file descriptors are named "[timerfd]" (which saves memory on
> systems like desktops with potentially many timerfd instances), all
> wakesources created as a result of using the eventpoll-on-timerfd idiom
> are called... "[timerfd]".
>
> However, it becomes impossible to tell which "[timerfd]" wakesource is
> affiliated with which process and hence troubleshooting is difficult.
>
> This change addresses this problem by changing the way eventpoll
> wakesources are named:
>
> 1) the top-level per-process eventpoll wakesource is now named
> "epollN:P" (instead of just "eventpoll"), where N is a unique ID token,
> and P is the PID of the creating process.
> 2) individual per-underlying-file descriptor eventpoll wakesources are
> now named "epollitemN:P.F", where N is a unique ID token and P is PID
> of the creating process and F is the name of the underlying file
> descriptor.
>
> Co-developed-by: Kelly Rossmoyer <krossmo@google.com>
> Signed-off-by: Kelly Rossmoyer <krossmo@google.com>
> Signed-off-by: Manish Varma <varmam@google.com>
> Co-developed-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> ---
>  drivers/base/power/wakeup.c | 12 +++++++++---
>  fs/eventpoll.c              | 11 +++++++++--
>  include/linux/pm_wakeup.h   |  8 ++++----
>  3 files changed, 22 insertions(+), 9 deletions(-)
>
> v1 -> v2:
> - Renamed instance count to wakesource_create_id to better describe
>   its purpose.
> - Changed the wakeup source naming convention for wakeup sources
>   created by eventpoll to avoid changing the timerfd names.
> - Used the PID of the process instead of the process name for the
>   sake of uniqueness when creating wakeup sources.
>
> v2 -> v3:
> - Changed wakeup_source_register() to take in a format string
>   and arguments to avoid duplicating code to construct wakeup
>   source names.
> - Moved the definition of wakesource_create_id so that it is
>   always defined to fix a compilation error.
>
> v3 -> v4:
> - Changed the naming convention for the top-level epoll wakeup
>   sources to include an ID for uniqueness. This is needed in
>   cases where a process is using two epoll fds.
> - Edited commit log to reflect new changes and add new tags.
>
> v4 -> v5:
> - Added the format attribute to the wakeup_source_register()
>   function to address a warning from the kernel test robot:
>   https://lore.kernel.org/all/202406050504.UvdlPAQ0-lkp@intel.com/
>
> diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> index 752b417e8129..04a808607b62 100644
> --- a/drivers/base/power/wakeup.c
> +++ b/drivers/base/power/wakeup.c
> @@ -209,13 +209,19 @@ EXPORT_SYMBOL_GPL(wakeup_source_remove);
>  /**
>   * wakeup_source_register - Create wakeup source and add it to the list.
>   * @dev: Device this wakeup source is associated with (or NULL if virtual).
> - * @name: Name of the wakeup source to register.
> + * @fmt: format string for the wakeup source name
>   */
> -struct wakeup_source *wakeup_source_register(struct device *dev,
> -					     const char *name)
> +__printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
> +							    const char *fmt, ...)
>  {
>  	struct wakeup_source *ws;
>  	int ret;
> +	char name[128];
> +	va_list args;
> +
> +	va_start(args, fmt);
> +	vsnprintf(name, sizeof(name), fmt, args);
> +	va_end(args);
>
>  	ws = wakeup_source_create(name);
>  	if (ws) {
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index f53ca4f7fced..941df15208a4 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -338,6 +338,7 @@ static void __init epoll_sysctls_init(void)
>  #define epoll_sysctls_init() do { } while (0)
>  #endif /* CONFIG_SYSCTL */
>
> +static atomic_t wakesource_create_id = ATOMIC_INIT(0);
>  static const struct file_operations eventpoll_fops;
>
>  static inline int is_file_epoll(struct file *f)
> @@ -1545,15 +1546,21 @@ static int ep_create_wakeup_source(struct epitem *epi)
>  {
>  	struct name_snapshot n;
>  	struct wakeup_source *ws;
> +	pid_t task_pid;
> +	int id;
> +
> +	task_pid = task_pid_nr(current);
>
>  	if (!epi->ep->ws) {
> -		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
> +		id = atomic_inc_return(&wakesource_create_id);
> +		epi->ep->ws = wakeup_source_register(NULL, "epoll:%d:%d", id, task_pid);

How often does this execute? Is it at most once per task lifespan?

The var probably wants to be annotated with ____cacheline_aligned_in_smp
so that it does not accidentally mess with other stuff.

I am assuming there is no constant traffic on it.

>  		if (!epi->ep->ws)
>  			return -ENOMEM;
>  	}
>
> +	id = atomic_inc_return(&wakesource_create_id);
>  	take_dentry_name_snapshot(&n, epi->ffd.file->f_path.dentry);
> -	ws = wakeup_source_register(NULL, n.name.name);
> +	ws = wakeup_source_register(NULL, "epollitem%d:%d.%s", id, task_pid, n.name.name);
>  	release_dentry_name_snapshot(&n);
>
>  	if (!ws)
> diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
> index 76cd1f9f1365..1fb6dca981c2 100644
> --- a/include/linux/pm_wakeup.h
> +++ b/include/linux/pm_wakeup.h
> @@ -99,8 +99,8 @@ extern struct wakeup_source *wakeup_source_create(const char *name);
>  extern void wakeup_source_destroy(struct wakeup_source *ws);
>  extern void wakeup_source_add(struct wakeup_source *ws);
>  extern void wakeup_source_remove(struct wakeup_source *ws);
> -extern struct wakeup_source *wakeup_source_register(struct device *dev,
> -						    const char *name);
> +extern __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
> +								   const char *fmt, ...);
>  extern void wakeup_source_unregister(struct wakeup_source *ws);
>  extern int wakeup_sources_read_lock(void);
>  extern void wakeup_sources_read_unlock(int idx);
> @@ -140,8 +140,8 @@ static inline void wakeup_source_add(struct wakeup_source *ws) {}
>
>  static inline void wakeup_source_remove(struct wakeup_source *ws) {}
>
> -static inline struct wakeup_source *wakeup_source_register(struct device *dev,
> -							   const char *name)
> +static inline __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
> +									  const char *fmt, ...)
>  {
>  	return NULL;
>  }
> --
> 2.45.2.505.gda0bf45e8d-goog
>
On Tue, Jun 25, 2024 at 07:58:43PM +0200, Mateusz Guzik wrote:
> On Thu, Jun 06, 2024 at 10:28:11AM -0700, Isaac J. Manjarres wrote:
> > +static atomic_t wakesource_create_id = ATOMIC_INIT(0);
> >  static const struct file_operations eventpoll_fops;
> >
> >  static inline int is_file_epoll(struct file *f)
> > @@ -1545,15 +1546,21 @@ static int ep_create_wakeup_source(struct epitem *epi)
> >  {
> >  	struct name_snapshot n;
> >  	struct wakeup_source *ws;
> > +	pid_t task_pid;
> > +	int id;
> > +
> > +	task_pid = task_pid_nr(current);
> >
> >  	if (!epi->ep->ws) {
> > -		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
> > +		id = atomic_inc_return(&wakesource_create_id);
> > +		epi->ep->ws = wakeup_source_register(NULL, "epoll:%d:%d", id, task_pid);
>
> How often does this execute? Is it at most once per task lifespan?

Thank you for your feedback! This can execute multiple times throughout
a task's lifespan. However, I haven't seen it execute that often.

> The var probably wants to be annotated with ____cacheline_aligned_in_smp
> so that it does not accidentally mess with other stuff.
>
> I am assuming there is no constant traffic on it.

Right, I don't see much traffic on it. Can you please elaborate a bit
more on what interaction you're concerned with here? If it's a
concern about false sharing, I'm worried that we may be prematurely
optimizing this.

--Isaac
On Wed, Jul 3, 2024 at 11:37 PM Isaac Manjarres
<isaacmanjarres@google.com> wrote:
>
> On Tue, Jun 25, 2024 at 07:58:43PM +0200, Mateusz Guzik wrote:
> > On Thu, Jun 06, 2024 at 10:28:11AM -0700, Isaac J. Manjarres wrote:
> > > +static atomic_t wakesource_create_id = ATOMIC_INIT(0);
> > >  static const struct file_operations eventpoll_fops;
> > >
> > >  static inline int is_file_epoll(struct file *f)
> > > @@ -1545,15 +1546,21 @@ static int ep_create_wakeup_source(struct epitem *epi)
> > >  {
> > >  	struct name_snapshot n;
> > >  	struct wakeup_source *ws;
> > > +	pid_t task_pid;
> > > +	int id;
> > > +
> > > +	task_pid = task_pid_nr(current);
> > >
> > >  	if (!epi->ep->ws) {
> > > -		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
> > > +		id = atomic_inc_return(&wakesource_create_id);
> > > +		epi->ep->ws = wakeup_source_register(NULL, "epoll:%d:%d", id, task_pid);
> >
> > How often does this execute? Is it at most once per task lifespan?
> Thank you for your feedback! This can execute multiple times throughout
> a task's lifespan. However, I haven't seen it execute that often.
>
> > The var probably wants to be annotated with ____cacheline_aligned_in_smp
> > so that it does not accidentally mess with other stuff.
> >
> > I am assuming there is no constant traffic on it.
> Right, I don't see much traffic on it. Can you please elaborate a bit
> more on what interaction you're concerned with here? If it's a
> concern about false sharing, I'm worried that we may be prematurely
> optimizing this.
>

I am concerned with false sharing indeed, specifically with this
landing with something unrelated to epoll.

Preferably the linker would not merge cachelines across different .o
files and that would make the problem mostly sorted out.

In the meantime I would argue basic multicore hygiene dictates vars
like this one get moved out of the way if only to not accidentally
mess with other stuff.

But I am not going to pester you about it. It's not my call for this
code either.
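To make the false-sharing concern concrete, here is a userspace C11 sketch of the idea behind keeping such a counter on its own cache line. It is only an analogue: it does not use the kernel's ____cacheline_aligned_in_smp macro, it assumes a 64-byte cache line (which is not true on every CPU), and the names padded_counter, create_id and next_create_id() are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>

/* Assumption: 64-byte cache lines; the real value is CPU-specific. */
#define ASSUMED_CACHELINE 64

/*
 * Aligning the member raises the alignment of the whole struct, and
 * sizeof() is rounded up to match, so no unrelated variable can be
 * placed on the same (assumed) cache line as the counter.
 */
struct padded_counter {
	alignas(ASSUMED_CACHELINE) atomic_int value;
};

static_assert(sizeof(struct padded_counter) == ASSUMED_CACHELINE,
	      "counter should occupy exactly one assumed cache line");

/* File-scope object, zero-initialized like ATOMIC_INIT(0) in the diff. */
struct padded_counter create_id;

/* Returns the incremented value, mirroring atomic_inc_return(). */
int next_create_id(void)
{
	return atomic_fetch_add(&create_id.value, 1) + 1;
}
```

The trade-off debated above is visible here: the counter costs 64 bytes instead of 4, which only pays off if the neighbouring data is hot enough for the shared line to matter.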
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index 752b417e8129..04a808607b62 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -209,13 +209,19 @@ EXPORT_SYMBOL_GPL(wakeup_source_remove);
 /**
  * wakeup_source_register - Create wakeup source and add it to the list.
  * @dev: Device this wakeup source is associated with (or NULL if virtual).
- * @name: Name of the wakeup source to register.
+ * @fmt: format string for the wakeup source name
  */
-struct wakeup_source *wakeup_source_register(struct device *dev,
-					     const char *name)
+__printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
+							    const char *fmt, ...)
 {
 	struct wakeup_source *ws;
 	int ret;
+	char name[128];
+	va_list args;
+
+	va_start(args, fmt);
+	vsnprintf(name, sizeof(name), fmt, args);
+	va_end(args);

 	ws = wakeup_source_create(name);
 	if (ws) {
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f53ca4f7fced..941df15208a4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -338,6 +338,7 @@ static void __init epoll_sysctls_init(void)
 #define epoll_sysctls_init() do { } while (0)
 #endif /* CONFIG_SYSCTL */

+static atomic_t wakesource_create_id = ATOMIC_INIT(0);
 static const struct file_operations eventpoll_fops;

 static inline int is_file_epoll(struct file *f)
@@ -1545,15 +1546,21 @@ static int ep_create_wakeup_source(struct epitem *epi)
 {
 	struct name_snapshot n;
 	struct wakeup_source *ws;
+	pid_t task_pid;
+	int id;
+
+	task_pid = task_pid_nr(current);

 	if (!epi->ep->ws) {
-		epi->ep->ws = wakeup_source_register(NULL, "eventpoll");
+		id = atomic_inc_return(&wakesource_create_id);
+		epi->ep->ws = wakeup_source_register(NULL, "epoll:%d:%d", id, task_pid);
 		if (!epi->ep->ws)
 			return -ENOMEM;
 	}

+	id = atomic_inc_return(&wakesource_create_id);
 	take_dentry_name_snapshot(&n, epi->ffd.file->f_path.dentry);
-	ws = wakeup_source_register(NULL, n.name.name);
+	ws = wakeup_source_register(NULL, "epollitem%d:%d.%s", id, task_pid, n.name.name);
 	release_dentry_name_snapshot(&n);

 	if (!ws)
diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h
index 76cd1f9f1365..1fb6dca981c2 100644
--- a/include/linux/pm_wakeup.h
+++ b/include/linux/pm_wakeup.h
@@ -99,8 +99,8 @@ extern struct wakeup_source *wakeup_source_create(const char *name);
 extern void wakeup_source_destroy(struct wakeup_source *ws);
 extern void wakeup_source_add(struct wakeup_source *ws);
 extern void wakeup_source_remove(struct wakeup_source *ws);
-extern struct wakeup_source *wakeup_source_register(struct device *dev,
-						    const char *name);
+extern __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
+								   const char *fmt, ...);
 extern void wakeup_source_unregister(struct wakeup_source *ws);
 extern int wakeup_sources_read_lock(void);
 extern void wakeup_sources_read_unlock(int idx);
@@ -140,8 +140,8 @@ static inline void wakeup_source_add(struct wakeup_source *ws) {}

 static inline void wakeup_source_remove(struct wakeup_source *ws) {}

-static inline struct wakeup_source *wakeup_source_register(struct device *dev,
-							   const char *name)
+static inline __printf(2, 3) struct wakeup_source *wakeup_source_register(struct device *dev,
+									  const char *fmt, ...)
 {
 	return NULL;
 }
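The wakeup.c hunk turns wakeup_source_register() into a printf-style function that formats into a fixed 128-byte buffer, and the v5 changelog mentions adding the format attribute. The userspace sketch below shows the same pattern under stated assumptions: format_ws_name() and WS_NAME_LEN are hypothetical names, and the GCC/Clang attribute is written out directly where the kernel uses its __printf(2, 3) macro. It also illustrates a consequence of the fixed buffer: longer names are silently truncated, which vsnprintf() reports only through its return value.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define WS_NAME_LEN 128 /* mirrors `char name[128]` in the diff */

/*
 * Hypothetical stand-in for the formatting step of the patched
 * wakeup_source_register(). The attribute tells the compiler that
 * argument 2 is a printf format and the variadic arguments start at
 * position 3, so mismatched arguments are diagnosed at build time.
 * Returns the length the full name would have had.
 */
__attribute__((format(printf, 2, 3)))
int format_ws_name(char out[WS_NAME_LEN], const char *fmt, ...)
{
	va_list args;
	int len;

	assert(out != NULL && fmt != NULL);

	va_start(args, fmt);
	len = vsnprintf(out, WS_NAME_LEN, fmt, args);
	va_end(args);

	return len;
}

/* True when the formatted name did not fit in the buffer. */
int ws_name_truncated(int len)
{
	return len >= WS_NAME_LEN;
}
```

A caller could check ws_name_truncated() on the return value; the posted patch does not check the vsnprintf() result, so an unusually long file name would simply be cut off at 127 characters.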