diff mbox series

[v4,5/6] connector/cn_proc: Performance improvements

Message ID 20230331235528.1106675-6-anjali.k.kulkarni@oracle.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Process connector bug fixes & enhancements | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Anjali Kulkarni March 31, 2023, 11:55 p.m. UTC
This patch adds the capability to filter messages sent by the proc
connector on the event type supplied in the message from the client
to the connector. The client can register to listen for an event type
given in struct proc_input.

This event based filteting will greatly enhance performance - handling
8K exits takes about 70ms, whereas 8K-forks + 8K-exits takes about 150ms
& handling 8K-forks + 8K-exits + 8K-execs takes 200ms. There are currently
9 different types of events, and we need to listen to all of them. Also,
measuring the time using pidfds for monitoring 8K process exits took
much longer - 200ms, as compared to 70ms using only exit notifications of
proc connector.

We also add a new event type - PROC_EVENT_NONZERO_EXIT, which is
only sent by kernel to a listening application when any process exiting,
has a non-zero exit status. This will help the clients like Oracle DB,
where a monitoring process wants notfications for non-zero process exits
so it can cleanup after them.

This kind of a new event could also be useful to other applications like
Google's lmkd daemon, which needs a killed process's exit notification.

The patch takes care that existing clients using old mechanism of not
sending the event type work without any changes.

cn_filter function checks to see if the event type being notified via
proc connector matches the event type requested by client, before
sending(matches) or dropping(does not match) a packet.

The proc_filter.c test file is updated to reflect the new filtering.

Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com>
---
 drivers/connector/cn_proc.c     | 59 +++++++++++++++++++++++++++++----
 include/uapi/linux/cn_proc.h    | 19 +++++++++++
 samples/connector/proc_filter.c | 47 +++++++++++++++++++++++---
 3 files changed, 115 insertions(+), 10 deletions(-)

Comments

Jakub Kicinski June 1, 2023, 4:25 p.m. UTC | #1
On Fri, 31 Mar 2023 16:55:27 -0700 Anjali Kulkarni wrote:
> +#define FILTER
> +
> +#ifdef FILTER
> +#define NL_MESSAGE_SIZE (sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + \
> +			 sizeof(struct proc_input))
> +#else
>  #define NL_MESSAGE_SIZE (sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + \
>  			 sizeof(int))
> +#endif

The #define FILTER and ifdefs around it need to go, this much I can
tell you without understanding what it does :S We have the git history
we don't need to keep dead code around.
Anjali Kulkarni June 1, 2023, 4:38 p.m. UTC | #2
> On Jun 1, 2023, at 9:25 AM, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> On Fri, 31 Mar 2023 16:55:27 -0700 Anjali Kulkarni wrote:
>> +#define FILTER
>> +
>> +#ifdef FILTER
>> +#define NL_MESSAGE_SIZE (sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + \
>> +			 sizeof(struct proc_input))
>> +#else
>> #define NL_MESSAGE_SIZE (sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + \
>> 			 sizeof(int))
>> +#endif
> 
> The #define FILTER and ifdefs around it need to go, this much I can
> tell you without understanding what it does :S We have the git history
> we don't need to keep dead code around.

The FILTER option is for backwards compatibility for those who may be using the proc connector today - so they do not need to immediately switch to using the new method - the example just shows the old method which does not break or need changes - do you still want me to remove the FILTER? 

Anjali
Jakub Kicinski June 1, 2023, 4:48 p.m. UTC | #3
On Thu, 1 Jun 2023 16:38:04 +0000 Anjali Kulkarni wrote:
> > The #define FILTER and ifdefs around it need to go, this much I can
> > tell you without understanding what it does :S We have the git history
> > we don't need to keep dead code around.  
> 
> The FILTER option is for backwards compatibility for those who may be
> using the proc connector today - so they do not need to immediately
> switch to using the new method - the example just shows the old
> method which does not break or need changes - do you still want me to
> remove the FILTER? 

Is it possible to recode the sample so the format can be decided based
on cmd line argument? To be honest samples are kinda dead, it'd be best
if the code was rewritten to act as a selftest.
Anjali Kulkarni June 1, 2023, 4:53 p.m. UTC | #4
> On Jun 1, 2023, at 9:48 AM, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> On Thu, 1 Jun 2023 16:38:04 +0000 Anjali Kulkarni wrote:
>>> The #define FILTER and ifdefs around it need to go, this much I can
>>> tell you without understanding what it does :S We have the git history
>>> we don't need to keep dead code around.  
>> 
>> The FILTER option is for backwards compatibility for those who may be
>> using the proc connector today - so they do not need to immediately
>> switch to using the new method - the example just shows the old
>> method which does not break or need changes - do you still want me to
>> remove the FILTER? 
> 
> Is it possible to recode the sample so the format can be decided based
> on cmd line argument? To be honest samples are kinda dead, it'd be best
> if the code was rewritten to act as a selftest.

Yes, I can recode to use a cmd line argument. Where would a selftest be committed? This is kind of a self test in the sense that this is working code  to test the other kernel code. What else is needed to make it a selftest?

Anjali
Jakub Kicinski June 1, 2023, 5:15 p.m. UTC | #5
On Thu, 1 Jun 2023 16:53:07 +0000 Anjali Kulkarni wrote:
> > Is it possible to recode the sample so the format can be decided based
> > on cmd line argument? To be honest samples are kinda dead, it'd be best
> > if the code was rewritten to act as a selftest.  
> 
> Yes, I can recode to use a cmd line argument. Where would a selftest
> be committed?

The path flow is the same as for the sample, the file just goes to
tools/testing/selftests rather than samples/.

> This is kind of a self test in the sense that this is
> working code  to test the other kernel code. What else is needed to
> make it a selftest?

Not much, really. I think the requirement is to exit with a non-zero
return code on failure, which you already do. 0 means success; 1 means
failure; 2 means skip, IIRC.

The main work in your case would be that the selftest needs to do its
checking and exit, so the stimuli must be triggered automatically.
(You can use a bash script to drive the events.)
Anjali Kulkarni June 2, 2023, 10:23 p.m. UTC | #6
> On Jun 1, 2023, at 10:15 AM, Jakub Kicinski <kuba@kernel.org> wrote:
> 
> On Thu, 1 Jun 2023 16:53:07 +0000 Anjali Kulkarni wrote:
>>> Is it possible to recode the sample so the format can be decided based
>>> on cmd line argument? To be honest samples are kinda dead, it'd be best
>>> if the code was rewritten to act as a selftest.  
>> 
>> Yes, I can recode to use a cmd line argument. Where would a selftest
>> be committed?
> 
> The path flow is the same as for the sample, the file just goes to
> tools/testing/selftests rather than samples/.
> 
>> This is kind of a self test in the sense that this is
>> working code  to test the other kernel code. What else is needed to
>> make it a selftest?
> 
> Not much, really. I think the requirement is to exit with a non-zero
> return code on failure, which you already do. 0 means success; 1 means
> failure; 2 means skip, IIRC.
> 
> The main work in your case would be that the selftest needs to do its
> checking and exit, so the stimuli must be triggered automatically.
> (You can use a bash script to drive the events.)

Thanks! So this will be part of the kselftest infra right? 
https://docs.kernel.org/dev-tools/kselftest.html ?
Jakub Kicinski June 2, 2023, 11:02 p.m. UTC | #7
On Fri, 2 Jun 2023 22:23:01 +0000 Anjali Kulkarni wrote:
> > Not much, really. I think the requirement is to exit with a non-zero
> > return code on failure, which you already do. 0 means success; 1 means
> > failure; 2 means skip, IIRC.
> > 
> > The main work in your case would be that the selftest needs to do its
> > checking and exit, so the stimuli must be triggered automatically.
> > (You can use a bash script to drive the events.)  
> 
> Thanks! So this will be part of the kselftest infra right? 
> https://docs.kernel.org/dev-tools/kselftest.html ?

Yes, that's right.
diff mbox series

Patch

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 84f38d2bd4b9..35bec1fd7ee0 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -50,21 +50,44 @@  static DEFINE_PER_CPU(struct local_event, local_event) = {
 
 static int cn_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 {
+	uintptr_t val;
+	__u32 what, exit_code, *ptr;
 	enum proc_cn_mcast_op mc_op;
 
-	if (!dsk)
+	if (!dsk || !data)
 		return 0;
 
+	ptr = (__u32 *)data;
+	what = *ptr++;
+	exit_code = *ptr;
+	val = ((struct proc_input *)(dsk->sk_user_data))->event_type;
 	mc_op = ((struct proc_input *)(dsk->sk_user_data))->mcast_op;
 
 	if (mc_op == PROC_CN_MCAST_IGNORE)
 		return 1;
 
-	return 0;
+	if ((__u32)val == PROC_EVENT_ALL)
+		return 0;
+	/*
+	 * Drop packet if we have to report only non-zero exit status
+	 * (PROC_EVENT_NONZERO_EXIT) and exit status is 0
+	 */
+	if (((__u32)val & PROC_EVENT_NONZERO_EXIT) &&
+	    (what == PROC_EVENT_EXIT)) {
+		if (exit_code)
+			return 0;
+		else
+			return 1;
+	}
+	if ((__u32)val & what)
+		return 0;
+	return 1;
 }
 
 static inline void send_msg(struct cn_msg *msg)
 {
+	__u32 filter_data[2];
+
 	local_lock(&local_event.lock);
 
 	msg->seq = __this_cpu_inc_return(local_event.count) - 1;
@@ -76,8 +99,15 @@  static inline void send_msg(struct cn_msg *msg)
 	 *
 	 * If cn_netlink_send() fails, the data is not sent.
 	 */
+	filter_data[0] = ((struct proc_event *)msg->data)->what;
+	if (filter_data[0] == PROC_EVENT_EXIT) {
+		filter_data[1] =
+		((struct proc_event *)msg->data)->event_data.exit.exit_code;
+	} else {
+		filter_data[1] = 0;
+	}
 	cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
-			     cn_filter, NULL);
+			     cn_filter, (void *)filter_data);
 
 	local_unlock(&local_event.lock);
 }
@@ -357,12 +387,15 @@  static void cn_proc_ack(int err, int rcvd_seq, int rcvd_ack)
 
 /**
  * cn_proc_mcast_ctl
- * @data: message sent from userspace via the connector
+ * @msg: message sent from userspace via the connector
+ * @nsp: NETLINK_CB of the client's socket buffer
  */
 static void cn_proc_mcast_ctl(struct cn_msg *msg,
 			      struct netlink_skb_parms *nsp)
 {
 	enum proc_cn_mcast_op mc_op = 0, prev_mc_op = 0;
+	struct proc_input *pinput = NULL;
+	enum proc_cn_event ev_type = 0;
 	int err = 0, initial = 0;
 	struct sock *sk = NULL;
 
@@ -381,11 +414,21 @@  static void cn_proc_mcast_ctl(struct cn_msg *msg,
 		goto out;
 	}
 
-	if (msg->len == sizeof(mc_op))
+	if (msg->len == sizeof(*pinput)) {
+		pinput = (struct proc_input *)msg->data;
+		mc_op = pinput->mcast_op;
+		ev_type = pinput->event_type;
+	} else if (msg->len == sizeof(mc_op)) {
 		mc_op = *((enum proc_cn_mcast_op *)msg->data);
-	else
+		ev_type = PROC_EVENT_ALL;
+	} else
 		return;
 
+	ev_type = valid_event((enum proc_cn_event)ev_type);
+
+	if (ev_type == PROC_EVENT_NONE)
+		ev_type = PROC_EVENT_ALL;
+
 	if (nsp->sk) {
 		sk = nsp->sk;
 		if (sk->sk_user_data == NULL) {
@@ -396,6 +439,8 @@  static void cn_proc_mcast_ctl(struct cn_msg *msg,
 			prev_mc_op =
 			((struct proc_input *)(sk->sk_user_data))->mcast_op;
 		}
+		((struct proc_input *)(sk->sk_user_data))->event_type =
+			ev_type;
 		((struct proc_input *)(sk->sk_user_data))->mcast_op = mc_op;
 	}
 
@@ -407,6 +452,8 @@  static void cn_proc_mcast_ctl(struct cn_msg *msg,
 	case PROC_CN_MCAST_IGNORE:
 		if (!initial && (prev_mc_op != PROC_CN_MCAST_IGNORE))
 			atomic_dec(&proc_event_num_listeners);
+		((struct proc_input *)(sk->sk_user_data))->event_type =
+			PROC_EVENT_NONE;
 		break;
 	default:
 		err = EINVAL;
diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h
index 6a06fb424313..f2afb7cc4926 100644
--- a/include/uapi/linux/cn_proc.h
+++ b/include/uapi/linux/cn_proc.h
@@ -30,6 +30,15 @@  enum proc_cn_mcast_op {
 	PROC_CN_MCAST_IGNORE = 2
 };
 
+#define PROC_EVENT_ALL (PROC_EVENT_FORK | PROC_EVENT_EXEC | PROC_EVENT_UID |  \
+			PROC_EVENT_GID | PROC_EVENT_SID | PROC_EVENT_PTRACE | \
+			PROC_EVENT_COMM | PROC_EVENT_NONZERO_EXIT |           \
+			PROC_EVENT_COREDUMP | PROC_EVENT_EXIT)
+
+/*
+ * If you add an entry in proc_cn_event, make sure you add it in
+ * PROC_EVENT_ALL above as well.
+ */
 enum proc_cn_event {
 	/* Use successive bits so the enums can be used to record
 	 * sets of events as well
@@ -45,15 +54,25 @@  enum proc_cn_event {
 	/* "next" should be 0x00000400 */
 	/* "last" is the last process event: exit,
 	 * while "next to last" is coredumping event
+	 * before that is report only if process dies
+	 * with non-zero exit status
 	 */
+	PROC_EVENT_NONZERO_EXIT = 0x20000000,
 	PROC_EVENT_COREDUMP = 0x40000000,
 	PROC_EVENT_EXIT = 0x80000000
 };
 
 struct proc_input {
 	enum proc_cn_mcast_op mcast_op;
+	enum proc_cn_event event_type;
 };
 
+static inline enum proc_cn_event valid_event(enum proc_cn_event ev_type)
+{
+	ev_type &= PROC_EVENT_ALL;
+	return ev_type;
+}
+
 /*
  * From the user's point of view, the process
  * ID is the thread group ID and thread ID is the internal
diff --git a/samples/connector/proc_filter.c b/samples/connector/proc_filter.c
index 25202f5bc126..63504fc5f002 100644
--- a/samples/connector/proc_filter.c
+++ b/samples/connector/proc_filter.c
@@ -15,22 +15,33 @@ 
 #include <errno.h>
 #include <signal.h>
 
+#define FILTER
+
+#ifdef FILTER
+#define NL_MESSAGE_SIZE (sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + \
+			 sizeof(struct proc_input))
+#else
 #define NL_MESSAGE_SIZE (sizeof(struct nlmsghdr) + sizeof(struct cn_msg) + \
 			 sizeof(int))
+#endif
 
 #define MAX_EVENTS 1
 
+volatile static int interrupted;
+static int nl_sock, ret_errno, tcount;
+static struct epoll_event evn;
+
 #ifdef ENABLE_PRINTS
 #define Printf printf
 #else
 #define Printf
 #endif
 
-volatile static int interrupted;
-static int nl_sock, ret_errno, tcount;
-static struct epoll_event evn;
-
+#ifdef FILTER
+int send_message(struct proc_input *pinp)
+#else
 int send_message(enum proc_cn_mcast_op mcast_op)
+#endif
 {
 	char buff[NL_MESSAGE_SIZE];
 	struct nlmsghdr *hdr;
@@ -50,8 +61,14 @@  int send_message(enum proc_cn_mcast_op mcast_op)
 	msg->ack = 0;
 	msg->flags = 0;
 
+#ifdef FILTER
+	msg->len = sizeof(struct proc_input);
+	((struct proc_input *)msg->data)->mcast_op = pinp->mcast_op;
+	((struct proc_input *)msg->data)->event_type = pinp->event_type;
+#else
 	msg->len = sizeof(int);
 	*(int *)msg->data = mcast_op;
+#endif
 
 	if (send(nl_sock, hdr, hdr->nlmsg_len, 0) == -1) {
 		ret_errno = errno;
@@ -61,7 +78,11 @@  int send_message(enum proc_cn_mcast_op mcast_op)
 	return 0;
 }
 
+#ifdef FILTER
+int register_proc_netlink(int *efd, struct proc_input *input)
+#else
 int register_proc_netlink(int *efd, enum proc_cn_mcast_op mcast_op)
+#endif
 {
 	struct sockaddr_nl sa_nl;
 	int err = 0, epoll_fd;
@@ -92,7 +113,11 @@  int register_proc_netlink(int *efd, enum proc_cn_mcast_op mcast_op)
 		return -2;
 	}
 
+#ifdef FILTER
+	err = send_message(input);
+#else
 	err = send_message(mcast_op);
+#endif
 	if (err < 0)
 		return err;
 
@@ -223,10 +248,19 @@  int main(int argc, char *argv[])
 {
 	int epoll_fd, err;
 	struct proc_event proc_ev;
+#ifdef FILTER
+	struct proc_input input;
+#endif
 
 	signal(SIGINT, sigint);
 
+#ifdef FILTER
+	input.event_type = PROC_EVENT_NONZERO_EXIT;
+	input.mcast_op = PROC_CN_MCAST_LISTEN;
+	err = register_proc_netlink(&epoll_fd, &input);
+#else
 	err = register_proc_netlink(&epoll_fd, PROC_CN_MCAST_LISTEN);
+#endif
 	if (err < 0) {
 		if (err == -2)
 			close(nl_sock);
@@ -252,7 +286,12 @@  int main(int argc, char *argv[])
 		}
 	}
 
+#ifdef FILTER
+	input.mcast_op = PROC_CN_MCAST_IGNORE;
+	send_message(&input);
+#else
 	send_message(PROC_CN_MCAST_IGNORE);
+#endif
 
 	close(epoll_fd);
 	close(nl_sock);