
[v4,04/19] selftests/resctrl: Close perf value read fd on errors

Message ID 20230713131932.133258-5-ilpo.jarvinen@linux.intel.com (mailing list archive)
State New
Series selftests/resctrl: Fixes and cleanups

Commit Message

Ilpo Järvinen July 13, 2023, 1:19 p.m. UTC
Perf event fd (fd_lm) is not closed on some error paths.

Always close fd_lm in get_llc_perf() and add close into an error
handling block in cat_val().

Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 tools/testing/selftests/resctrl/cache.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Reinette Chatre July 13, 2023, 10:52 p.m. UTC | #1
Hi Ilpo,

On 7/13/2023 6:19 AM, Ilpo Järvinen wrote:
> Perf event fd (fd_lm) is not closed on some error paths.
> 
> Always close fd_lm in get_llc_perf() and add close into an error
> handling block in cat_val().
> 
> Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>  tools/testing/selftests/resctrl/cache.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> index 8a4fe8693be6..ced47b445d1e 100644
> --- a/tools/testing/selftests/resctrl/cache.c
> +++ b/tools/testing/selftests/resctrl/cache.c
> @@ -87,21 +87,20 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
>  static int get_llc_perf(unsigned long *llc_perf_miss)
>  {
>  	__u64 total_misses;
> +	int ret;
>  
>  	/* Stop counters after one span to get miss rate */
>  
>  	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
>  
> -	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
> +	ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
> +	close(fd_lm);
> +	if (ret == -1) {
>  		perror("Could not get llc misses through perf");
> -
>  		return -1;
>  	}
>  
>  	total_misses = rf_cqm.values[0].value;
> -
> -	close(fd_lm);
> -
>  	*llc_perf_miss = total_misses;
>  
>  	return 0;
> @@ -253,6 +252,7 @@ int cat_val(struct resctrl_val_param *param)
>  					 memflush, operation, resctrl_val)) {
>  				fprintf(stderr, "Error-running fill buffer\n");
>  				ret = -1;
> +				close(fd_lm);
>  				break;
>  			}
>  

Instead of fixing these existing patterns I think it would make the code
easier to understand and maintain if it is made symmetrical.
Having the perf event fd opened in one place but its close()
scattered elsewhere has the potential for confusion and making later
mistakes easy to miss.

What if perf event fd is closed in a new "disable_llc_perf()" that
is matched with "reset_enable_llc_perf()" and called
from cat_val()?

I think this raises another issue with the test trickery where
measure_cache_vals() has some assumptions about state based on the
test name.

Reinette
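
A minimal sketch of the symmetric open/close pairing suggested above, not
the actual cache.c code. The *_sketch names are hypothetical, and /dev/zero
stands in for the perf event fd (an assumption made so the example runs
without perf privileges). The point is only that one function opens the fd,
one function closes it, and the caller in the role of cat_val() invokes the
close on every path:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int fd_lm;	/* stand-in for the global perf event fd in cache.c */

/*
 * Stand-in for reset_enable_llc_perf(): the one place the fd is opened.
 * /dev/zero replaces perf_event_open() here (assumption for the sketch).
 */
static int reset_enable_llc_perf_sketch(void)
{
	fd_lm = open("/dev/zero", O_RDONLY);
	if (fd_lm == -1) {
		perror("open");
		return -1;
	}
	return 0;
}

/* Hypothetical counterpart: the one place the fd is closed. */
static int disable_llc_perf_sketch(void)
{
	return close(fd_lm);
}

/* Stand-in for cat_val(): every enable is paired with exactly one disable. */
static int cat_val_sketch(int runs, int fail_at)
{
	unsigned long val;
	int ret = 0;
	int i;

	for (i = 0; i < runs; i++) {
		if (reset_enable_llc_perf_sketch())
			return -1;

		if (i == fail_at)	/* simulated fill-buffer failure */
			ret = -1;
		else if (read(fd_lm, &val, sizeof(val)) != (ssize_t)sizeof(val))
			ret = -1;

		/* Close on success and on failure alike. */
		if (disable_llc_perf_sketch())
			ret = -1;
		if (ret)
			break;
	}

	return ret;
}
```

Calling cat_val_sketch(3, 1) fails on the second iteration but still closes
the fd before leaving the loop, which is the property the scattered close()
calls make hard to verify by inspection.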
Ilpo Järvinen July 14, 2023, 10:35 a.m. UTC | #2
On Thu, 13 Jul 2023, Reinette Chatre wrote:

> Hi Ilpo,
> 
> On 7/13/2023 6:19 AM, Ilpo Järvinen wrote:
> > Perf event fd (fd_lm) is not closed on some error paths.
> > 
> > Always close fd_lm in get_llc_perf() and add close into an error
> > handling block in cat_val().
> > 
> > Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > ---
> >  tools/testing/selftests/resctrl/cache.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> > index 8a4fe8693be6..ced47b445d1e 100644
> > --- a/tools/testing/selftests/resctrl/cache.c
> > +++ b/tools/testing/selftests/resctrl/cache.c
> > @@ -87,21 +87,20 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
> >  static int get_llc_perf(unsigned long *llc_perf_miss)
> >  {
> >  	__u64 total_misses;
> > +	int ret;
> >  
> >  	/* Stop counters after one span to get miss rate */
> >  
> >  	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
> >  
> > -	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
> > +	ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
> > +	close(fd_lm);
> > +	if (ret == -1) {
> >  		perror("Could not get llc misses through perf");
> > -
> >  		return -1;
> >  	}
> >  
> >  	total_misses = rf_cqm.values[0].value;
> > -
> > -	close(fd_lm);
> > -
> >  	*llc_perf_miss = total_misses;
> >  
> >  	return 0;
> > @@ -253,6 +252,7 @@ int cat_val(struct resctrl_val_param *param)
> >  					 memflush, operation, resctrl_val)) {
> >  				fprintf(stderr, "Error-running fill buffer\n");
> >  				ret = -1;
> > +				close(fd_lm);
> >  				break;
> >  			}
> >  
> 
> Instead of fixing these existing patterns I think it would make the code
> easier to understand and maintain if it is made symmetrical.
> Having the perf event fd opened in one place but its close()
> scattered elsewhere has the potential for confusion and making later
> mistakes easy to miss.
> 
> What if perf event fd is closed in a new "disable_llc_perf()" that
> is matched with "reset_enable_llc_perf()" and called
> from cat_val()?
> 
> I think this raises another issue with the test trickery where
> measure_cache_vals() has some assumptions about state based on the
> test name.

I very much agree with the principle here, and I have already created
patches that do a major cleanup in this area. The cleaned-up code has
pe_fd as a local variable in cat_val() and handles closing it in
cat_val() with the usual patterns.

However, that patch currently sits after the L3 CAT test rewrite.
Backporting the cleanups/refactors into this series would require
considerable effort because the n-step cleanup patches and the L3 CAT
test rewrite are so intertwined in this area. There is a lot to clean
up here, and the L3 rewrite touches the same areas, so it's a net full
of conflicts.

Do you want me to spend the effort to backport them into this series
(I expect it will take some time)?

I currently have these items pending besides this series (in order):
- L3 CAT test rewrite and its preparatory patches
- More cleanups (including the pe_fd cleanup)
- New generalized test framework
- L2 CAT test
Reinette Chatre July 14, 2023, 5:36 p.m. UTC | #3
Hi Ilpo,

On 7/14/2023 3:35 AM, Ilpo Järvinen wrote:
> On Thu, 13 Jul 2023, Reinette Chatre wrote:
>> On 7/13/2023 6:19 AM, Ilpo Järvinen wrote:
>>> Perf event fd (fd_lm) is not closed on some error paths.
>>>
>>> Always close fd_lm in get_llc_perf() and add close into an error
>>> handling block in cat_val().
>>>
>>> Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
>>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>> ---
>>>  tools/testing/selftests/resctrl/cache.c | 10 +++++-----
>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
>>> index 8a4fe8693be6..ced47b445d1e 100644
>>> --- a/tools/testing/selftests/resctrl/cache.c
>>> +++ b/tools/testing/selftests/resctrl/cache.c
>>> @@ -87,21 +87,20 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
>>>  static int get_llc_perf(unsigned long *llc_perf_miss)
>>>  {
>>>  	__u64 total_misses;
>>> +	int ret;
>>>  
>>>  	/* Stop counters after one span to get miss rate */
>>>  
>>>  	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
>>>  
>>> -	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
>>> +	ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
>>> +	close(fd_lm);
>>> +	if (ret == -1) {
>>>  		perror("Could not get llc misses through perf");
>>> -
>>>  		return -1;
>>>  	}
>>>  
>>>  	total_misses = rf_cqm.values[0].value;
>>> -
>>> -	close(fd_lm);
>>> -
>>>  	*llc_perf_miss = total_misses;
>>>  
>>>  	return 0;
>>> @@ -253,6 +252,7 @@ int cat_val(struct resctrl_val_param *param)
>>>  					 memflush, operation, resctrl_val)) {
>>>  				fprintf(stderr, "Error-running fill buffer\n");
>>>  				ret = -1;
>>> +				close(fd_lm);
>>>  				break;
>>>  			}
>>>  
>>
>> Instead of fixing these existing patterns I think it would make the code
>> easier to understand and maintain if it is made symmetrical.
>> Having the perf event fd opened in one place but its close()
>> scattered elsewhere has the potential for confusion and making later
>> mistakes easy to miss.
>>
>> What if perf event fd is closed in a new "disable_llc_perf()" that
>> is matched with "reset_enable_llc_perf()" and called
>> from cat_val()?
>>
>> I think this raises another issue with the test trickery where
>> measure_cache_vals() has some assumptions about state based on the
>> test name.
> 
> I very much agree with the principle here, and I have already created
> patches that do a major cleanup in this area. The cleaned-up code has
> pe_fd as a local variable in cat_val() and handles closing it in
> cat_val() with the usual patterns.
> 
> However, that patch currently sits after the L3 CAT test rewrite.
> Backporting the cleanups/refactors into this series would require
> considerable effort because the n-step cleanup patches and the L3 CAT
> test rewrite are so intertwined in this area. There is a lot to clean
> up here, and the L3 rewrite touches the same areas, so it's a net full
> of conflicts.
> 
> Do you want me to spend the effort to backport them into this series
> (I expect it will take some time)?

Considering the "Fixes" tag, having a smaller fix that can easily
be backported would be ideal so I am ok with deferring a bigger
rework.

I do think this fix can be made more robust with a couple of small
changes that should not introduce significant conflicts:
* initialize fd_lm to -1 
* do not close() fd_lm in get_llc_perf() but instead move its
  close() to the exit of cat_val().
* add check in get_llc_perf() that it does not attempt ioctl()
  on "fd_lm == -1" (later addition would be error checking of
  the ioctl())
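
The three changes above could look roughly like the following. This is a
sketch under stated assumptions, not the code from the series: the *_sketch
names are hypothetical, and the read is simplified to a single u64 counter
value (the default perf read format) instead of the struct read_format used
in cache.c. PERF_EVENT_IOC_DISABLE comes from <linux/perf_event.h>.

```c
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* -1 marks "no perf event fd open", per the first bullet. */
static int fd_lm = -1;

/*
 * Sketch of a get_llc_perf() variant: refuses to run on fd_lm == -1,
 * checks the ioctl() result, and never closes the fd itself.
 */
static int get_llc_perf_sketch(unsigned long long *misses)
{
	unsigned long long value;

	if (fd_lm == -1) {
		fprintf(stderr, "perf event fd is not open\n");
		return -1;
	}

	/* Stop the counter before reading it. */
	if (ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0) == -1) {
		perror("Could not disable perf event");
		return -1;
	}

	if (read(fd_lm, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
		perror("Could not get llc misses through perf");
		return -1;
	}

	*misses = value;
	return 0;
}

/* Single close for the exit of the caller; a no-op when nothing is open. */
static void close_perf_fd_sketch(void)
{
	if (fd_lm != -1) {
		close(fd_lm);
		fd_lm = -1;
	}
}
```

With the fd reset to -1 after close(), a second call to the close helper is
harmless, and a later get_llc_perf_sketch() call fails early instead of
issuing an ioctl() on a stale descriptor.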

> I currently have these items pending besides this series (in order):
> - L3 CAT test rewrite and its preparatory patches
> - More cleanups (including the pe_fd cleanup)
> - New generalized test framework
> - L2 CAT test

Thank you very much for taking this on.

Reinette
Ilpo Järvinen July 17, 2023, 1:05 p.m. UTC | #4
On Fri, 14 Jul 2023, Reinette Chatre wrote:
> On 7/14/2023 3:35 AM, Ilpo Järvinen wrote:
> > On Thu, 13 Jul 2023, Reinette Chatre wrote:
> >> On 7/13/2023 6:19 AM, Ilpo Järvinen wrote:
> >>> Perf event fd (fd_lm) is not closed on some error paths.
> >>>
> >>> Always close fd_lm in get_llc_perf() and add close into an error
> >>> handling block in cat_val().
> >>>
> >>> Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
> >>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> >>> ---
> >>>  tools/testing/selftests/resctrl/cache.c | 10 +++++-----
> >>>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> >>> index 8a4fe8693be6..ced47b445d1e 100644
> >>> --- a/tools/testing/selftests/resctrl/cache.c
> >>> +++ b/tools/testing/selftests/resctrl/cache.c
> >>> @@ -87,21 +87,20 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
> >>>  static int get_llc_perf(unsigned long *llc_perf_miss)
> >>>  {
> >>>  	__u64 total_misses;
> >>> +	int ret;
> >>>  
> >>>  	/* Stop counters after one span to get miss rate */
> >>>  
> >>>  	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
> >>>  
> >>> -	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
> >>> +	ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
> >>> +	close(fd_lm);
> >>> +	if (ret == -1) {
> >>>  		perror("Could not get llc misses through perf");
> >>> -
> >>>  		return -1;
> >>>  	}
> >>>  
> >>>  	total_misses = rf_cqm.values[0].value;
> >>> -
> >>> -	close(fd_lm);
> >>> -
> >>>  	*llc_perf_miss = total_misses;
> >>>  
> >>>  	return 0;
> >>> @@ -253,6 +252,7 @@ int cat_val(struct resctrl_val_param *param)
> >>>  					 memflush, operation, resctrl_val)) {
> >>>  				fprintf(stderr, "Error-running fill buffer\n");
> >>>  				ret = -1;
> >>> +				close(fd_lm);
> >>>  				break;
> >>>  			}
> >>>  
> >>
> >> Instead of fixing these existing patterns I think it would make the code
> >> easier to understand and maintain if it is made symmetrical.
> >> Having the perf event fd opened in one place but its close()
> >> scattered elsewhere has the potential for confusion and making later
> >> mistakes easy to miss.
> >>
> >> What if perf event fd is closed in a new "disable_llc_perf()" that
> >> is matched with "reset_enable_llc_perf()" and called
> >> from cat_val()?
> >>
> >> I think this raises another issue with the test trickery where
> >> measure_cache_vals() has some assumptions about state based on the
> >> test name.
> > 
> > I very much agree with the principle here, and I have already created
> > patches that do a major cleanup in this area. The cleaned-up code has
> > pe_fd as a local variable in cat_val() and handles closing it in
> > cat_val() with the usual patterns.
> > 
> > However, that patch currently sits after the L3 CAT test rewrite.
> > Backporting the cleanups/refactors into this series would require
> > considerable effort because the n-step cleanup patches and the L3 CAT
> > test rewrite are so intertwined in this area. There is a lot to clean
> > up here, and the L3 rewrite touches the same areas, so it's a net full
> > of conflicts.
> > 
> > Do you want me to spend the effort to backport them into this series
> > (I expect it will take some time)?
> 
> Considering the "Fixes" tag, having a smaller fix that can easily
> be backported would be ideal so I am ok with deferring a bigger
> rework.
> 
> I do think this fix can be made more robust with a couple of small
> changes that should not introduce significant conflicts:
> * initialize fd_lm to -1 

> * do not close() fd_lm in get_llc_perf() but instead move its
>   close() to the exit of cat_val().

I changed the test to only close the fd in cat_val(), which is the
direction the later refactor/cleanup changes (not in this series) were
moving anyway.

> * add check in get_llc_perf() that it does not attempt ioctl()
>   on "fd_lm == -1" (later addition would be error checking of
>   the ioctl())

The other two suggestions seem unnecessary and I have not implemented
them; I don't think fd_lm can be -1 at ioctl(). Given this code is going
to be replaced soonish, putting any extra "safety" effort into it now
seems a waste of time.
Reinette Chatre July 17, 2023, 4:09 p.m. UTC | #5
Hi Ilpo,

On 7/17/2023 6:05 AM, Ilpo Järvinen wrote:
> On Fri, 14 Jul 2023, Reinette Chatre wrote:
>> * add check in get_llc_perf() that it does not attempt ioctl()
>>   on "fd_lm == -1" (later addition would be error checking of
>>   the ioctl())
> 
> The other two suggestions seem unnecessary and I have not implemented
> them; I don't think fd_lm can be -1 at ioctl(). Given this code is going
> to be replaced soonish, putting any extra "safety" effort into it now
> seems a waste of time.

Yes, this suggestion was indeed to make the code more robust. I
certainly do not want to waste your time. Please keep in mind 
when you respond that I do not have insight into the reworks
you are still planning. 

Reinette

Patch

diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
index 8a4fe8693be6..ced47b445d1e 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -87,21 +87,20 @@  static int reset_enable_llc_perf(pid_t pid, int cpu_no)
 static int get_llc_perf(unsigned long *llc_perf_miss)
 {
 	__u64 total_misses;
+	int ret;
 
 	/* Stop counters after one span to get miss rate */
 
 	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
 
-	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
+	ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
+	close(fd_lm);
+	if (ret == -1) {
 		perror("Could not get llc misses through perf");
-
 		return -1;
 	}
 
 	total_misses = rf_cqm.values[0].value;
-
-	close(fd_lm);
-
 	*llc_perf_miss = total_misses;
 
 	return 0;
@@ -253,6 +252,7 @@  int cat_val(struct resctrl_val_param *param)
 					 memflush, operation, resctrl_val)) {
 				fprintf(stderr, "Error-running fill buffer\n");
 				ret = -1;
+				close(fd_lm);
 				break;
 			}