Message ID | 20200721184205.15232-9-ian.jackson@eu.citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Flight report performance improvements | expand |
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > After we refactor this query then we can enable the index use. > (Both of these things together in this commit because I haven't perf > tested the version with just the refactoring.) > > (We have provided an index that can answer this question really > quickly if a version is specified. But the query planner couldn't see > that because it works without seeing the bind variables, so doesn't > know that the value of name is going to be suitable for this index.) > > * Convert the two EXISTS subqueries into JOIN/AND with a DISTINCT > clause naming the fields on flights, so as to replicate the previous > result rows. Then do $selection field last. The subquery is a > convenient way to let this do the previous thing for all the values > of $selection (including, notably, *). > > * Add the additional AND clause for r.name, which has no logical > effect given the actual values of name, enabling the query planner > to use this index. > > Perf: In my test case the sg-report-flight runtime is now ~8s. I am > reasonably confident that this will not make other use cases of this > code worse. > > Perf: runtime of my test case now ~11s > > Example query before (from the Perl DBI trace): > > SELECT * > FROM flights f > WHERE > EXISTS ( > SELECT 1 > FROM runvars r > WHERE name=? > AND val=? > AND r.flight=f.flight > AND ( (CASE > WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' > WHEN ((r.job) LIKE 'build-%-freebsd' > AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' > ELSE '' > END) > = '') > ) > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND (branch=?) > ORDER BY flight DESC > LIMIT 1 So this says: Get me all the columns for the highest-numbered flight Where: There is at least one runvar for that flight has the specified $name and $value And the job is *not* like build-%-prev or build-%-freebsd The flight number (?) is <= 151903, and blessing = real For the specified $branch What’s the “TRUE and flight <= 151903” for? > > After: > > SELECT * > FROM ( SELECT DISTINCT > flight, started, blessing, branch, intended > FROM flights f > JOIN runvars r USING (flight) > WHERE name=? > AND name LIKE 'revision_%' > AND val=? > AND r.flight=f.flight > AND ( (CASE > WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' > WHEN ((r.job) LIKE 'build-%-freebsd' > AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' > ELSE '' > END) > = '') > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND (branch=?) > ) AS sub WHERE TRUE > ORDER BY flight DESC > LIMIT 1 And this says (effectively) Get me <flight, started, blessing, branch, intended> From the highest-numbered flight Where That flight has a runvar with specified name and value The job *doesn’t* look like “build-%-prev” or “build-%-freebsd” flight & blessing as appropriate branch as specified. Isn’t the r.flight = f.flight redundant if we’re joining on flight? Also, in spite of the paragraph attempting to explain it, I’m afraid I don’t understand what the “AS sub WHERE TRUE” is for. But it looks like the new query should do the same thing as the old query, assuming that the columns from the subquery are all the columns that you need in the correct order. -George
George Dunlap writes ("Re: [OSSTEST PATCH 08/14] Executive: Use index for report__find_test"): > > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Example query before (from the Perl DBI trace): ... > So this says: > > Get me all the columns > for the highest-numbered flight > Where: > There is at least one runvar for that flight has the specified $name and $value > And the job is *not* like build-%-prev or build-%-freebsd > The flight number (?) is <= 151903, and blessing = real > For the specified $branch Yes. > What’s the “TRUE and flight <= 151903” for? These queries are programmetically constructed. In this case, the flight condition is not always there. My test case had a --max-flight=151903 on the command line: this is a debugging option. It avoids newly added stuff in the db confusing me and generally disturbing things. This is implemented with a condition variable which contains either "" or "and flight <= 151903". Doing it this way simplifies the generation code. > And this says (effectively) > > Get me <flight, started, blessing, branch, intended> > From the highest-numbered flight > Where > That flight has a runvar with specified name and value > The job *doesn’t* look like “build-%-prev” or “build-%-freebsd” > flight & blessing as appropriate > branch as specified. I think so, yes. > Isn’t the r.flight = f.flight redundant if we’re joining on flight? Indeed it is. I guess I can add a patch at theend to delete that. > Also, in spite of the paragraph attempting to explain it, I’m afraid > I don’t understand what the “AS sub WHERE TRUE” is for. The reason for the subquery is not evident in the SQL. It's because of the Perl code which generates this query. The same code is used to generate queries that start with things like SELECT * ... SELECT COUNT(*) AS count ... The perl code gets told "*" or "COUNT(*) AS count". The call sites that pass "*" expect to see fields from flights. It would be possible to change "*" to the explicit field list everywhere, but it was much easier to do it this way. (The WHERE TRUE is another one of these stubs where a condition might appear.) > But it looks like the new query should do the same thing as the old > query, assuming that the columns from the subquery are all the > columns that you need in the correct order. The subquery columns are precisely the columns currently existing in he flights table. Thanks, Ian.
diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index c3dc1261..c272e9f2 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -415,37 +415,32 @@ sub report__find_test ($$$$$$$) { my $querytext = <<END; SELECT $selection - FROM flights f - WHERE + FROM ( SELECT DISTINCT + flight, started, blessing, branch, intended + FROM flights f END if (defined $revision) { if ($tree eq 'osstest') { $querytext .= <<END; - EXISTS ( - SELECT 1 - FROM flights_harness_touched t + JOIN flights_harness_touched t USING (flight) WHERE t.harness=? - AND t.flight=f.flight - ) END push @params, $revision; } else { $querytext .= <<END; - EXISTS ( - SELECT 1 - FROM runvars r + JOIN runvars r USING (flight) WHERE name=? + AND name LIKE 'revision_%' AND val=? AND r.flight=f.flight AND ${\ main_revision_job_cond('r.job') } - ) END push @params, "revision_$tree", $revision; } } else { $querytext .= <<END; - TRUE + WHERE TRUE END } @@ -460,6 +455,7 @@ END END push @params, @$branches; + $querytext .= ") AS sub WHERE TRUE\n"; $querytext .= $extracond; $querytext .= $sortlimit; diff --git a/schema/runvars-revision-index.sql b/schema/runvars-revision-index.sql index a2e3be13..4c1aea66 100644 --- a/schema/runvars-revision-index.sql +++ b/schema/runvars-revision-index.sql @@ -1,4 +1,4 @@ --- ##OSSTEST## 008 Preparatory +-- ##OSSTEST## 008 Needed -- -- This index helps Executive::report__find_test find relevant flights.
After we refactor this query then we can enable the index use. (Both of these things together in this commit because I haven't perf tested the version with just the refactoring.) (We have provided an index that can answer this question really quickly if a version is specified. But the query planner couldn't see that because it works without seeing the bind variables, so doesn't know that the value of name is going to be suitable for this index.) * Convert the two EXISTS subqueries into JOIN/AND with a DISTINCT clause naming the fields on flights, so as to replicate the previous result rows. Then do $selection field last. The subquery is a convenient way to let this do the previous thing for all the values of $selection (including, notably, *). * Add the additional AND clause for r.name, which has no logical effect given the actual values of name, enabling the query planner to use this index. Perf: In my test case the sg-report-flight runtime is now ~8s. I am reasonably confident that this will not make other use cases of this code worse. Perf: runtime of my test case now ~11s Example query before (from the Perl DBI trace): SELECT * FROM flights f WHERE EXISTS ( SELECT 1 FROM runvars r WHERE name=? AND val=? AND r.flight=f.flight AND ( (CASE WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' WHEN ((r.job) LIKE 'build-%-freebsd' AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' ELSE '' END) = '') ) AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND (branch=?) ORDER BY flight DESC LIMIT 1 After: SELECT * FROM ( SELECT DISTINCT flight, started, blessing, branch, intended FROM flights f JOIN runvars r USING (flight) WHERE name=? AND name LIKE 'revision_%' AND val=? AND r.flight=f.flight AND ( (CASE WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' WHEN ((r.job) LIKE 'build-%-freebsd' AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' ELSE '' END) = '') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND (branch=?) ) AS sub WHERE TRUE ORDER BY flight DESC LIMIT 1 In both cases with bind vars: 'revision_xen' '165f3afbfc3db70fcfdccad07085cde0a03c858b' "xen-unstable" Diff to the example query: @@ -1,10 +1,10 @@ SELECT * + FROM ( SELECT DISTINCT + flight, started, blessing, branch, intended FROM flights f - WHERE - EXISTS ( - SELECT 1 - FROM runvars r + JOIN runvars r USING (flight) WHERE name=? + AND name LIKE 'revision_%' AND val=? AND r.flight=f.flight AND ( (CASE @@ -14,8 +14,8 @@ ELSE '' END) = '') - ) AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND (branch=?) +) AS sub WHERE TRUE ORDER BY flight DESC LIMIT 1 CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 20 ++++++++------------ schema/runvars-revision-index.sql | 2 +- 2 files changed, 9 insertions(+), 13 deletions(-)