I am trying to recreate an address at a point in time, based on the current address and a log record of changes. We have a table with customer addresses in fields line1 through line4. The following is an approximation of the table and data:
create table qtemp.customers (
    id int,
    line1 char(40),
    line2 char(40),
    line3 char(40),
    line4 char(40));
insert into qtemp.customers (id, line1, line2, line3, line4)
with cust_temp(id, line1, line2, line3, line4) as (
    -- recursive CTE generating 15,000 sample customers
    select 1, 'Line1', 'Line2', 'Line3', 'Line4'
    from sysibm.sysdummy1
    union all
    select id+1, line1, line2, line3, line4
    from cust_temp
    where id < 15000)
select * from cust_temp;
Then we have a table of logged changes, including changes to individual address lines. The change types I am interested in start with 'Line ' followed by a number; they are mixed in with all the other kinds of changes. Again, a rough approximation of the table:
create table qtemp.changes (
    seq int,
    dt int,
    cid int,
    change_type char(40),
    change char(40));
insert into qtemp.changes (seq, dt, cid, change_type, change)
with changes_temp(seq, dt, cid, change_type, change) as (
    select 1, 1, 1, 'not a real change', 'just a bogus line'
    from sysibm.sysdummy1
    union all
    select seq+1,
        dt + int(rand() + 0.005), -- dt advances about once per 200 rows, roughly 175 days in total
        int(rand() * 15000 + 1),
        case int(rand() * 13) -- a little less than 3,000 changes to address lines
            when 1 then 'Line ' || int(rand() * 4 + 1)
            else trim(translate(char(bigint(rand() * 50)), 'abcdefghij', '1234567890')) || ' Some other change'
        end,
        translate(char(bigint(rand() * 10000000000)), 'abcdefghij', '1234567890')
    from changes_temp
    where seq < 35000)
select * from changes_temp;
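As a quick sanity check on the generated data (my own verification query, not part of the original setup), the distribution should come out to roughly 35,000 / 13, i.e. a little under 3,000 address-line changes, spread over about 175 days:

-- verification only: count address-line changes and the day span
select count(*) as total_changes,
       sum(case when change_type like 'Line %' then 1 else 0 end) as line_changes,
       max(dt) as last_day
from qtemp.changes;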
My solution was to select only the 'Line%' records, transpose them into the corresponding line1 through line4 columns, and then use a window function to fill in the nulls:
with
changes_filtered as (
    select *
    from changes
    where change_type like 'Line%'),
-- keep only the last change for any particular customer id, date and line
changes_latest as (
    select a.*
    from changes_filtered a
    left join changes_filtered b
        on a.cid = b.cid
       and a.dt = b.dt
       and a.change_type = b.change_type
       and a.seq < b.seq
    where b.seq is null),
-- transpose the per-line change records into line1..line4 columns
changes_pivoted as (
    select cid, dt,
        max(case when change_type = 'Line 1' then change end) line1,
        max(case when change_type = 'Line 2' then change end) line2,
        max(case when change_type = 'Line 3' then change end) line3,
        max(case when change_type = 'Line 4' then change end) line4
    from changes_latest
    group by cid, dt
    union all
    -- append the current address as a sentinel row on the far-future date 99999
    select id, 99999, line1, line2, line3, line4
    from customers
    where id in (select cid from changes_filtered)),
-- fill each null by looking forward to the next known value for that line
changes_filled as (
    select cid, dt,
        first_value(line1) ignore nulls over(partition by cid order by dt
            rows between current row and unbounded following) line1,
        first_value(line2) ignore nulls over(partition by cid order by dt
            rows between current row and unbounded following) line2,
        first_value(line3) ignore nulls over(partition by cid order by dt
            rows between current row and unbounded following) line3,
        first_value(line4) ignore nulls over(partition by cid order by dt
            rows between current row and unbounded following) line4
    from changes_pivoted)
select * from changes_filled order by cid, dt;
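For completeness, the actual point-in-time lookup would then sit on top of changes_filled. A minimal sketch, assuming the log stores pre-change values (so each filled row is the address as it stood just before that day's changes) and using :cid and :asof as placeholder host variables: keep the CTEs above and replace the final select with something like

-- address of customer :cid as of day :asof (placeholders, my addition)
select line1, line2, line3, line4
from changes_filled
where cid = :cid
  and dt > :asof
order by dt
fetch first 1 row only;

The sentinel row at dt 99999 guarantees a hit even when a customer has no logged change after :asof.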
However, when I try to run the big query above, I immediately get the following error:

[SQL0666] SQL query exceeds specified time limit or storage limit. Cause . . . . . : A database query was about to be started whose estimated run time of 2147352576 exceeds the specified limit of 600.

Notice the word estimated: this is a preemptive strike. The limit of 600 seconds is set via a system value. If I override it with CHGQRYA, the query runs in 150 ms, so the estimated run time is totally bogus. When I look at Visual Explain, the cumulative estimated time grows exponentially with each OLAP function: the first is 1,134 s, the second 4 million s, the third 1,400 million s, the fourth 50,000,000 million s.
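For reference, the per-job override is something along these lines (CL; the QRYTIMLMT parameter also accepts a number of seconds instead of *NOMAX):

CHGQRYA QRYTIMLMT(*NOMAX)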
I found the IBM document ODBC Query Timeout Property: SQL0666 Estimated Query Processing Time Exceeds Limit, which states:

If the proper indexes are not in place, the estimate may be very poor (and the query may not perform well).

But 150 ms vs. 1.6 million years is not even wrong. The query performs well; the estimate is not even in this galaxy.
EDIT: I guess the question I am really asking is whether there is a solution to this problem without changing the system value for the query time limit (QQRYTIMLMT) and without building dedicated indexes just for this query.
EDIT2: Creating indexes has no effect. I tried indexes of my own as well as those suggested by the index advisor, and saw no difference in the estimated run time. I have a case open with IBM.
CREATE INDEX QTEMP/CHANGES_IDX
ON QTEMP/CHANGES (CID ASC, DT ASC) UNIT ANY KEEP IN MEMORY NO;
CREATE INDEX QTEMP/CHANGES_IDX2
ON QTEMP/CHANGES (CHANGE_TYPE ASC, DT ASC, CID ASC) UNIT ANY KEEP IN MEMORY NO;
CREATE INDEX QTEMP/CUSTOMERS_IDX
ON QTEMP/CUSTOMERS (ID ASC) UNIT ANY KEEP IN MEMORY NO;
CREATE INDEX QTEMP/CHANGES_IDX3
ON QTEMP/CHANGES (CID ASC, CHANGE_TYPE ASC) UNIT ANY KEEP IN MEMORY NO;