Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
413 views
in Technique[技术] by (71.8m points)

size - Teradata Question - Multiple Stores and Multiple Items

Good Afternoon,

I am using Teradata to pull sales data from multiple stores and multiple items to assess performance by program and version. The resulting extract is way too big to use and I need to cut it down. I am better at Tableau than Teradata and I am not sure the best way to reduce the extract size using the SQL code.

  • Using SQL to pull by Store, by Week, by Item(SKU), with ~10,000 stores and 20-50 items per program
  • Using an excel sheet as a reference point for tableau to know what item goes in what program, and then which store has which version

For example, 10 items make up a program, and then of say 3,000 stores 500 are one version, 1,000 are the second version, and the remaining 1,500 stores are the third version. Is there a way to do this and cut down on the extract size?

The code I am using, for example, is:

Select Cast "Store"
,"SKU"
,"WEEK"
,"YEAR"
,SUM("Sales")

WHERE ("SKU" IN (1,2,3,4,5) AND
"YEAR" => 2020)

FROM Database

GROUP BY "Store","SKU","WEEK","YEAR";

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Optimize ETL Flow:

Is there any specific reason to not connect Tableau directly to Teradata? I am assuming that Teradata contains the actual data and you are extracting that data into a file, perhaps in CSV format, importing that file into Excel and then importing Excel into Tableau. Is that correct? If so, then you really should look into this link to set up a connection from Tableau directly to Teradata.

This should eliminate CSV files and Excel's involvement, which should be the main problems with large extracts' analysis.

Reduce Row Count:

If the issue really remains, you should look into narrowing your results. You mentioned your use case as assessment of sales performance. Are you interested in best, worst, average performers? You can trim your data based upon that. Replace DESC with ASC if you are looking for worst performers in following query. TOP N limits the results to N number of rows, if you need anything other than 10, replace it with your required count.

SELECT
TOP         10
            "Store"
            ,"SKU"
            ,"WEEK"
            ,"YEAR"
            ,SUM("Sales") as SUM_SALES
FROM        Database.tablename
WHERE       "SKU" IN (1,2,3,4,5)
            AND
            "YEAR" >= 2020    
GROUP BY    "Store"
            ,"SKU"
            ,"WEEK"
            ,"YEAR"
ORDER BY    SUM_SALES DESC
;

Use Sampling:

If you are more interested in overall trends rather than specific values (e.g. drawing charts etc.), and would like to reduce the result set randomly while preserving the high level trends, use Teradata's SAMPLE clause.

SELECT      "Store"
            ,"SKU"
            ,"WEEK"
            ,"YEAR"
            ,SUM("Sales") as SUM_SALES
FROM        Database.tablename
WHERE       "SKU" IN (1,2,3,4,5)
            AND
            "YEAR" >= 2020    
GROUP BY    "Store"
            ,"SKU"
            ,"WEEK"
            ,"YEAR"
ORDER BY    SUM_SALES DESC
SAMPLE      0.25
;

Notice that TOP N is no longer present. SAMPLE can accept a specific row count if you provide a whole number > 1 or a percentage sample from total set if you provide a fraction between 0 and 1. In the example above, it will return 25% sample, so your extracted data will be 25% (1/4th) of original size. This link goes in depth about the SAMPLE clause in Teradata.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...