I am using the Streak BigQuery developer tool and noticed some weird behaviour in the "Query Cost". When I dug into the details, I found that the totalBytesBilled and totalBytesProcessed properties do not match, and I have trouble understanding why.
From the BigQuery resource:
- statistics.query.totalBytesBilled: Total bytes billed for the job.
- statistics.query.totalBytesProcessed: Total bytes processed for the job.
The descriptions of these two properties are pretty vague...
Based on my past experience, I expected the two values to be the same once I have consumed the free portion of my quota.
A sample query on the sample data set:
SELECT word, word_count
FROM [publicdata:samples.shakespeare] S
LIMIT 1000
returned:
"totalBytesProcessed": "2650191",
"totalBytesBilled": "10485760",
- Can someone please give a better explanation of what these properties are and what the difference between them is?
- How come for some (pretty small) queries I get a totalBytesBilled significantly higher than totalBytesProcessed?
- How are they calculated?
- Any tips for optimizing my queries to minimize "totalBytesBilled"?
- In https://cloud.google.com/bigquery/pricing#on_demand it says: "High Compute Tiers apply for queries that consume extraordinarily large computing resources relative to the amount of bytes scanned. For example, queries that contain a very large number of JOIN or CROSS JOIN clauses, or complex user-defined functions (UDFs) with large processing requirements."
Can you please be more specific? How many is a "very large number of JOIN or CROSS JOIN clauses"? What makes a UDF "complex"?
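For reference, here is a quick check of the two figures returned by the sample query. It is only a sketch: the JSON string is a hand-copied fragment of the job statistics, not a full API response, and the variable names are my own. The one thing it shows for certain is that the billed value is exactly 10 * 1024 * 1024 bytes, which might point to some minimum charge or rounding rule, although the property descriptions do not say so.

```python
import json

# Hand-copied fragment of the job statistics from the sample query
# (assumption: only these two fields matter for the comparison).
stats_json = '{"totalBytesProcessed": "2650191", "totalBytesBilled": "10485760"}'
stats = json.loads(stats_json)

# The API returns both values as strings, so convert them to integers.
processed = int(stats["totalBytesProcessed"])
billed = int(stats["totalBytesBilled"])

MIB = 1024 * 1024  # bytes in one mebibyte

ratio = billed / processed      # how many times larger the billed figure is
billed_mib = billed / MIB       # billed figure expressed in MiB
processed_mib = processed / MIB # processed figure expressed in MiB

print(f"processed: {processed} bytes ({processed_mib:.2f} MiB)")
print(f"billed:    {billed} bytes ({billed_mib:.2f} MiB)")
print(f"billed / processed = {ratio:.2f}")
```

So the query is billed for roughly four times what it processed, and the billed amount is a suspiciously round number of MiB.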
Thanks