Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


GCP AI Platform Notebooks: best way to create randomly assigned partitions in Google BigQuery

I have a BigQuery table that is not randomly sorted. The IDs are also not random. I would like to partition the data into chunks based on a random number, so that I can use those chunks for various parts of the project.

The solution I have in mind is to add two columns to my table: a randomly generated number, and a partition number. I am following this code snippet on AI Platform Notebooks.
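The two-column idea (a random draw plus a partition number derived from it) can be sketched in plain Python before moving it into BigQuery. This is a minimal illustration, not the snippet the question follows: `assign_split` and `add_split_columns` are hypothetical helpers, the thresholds 1/3 and 2/3 are taken from the question, and the fixed `seed` is an assumption added so the assignment can be reproduced.

```python
import random


def assign_split(randn: float) -> int:
    """Map a uniform draw in [0, 1) to split 1, 2 or 3 (roughly thirds)."""
    if randn < 1 / 3:
        return 1
    elif randn < 2 / 3:
        return 2
    else:
        return 3


def add_split_columns(ids, seed=42):
    """Return one row per ID with a random number and a split number.

    `seed` is a hypothetical parameter: a fixed seed makes the draws,
    and therefore the splits, identical on every run.
    """
    rng = random.Random(seed)
    rows = []
    for row_id in ids:
        randn = rng.random()  # uniform float in [0, 1)
        rows.append({"id": row_id, "randn": randn, "split": assign_split(randn)})
    return rows


rows = add_split_columns(range(9000))
# Count how many rows fell into each split; each should be close to 3000.
counts = {s: sum(1 for r in rows if r["split"] == s) for s in (1, 2, 3)}
print(counts)
```

Each draw is made once and then reused for every comparison, so a row can land in exactly one split.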

The only substantive difference is that I've changed the query passed to client.query to:

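traintestsplit = """
DECLARE randn FLOAT64;  -- RAND() returns FLOAT64, which does not coerce to NUMERIC
DECLARE split INT64 DEFAULT 0;
-- No LEAVE statement: this loop repeats indefinitely and only updates
-- the two scalar variables above; it does not write to any table.
LOOP
  SET randn = RAND();
  IF randn < 1/3 THEN
    SET split = 1;
  ELSEIF randn > 2/3 THEN
    SET split = 3;
  ELSE
    SET split = 2;
  END IF;
END LOOP;
"""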

query_job = client.query(traintestsplit,
    job_config=job_config,
)  # Make an API request.
query_job.result()  # Wait for the job to complete.

I get the same error that someone else reported:

BadRequest: 400 configuration.query.destinationTable cannot be set for scripts (job ID: 676675d7-9151-4626-8a7e-96263232f7b2)

I have read through "Cannot set destination table with BigQuery Python API", but I need something that stays constant if I am going to use these partitions.
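The error message says a destination table cannot be set when the query is a script. One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to leave the destination unset in the job config and let a single `CREATE OR REPLACE TABLE ... AS SELECT` statement write the table itself. Because the draws are materialized into the new table, they stay constant on later reads. The helper names `build_split_table_sql` and `run_split` are hypothetical, and `client` is assumed to be a `google.cloud.bigquery.Client`.

```python
def build_split_table_sql(source_table: str, target_table: str) -> str:
    """Build one statement that materializes a random number and a split.

    RAND() is computed once in the inner query and then referenced by name,
    instead of calling RAND() separately in each WHEN branch.
    """
    return f"""
CREATE OR REPLACE TABLE `{target_table}` AS
SELECT
  *,
  CASE
    WHEN randn < 1/3 THEN 1
    WHEN randn < 2/3 THEN 2
    ELSE 3
  END AS split
FROM (
  SELECT *, RAND() AS randn
  FROM `{source_table}`
)
"""


def run_split(client, source_table: str, target_table: str):
    """Run the statement without a job_config destination.

    The table is created by the SQL itself, so no destination table
    is passed to client.query().
    """
    query_job = client.query(build_split_table_sql(source_table, target_table))
    return query_job.result()  # Wait for the job to complete.
```

Usage would look like `run_split(bigquery.Client(), "my-project.my_dataset.source", "my-project.my_dataset.source_split")`, where both table names are placeholders. Later queries can then filter on `split` without re-drawing any random numbers.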

Should I approach this problem in another way? A very naive way would be to pull the IDs from the BigQuery table, generate a random number for each, save the random numbers as a CSV, and then do a join every time I pull the data, but that seems terribly inefficient.
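An alternative that avoids storing random numbers at all is to derive the split from a hash of each ID. This swaps in a different technique (deterministic hashing rather than `RAND()`), so it is offered as a sketch, not as what the question uses. The same ID always maps to the same split, and a well-mixed hash spreads even sequential, non-random IDs evenly. `hash_split` is a hypothetical helper.

```python
import hashlib


def hash_split(row_id, n_splits: int = 3) -> int:
    """Deterministically map an ID to a split in 1..n_splits.

    The ID is hashed with SHA-256, so the result depends only on the ID
    and is identical on every run; nothing has to be saved or joined.
    """
    digest = hashlib.sha256(str(row_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_splits + 1


splits = [hash_split(i) for i in range(10)]
print(splits)
```

Inside BigQuery, a comparable per-row expression would be `ABS(MOD(FARM_FINGERPRINT(CAST(id AS STRING)), 3)) + 1`, assuming the ID column is called `id`; it can be computed at query time, so no extra column or CSV join is required.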


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer

Waiting for an expert to answer.


...