I am just getting started with Google Cloud Dataflow. I have written a simple pipeline that reads a CSV file from Cloud Storage. One of the steps calls a web service to enrich the results, and that service performs much better when it receives several hundred requests in a single bulk call.
Looking at the API, I don't see a good way to aggregate 100 elements of a PCollection into a single ParDo execution. The results would then need to be split back into individual elements for the last step of the pipeline, which writes to a BigQuery table.
I'm not sure whether windowing is what I want. Most of the windowing examples I have seen are geared toward counting over a given time period.