An answer to your primary question, 'What is the proper way to benchmark part of tensorflow graph?':
TensorFlow includes an abstract class that provides helpers for TensorFlow benchmarks: Benchmark (tf.test.Benchmark). A Benchmark object can therefore be created and used to benchmark part of a TensorFlow graph. In the code below, a Benchmark object is instantiated and its run_op_benchmark method is called. run_op_benchmark is passed the session, the tensor to evaluate (here z_tf, the output of conv_block), a feed_dict, a number of burn-in iterations, the desired minimum number of iterations, a boolean flag that keeps the benchmark from also computing memory usage, and a convenient name. The method returns a dictionary containing the benchmark results:
benchmark = tf.test.Benchmark()
results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf,
                                     feed_dict={x_tf: x_np}, burn_iters=2,
                                     min_iters=n_iter,
                                     store_memory_usage=False, name='example')
This block can be inserted into your code as follows to compare the two benchmarking approaches:
import os
import time

# Set before importing TensorFlow so the C++ log level takes effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

import numpy as np
import tensorflow as tf

# tf.compat.v1 is used throughout so the script does not rely on
# TF1-only top-level names such as tf.Session or tf.placeholder.
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here
    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c
    with tf.compat.v1.variable_scope('var_scope'):
        w_0 = tf.compat.v1.get_variable(
            'w_0', [kernel_size, kernel_size, in_channels, out_channels],
            initializer=tf.keras.initializers.glorot_normal())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')
    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels
    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)
    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)
    tf.compat.v1.reset_default_graph()
    with tf.compat.v1.Session() as sess:
        x_tf = tf.compat.v1.placeholder(
            tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.compat.v1.global_variables_initializer())
        x_np = get_data_batch(spatial_size, n_channels)

        # MANUAL TIMING: mean wall time over n_iter runs, no warm-up
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.compat.v1.get_collection(
                tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))

        # USING TENSORFLOW BENCHMARK
        benchmark = tf.test.Benchmark()
        results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf,
                                             feed_dict={x_tf: x_np}, burn_iters=2, min_iters=n_iter,
                                             store_memory_usage=False, name='example')
    return results


if __name__ == '__main__':
    results = run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)
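To read the returned dictionary, something like the following may help. It is a minimal sketch that assumes the dictionary carries the keys 'name', 'iters' and 'wall_time' (the wall time per run, in seconds), which is what I see in the TensorFlow source. summarize_benchmark is a hypothetical helper, not part of TensorFlow, so print the dictionary first to confirm the keys in your version.

```python
def summarize_benchmark(results):
    """Format a benchmark result dict as a one-line summary.

    Assumption: `results` has the keys 'name', 'iters' and 'wall_time'
    (seconds per run). Missing keys are reported as 'n/a' instead of raising.
    """
    name = results.get('name', 'n/a')
    iters = results.get('iters', 'n/a')
    wall_time = results.get('wall_time')
    if wall_time is None:
        timing = 'n/a'
    else:
        timing = '{:.3f} ms'.format(wall_time * 1e3)
    return '{}: {} per run over {} iterations'.format(name, timing, iters)


# Hand-written stand-in for the dictionary returned by run_graph_part(...).
example = {'name': 'example', 'iters': 100, 'wall_time': 0.0042}
print(summarize_benchmark(example))
```

In the script above, the same helper could be called as summarize_benchmark(results) after run_graph_part returns.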
The implementation of the benchmarking class within the TensorFlow library itself hints at the answers to your other questions.
Question 1) 'Is it ok that x_np used in the loop is the same or I need to regenerate it each time?': the TensorFlow implementation reuses one feed_dict for every benchmark iteration, so it appears to be OK to use the same x_np in each loop.
Question 2): some 'warm up' does appear to be necessary. The default number of burn-in iterations in the TensorFlow implementation (burn_iters) is 2.
Question 3): timeit is an excellent tool for measuring the execution time of small code snippets. However, the TensorFlow library itself uses time.time() in a similar manner to what you have done: run_op_benchmark (source). Interestingly, the TensorFlow benchmark implementation reports the median rather than the mean of the operation wall times (presumably to make the benchmark more robust to outliers).
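The warm-up and median ideas can be sketched in plain Python. This is not TensorFlow's code, only an illustration of the same pattern under my own assumptions: time_callable is a hypothetical helper, and its parameter names burn_iters and min_iters are borrowed from run_op_benchmark for familiarity.

```python
import statistics
import time


def time_callable(fn, burn_iters=2, min_iters=10):
    """Time fn() with warm-up runs discarded; report median and mean."""
    for _ in range(burn_iters):
        fn()  # warm-up: these runs are not timed

    deltas = []
    for _ in range(min_iters):
        start = time.time()
        fn()
        deltas.append(time.time() - start)

    return {
        'iters': min_iters,
        'median': statistics.median(deltas),
        'mean': statistics.mean(deltas),
    }


# Why the median: one slow run barely moves it, but pulls the mean up.
walltimes = [0.010, 0.011, 0.010, 0.012, 0.250]
print(statistics.median(walltimes))  # 0.011
print(statistics.mean(walltimes))    # noticeably larger than the median
```

With the names from the script above, it could be called as time_callable(lambda: sess.run(z_tf, feed_dict={x_tf: x_np}), min_iters=n_iter) inside the session block.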