I have an expensive initialization op that runs before my main calculation in TensorFlow.
My code looks something like this:
import tensorflow as tf

x = tf.Variable(2.0)
w = tf.Variable(5.0)

with tf.GradientTape() as tape:
    tape.watch(x)
    tape.watch(w)
    y = x ** 2
    z = w ** 3
    o = tf.math.log(y * z)  # this step stands in for the arbitrarily complex init code

# now I need to run a loop n times (here n is 10)
res = []
for i in range(10):
    with tape:
        z = tf.random.normal([1, 10])
        f = tf.reduce_sum(x * z, axis=1) * o + w
    df = tape.gradient(f, {'x': x, 'w': w})
    res.append(df)
Basically I'm trying to run a Monte Carlo simulation, and I need the gradients without having to rerun the initialization code on every iteration. This code works fine when n == 1, but gives the wrong answers when n >= 2.
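For context, here is a minimal, self-contained sketch of what I'm treating as the correct answer for a single draw. The hand-derived formulas assume my algebra is right, i.e. o = log(x^2 * w^3) = 2*log(x) + 3*log(w):

import tensorflow as tf

x = tf.Variable(2.0)
w = tf.Variable(5.0)
z = tf.random.normal([1, 10])
S = tf.reduce_sum(z)

with tf.GradientTape() as tape:
    o = tf.math.log(x ** 2 * w ** 3)          # init recomputed inside the tape
    f = tf.reduce_sum(x * z, axis=1) * o + w
df = tape.gradient(f, {'x': x, 'w': w})

# hand-derived: df/dx = S*(o + 2), df/dw = 3*x*S/w + 1
print(df['x'].numpy(), (S * (o + 2.0)).numpy())
print(df['w'].numpy(), (3.0 * x * S / w + 1.0).numpy())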
What I need is a way to copy the state of the tape before I start the Monte Carlo loop, so that instead of saying "with tape" I could write something like:
with tf.GradientTape(tape) as tape2:
    ...
df = tape2.gradient(f, {'x': x, 'w': w})
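Spelled out inside the Monte Carlo loop, the usage I'm after would be roughly the following (tf.GradientTape(tape) is made-up syntax, just to illustrate the intent):

# Purely hypothetical -- as far as I can tell there is no such copy constructor
res = []
for i in range(10):
    with tf.GradientTape(tape) as tape2:   # start from a copy of the init recording
        z = tf.random.normal([1, 10])
        f = tf.reduce_sum(x * z, axis=1) * o + w
    df = tape2.gradient(f, {'x': x, 'w': w})  # sees this draw's ops plus the shared init
    res.append(df)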
Is this possible? How can I achieve something similar?
As a second part to the question, I've noticed that even if I recalculate the value of o inside the main loop, the code only works if the tape is not persistent. If it is persistent, I run out of GPU memory after several iterations of the loop. This is not ideal, because I'd also like to define other functions that depend on x and w and record their gradients too.
I.e. if I do this:
res = []
for i in range(10):
    with tf.GradientTape(persistent=True) as tape:
        z = tf.random.normal([1, 10])
        # rerun the init on every loop iteration
        y = x ** 2
        t = w ** 3
        o = tf.math.log(y * t)
        f = tf.reduce_sum(x * z, axis=1) * o + w
        g = tf.reduce_sum(x * z + w, axis=1) * o
    df = tape.gradient(f, {'x': x, 'w': w})
    dg = tape.gradient(g, {'x': x, 'w': w})
    res.append([df, dg])
I don't understand this behavior: surely the tape is discarded after each iteration of the loop, so it shouldn't matter whether it was persistent or not?
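To make what I mean by "discarded" concrete, my mental model is that the end of each iteration is equivalent to dropping the tape explicitly, so whatever it recorded should be freed before the next draw (this is an assumption on my part, not something I've verified in isolation):

for i in range(10):
    with tf.GradientTape(persistent=True) as tape:
        z = tf.random.normal([1, 10])
        o = tf.math.log(x ** 2 * w ** 3)          # init rerun each iteration, as above
        f = tf.reduce_sum(x * z, axis=1) * o + w
        g = tf.reduce_sum(x * z + w, axis=1) * o
    df = tape.gradient(f, {'x': x, 'w': w})
    dg = tape.gradient(g, {'x': x, 'w': w})
    res.append([df, dg])
    del tape  # assumption: this (or simply rebinding tape next iteration) releases the recording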
Thanks