Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


tensorflow - feed data into a tf.contrib.data.Dataset like a queue

About tf.contrib.data.Dataset usage (available from TensorFlow 1.2, see here and here): the way a Dataset obtains its data doesn't fit how I usually receive data. In my case, a thread receives the data. I don't know in advance when the stream will end, but I can tell when it does. I then wait until all buffers have been processed, at which point one epoch is finished. How can I implement this logic with the Dataset?

Note that I prefer the Dataset interface over the QueueBase interface because it gives me the iterator interface which I can reinitialize and even reset to a different Dataset. This is more powerful compared to queues which cannot be reopened currently after they are closed (see here and here).

Maybe a similar question, or the same question: How can I wrap a Dataset around a queue? I have a thread which reads data from somewhere and can feed it into a queue. How do I get that data into the Dataset? I could repeat some dummy tensor infinitely and then use map to just return my queue.dequeue(), but that only brings back the original problems with the queue, i.e. how to reopen it.


1 Answer


The new Dataset.from_generator() method allows you to define a Dataset that is fed by a Python generator. (To use this feature at present, you must download a nightly build of TensorFlow or build it yourself from source. It will be part of TensorFlow 1.4.)

The easiest way to implement your example would be to replace your receiving thread with a generator, with pseudocode as follows:

import tensorflow as tf

def receiver():
  while True:
    next_element = ...  # Receive next element from external source.
                        # Note that this method may block.

    end_of_epoch = ...  # Decide whether or not to stop based on next_element.

    if not end_of_epoch:
      yield next_element  # Note: you may need to convert this to an array.
    else:
      return  # Returning will signal OutOfRangeError on downstream iterators.

dataset = tf.contrib.data.Dataset.from_generator(receiver, output_types=...)

# You can chain other `Dataset` methods after the generator. For example:
dataset = dataset.prefetch(...)  # This will start a background thread
                                 # to prefetch elements from `receiver()`.

dataset = dataset.repeat(...)  # Note that each repetition will call
                               # `receiver()` again, and start from
                               # a fresh state.

dataset = dataset.batch(...)

More complicated topologies are possible. For example, you can use Dataset.interleave() to create many receivers in parallel.
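To give a feel for how elements from several receivers might be mixed, here is a pure-Python sketch of a round-robin order, which is what an interleave with a cycle length equal to the number of receivers and a block length of 1 is assumed to produce. The helpers make_source and round_robin are hypothetical and are not part of TensorFlow; the sketch pulls from the sources sequentially and only illustrates ordering, not parallel execution.

```python
def make_source(name, count):
    """Hypothetical receiver factory: yields `count` labelled elements."""
    def source():
        for i in range(count):
            yield "%s%d" % (name, i)
    return source


def round_robin(generator_fns):
    """Take one element from each live generator in turn until all are done."""
    iterators = [fn() for fn in generator_fns]
    while iterators:
        still_active = []
        for it in iterators:
            try:
                yield next(it)
            except StopIteration:
                continue  # this source is exhausted; drop it from the cycle
            still_active.append(it)
        iterators = still_active


mixed = list(round_robin([make_source("a", 3), make_source("b", 1)]))
print(mixed)  # ['a0', 'b0', 'a1', 'a2']
```

Each source ends independently, so a short receiver simply drops out of the rotation while the others continue.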

