python - How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

Question

Welcome To Ask or Share your Answers For Others

python - How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

Clarifications:

This question is regarding this QuickDraw RNN Drawing classification tensorflow tutorial, not the text RNN tensorflow tutorial
To an extend this is a duplicate of Farooq Khan's question, however I could use a few more specific details (that otherwise easily become cumbersome comments) and take the opportunity to reward Farooq for taking his time to provide further details.

I'm running tensorflow 1.6.0-rc0 compiled from source with GPU support on a Macbook with an NVIDIA GeForce GT 750M 2048 MB GPU.

I've attempted to train like so:

python train_model.py --model_dir=./model_gpu --training_data=./rnn_tutorial_data/training.tfrecord-00000-of-00010 --eval_data=./rnn_tutorial_data/eval.tfrecord-00000-of-00010 --classes_file=./rnn_tutorial_data/training.tfrecord.classes --cell_type=cudnn_lstm

The initial clarifications I'm looking for are:

should I use the command above, then once it completes run :python train_model.py --model_dir=./model_gpu --training_data=./rnn_tutorial_data/training.tfrecord-00001-of-00010 --eval_data=./rnn_tutorial_data/eval.tfrecord-00001-of-00010 --classes_file=./rnn_tutorial_data/training.tfrecord.classes --cell_type=cudnn_lstm through to python train_model.py --model_dir=./model_gpu --training_data=./rnn_tutorial_data/training.tfrecord-00009-of-00010 --eval_data=./rnn_tutorial_data/eval.tfrecord-00009-of-00010 --classes_file=./rnn_tutorial_data/training.tfrecord.classes --cell_type=cudnn_lstm or should I run the command mentioned in the tutorial as-is: python train_model.py --training_data=rnn_tutorial_data/training.tfrecord-?????-of-????? --eval_data=rnn_tutorial_data/eval.tfrecord-?????-of-????? --classes_file=rnn_tutorial_data/training.tfrecord.classes
- how do I know when training is complete ? (these are the last messages from the last training session: 2018-04-11 01:43:27.180805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1410] Adding visible gpu devices: 0 2018-04-11 01:43:27.180860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix: 2018-04-11 01:43:27.180866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 2018-04-11 01:43:27.180869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N 2018-04-11 01:43:27.180950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1021] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 100 MB memory) -> physical GPU (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0, compute capability: 3.0) No error or any other output afterwards: hard to distinguish those messages from other previous checkpoints)
- how to pass a custom doodle for classification ? This is the core of my question. Farooq's create_tfrecord_for_prediction in his answer is great: a full script to run/test would be amazing

Update2

Thanks to Farooq's helpful notes, here is a tweaked version of the code that prints a prediction to console:

# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Binary for trianing a RNN-based classifier for the Quick, Draw! data.

python train_model.py 
  --training_data train_data 
  --eval_data eval_data 
  --model_dir /tmp/quickdraw_model/ 
  --cell_type cudnn_lstm

When running on GPUs using --cell_type cudnn_lstm is much faster.

The expected performance is ~75% in 1.5M steps with the default configuration.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import functools
import sys

from datetime import datetime
import json
import numpy as np


import tensorflow as tf


def get_num_classes():
  classes = []
  with tf.gfile.GFile(FLAGS.classes_file, "r") as f:
    classes = [x for x in f]
  num_classes = len(classes)
  return num_classes


def get_input_fn(mode, tfrecord_pattern, batch_size):
  """Creates an input_fn that stores all the data in memory.

  Args:
   mode: one of tf.contrib.learn.ModeKeys.{TRAIN, INFER, EVAL}
   tfrecord_pattern: path to a TF record file created using create_dataset.py.
   batch_size: the batch size to output.

  Returns:
    A valid input_fn for the model estimator.
  """

  def _parse_tfexample_fn(example_proto, mode):
    """Parse a single record which is expected to be a tensorflow.Example."""
    feature_to_type = {
        "ink": tf.VarLenFeature(dtype=tf.float32),
        "shape": tf.FixedLenFeature([2], dtype=tf.int64)
    }
    if mode != tf.estimator.ModeKeys.PREDICT:
      # The labels won't be available at inference time, so don't add them
      # to the list of feature_columns to be read.
      feature_to_type["class_index"] = tf.FixedLenFeature([1], dtype=tf.int64)

    parsed_features = tf.parse_single_example(example_proto, feature_to_type)
    parsed_features["ink"] = tf.sparse_tensor_to_dense(parsed_features["ink"])

    if mode != tf.estimator.ModeKeys.PREDICT:
      labels = parsed_features["class_index"]
      return parsed_features, labels
    else:
      return parsed_features  # In prediction, we have no labels

  def _input_fn():
    """Estimator `input_fn`.

    Returns:
      A tuple of:
      - Dictionary of string feature name to `Tensor`.
      - `Tensor` of target labels.
    """
    dataset = tf.data.TFRecordDataset.list_files(tfrecord_pattern)
    if mode == tf.estimator.ModeKeys.TRAIN:
      dataset = dataset.shuffle(buffer_size=10)
    dataset = dataset.repeat()
    # Preprocesses 10 files concurrently and interleaves records from each file.
    dataset = dataset.interleave(
        tf.data.TFRecordDataset,
        cycle_length=10,
        block_length=1)
    dataset = dataset.map(
        functools.partial(_parse_tfexample_fn, mode=mode),
        num_parallel_calls=10)
    dataset = dataset.prefetch(10000)
    if mode == tf.estimator.ModeKeys.TRAIN:
      dataset = dataset.shuffle(buffer_size=1000000)
    # Our inputs are variable length, so pad them.
    dataset = dataset.padded_batch(
        batch_size, padded_shapes=dataset.output_shapes)

    iter = dataset.make_one_shot_iterator()
    if mode != tf.estimator.ModeKeys.PREDICT:
        features, labels = iter.get_next()
        return features, labels
    else:
        features = iter.get_next()
        return features, None  # In prediction, we have no labels

  return _input_fn


def model_fn(features, labels, mode, params):
  """Model function for RNN classifier.

  This function sets up a neural network which applies convolutional layers (as
  configured with params.num_conv and params.conv_len) to the input.
  The output of the convolutional layers is given to LSTM layers (as configured
  with params.num_layers and params.num_nodes).
  The final state of the all LSTM layers are concatenated and fed to a fully
  connected layer to obtain the final classification scores.

  Args:
    features: dictionary with keys: inks, lengths.
    labels: one hot encoded classes
    mode: one of tf.estimator.ModeKeys.{TRAIN, INFER, EVAL}
    params: a parameter dictionary with the following keys: num_layers,
      num_nodes, batch_size, num_conv, conv_len, num_classes, learning_rate.

  Returns:
    ModelFnOps for Estimator API.
  """

  def _get_input_tensors(features, labels):
    """Converts the input dict into inks, lengths, and labels tensors."""
    # features[ink] is a sparse tensor that is [8, batch_maxlen, 3]
    # inks will be a dense tensor of [8, maxlen, 3]
    # shapes is [batchsize, 2]
    shapes = features["shape"]
    # lengths will be [batch_size]
    lengths = tf.squeeze(
        tf.slice(shapes, begin=[0, 0], size=[params.batch_size, 1]))
    inks = tf.reshape(features["ink"], [params.batch_size, -1, 3])
    if labels is not None:
      labels = tf.squeeze(labels)
    return inks, lengths, labels

  def _add_conv_layers(inks, lengths):
    """Adds convolution layers."""
    convolved = inks
    for i in range(len(params.num_conv)):
      convolved_input = convolved
      if params.batch_norm:
        convolved_input = tf.layers.batch_normalization(
            convolved_input,
            training=(mode == tf.estimator.ModeKeys.TRAIN))
      # Add dropout layer if enabled and not first convolution layer.
      if i > 0 and params.dropout:
        convolved_input = tf.layers.dropout(
            convolved_input,
            rate=params.dropout,
            training=(mode == tf.estimator.ModeKeys.TRAIN))
      convolved = tf.layers.conv1d(
          convolved_input,
          filters=params.num_conv[i],
          kernel_size=params.conv_len[i],
          activation=None,
          strides=1,
          padding="same",
          name="conv1d_%d" % i)
    return convolved, lengths

  def _add_regular_rnn_layers(convolved, lengths):
    """Adds RNN layers."""
    if params.cell_type == "lstm":
      cell = tf.nn.rnn_cell.BasicLSTMCell
    elif params.cell_type == "block_lstm":
      cell = tf.contrib.rnn.LSTMBlockCell
    cells_fw = [cell(params.num_nodes) for _ in range(params.num_layers)]
    cells_bw = [cell(params.num_nodes) for _ in range(params.num_layers)]
    if params.dropout > 0.0:
      cells_fw = [tf.contrib.rnn.DropoutWrapper(cell) for cell in cells_fw]
      cells_bw = [tf.contrib.rnn.DropoutWrapper(cell) for cell in cells_bw]
    outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=cells_fw,
        cells_bw=cells_bw,
        inputs=convolved,
        sequence_length=lengths,
        dtype=tf.float32,
        scope="rnn_classification")
    return outputs

  def _add_cudnn_rnn_layers(convolved):
    """Adds CUDNN LSTM layers."""
    # Convolutions output [B, L, Ch], while CudnnLSTM is time-major.
    convolved = tf.transpose(convolved, [1, 0, 2])
    lstm = tf.contrib.cudnn_rnn.CudnnLSTM(
        num_layers=params.num_layers,
        num_units=params.num_nodes,
        dropout=params.dropout if mode == tf.estimator.ModeKeys.TRAIN else 0.0,
        direction="bidirectional")
    outputs, _ = lstm(convolved)
    # Convert back from time-major outputs to batch-major outputs.
    outputs = tf.transpose(outputs, [1, 0, 2])
    return outputs

  def _add_rnn_layers(convolved, lengths):
    """Adds recurrent neural network layers depending on the cell type."""
    if params.cell_type != "cudnn_lstm":
      outputs = _add_regular_rnn_layers(convolved, lengths)
    else:
      outputs = _add_cudnn_rnn_layers(convolved)
    # outputs is [batch_size, L, N] where L is the

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:35:34+0000

python train_model.py --training_data=rnn_tutorial_data/training.tfrecord-?????-of-????? --eval_data=rnn_tutorial_data/eval.tfrecord-?????-of-????? --classes_file=rnn_tutorial_data/training.tfrecord.classes

AFAIK using the above command works as well, it will simply read all the files in the folder where you have download the data files one by one.

create_tfrecord_for_prediction is certainly not my own creation, this code was mostly picked from another file from tensorflow guys create_dataset.py

Below I have pasted the almost all of the new code i added including the my modifications to the main() function

def create_tfrecord_for_prediction(batch_size, stoke_data, tfrecord_file):
    def parse_line(stoke_data):
        """Parse provided stroke data and ink (as np array) and classname."""
        inkarray = json.loads(stoke_data)
        stroke_lengths = [len(stroke[0]) for stroke in inkarray]
        total_points = sum(stroke_lengths)
        np_ink = np.zeros((total_points, 3), dtype=np.float32)
        current_t = 0
        for stroke in inkarray:
            if len(stroke[0]) != len(stroke[1]):
                print("Inconsistent number of x and y coordinates.")
                return None
            for i in [0, 1]:
                np_ink[current_t:(current_t + len(stroke[0])), i] = stroke[i]
            current_t += len(stroke[0])
            np_ink[current_t - 1, 2] = 1  # stroke_end
        # Preprocessing.
        # 1. Size normalization.
        lower = np.min(np_ink[:, 0:2], axis=0)
        upper = np.max(np_ink[:, 0:2], axis=0)
        scale = upper - lower
        scale[scale == 0] = 1
        np_ink[:, 0:2] = (np_ink[:, 0:2] - lower) / scale
        # 2. Compute deltas.
        #np_ink = np_ink[1:, 0:2] - np_ink[0:-1, 0:2]
        #np_ink = np_ink[1:, :]
        np_ink[1:, 0:2] -= np_ink[0:-1, 0:2]
        np_ink = np_ink[1:, :]

        features = {}
        features["ink"] = tf.train.Feature(float_list=tf.train.FloatList(value=np_ink.flatten()))
        features["shape"] = tf.train.Feature(int64_list=tf.train.Int64List(value=np_ink.shape))
        f = tf.train.Features(feature=features)
        ex = tf.train.Example(features=f)
        return ex

    if stoke_data is None:
        print("Error: Stroke data cannot be none")
        return

    example = parse_line(stoke_data)

    #Remove the file if it already exists
    if tf.gfile.Exists(tfrecord_file):
        tf.gfile.Remove(tfrecord_file)

    writer = tf.python_io.TFRecordWriter(tfrecord_file)
    for i in range(batch_size):
        writer.write(example.SerializeToString())
    writer.flush()
    writer.close()

def get_classes():
  classes = []
  with tf.gfile.GFile(FLAGS.classes_file, "r") as f:
    classes = [x.rstrip() for x in f]
  return classes

def main(unused_args):
  print("%s: I Starting application" % (datetime.now()))

  estimator, train_spec, eval_spec = create_estimator_and_specs(
      run_config=tf.estimator.RunConfig(
          model_dir=FLAGS.model_dir,
          save_checkpoints_secs=300,
          save_summary_steps=100))
  tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

  if FLAGS.predict_for_data != None:
      print("%s: I Starting prediction" % (datetime.now()))
      class_names = get_classes()
      create_tfrecord_for_prediction(FLAGS.batch_size, FLAGS.predict_for_data, FLAGS.predict_temp_file)
      predict_results = estimator.predict(input_fn=get_input_fn(
          mode=tf.estimator.ModeKeys.PREDICT,
          tfrecord_pattern=FLAGS.predict_temp_file,
          batch_size=FLAGS.batch_size))

      #predict_results = estimator.predict(input_fn=predict_input_fn)
      for idx, prediction in enumerate(predict_results):
          index = prediction["class_index"]  # Get the predicted class (index)
          probability = prediction["probabilities"][index]
          class_name = class_names[index]
          print("%s: Predicted Class is: %s with a probability of %f" % (datetime.now(), class_name, probability))
          break #We care for only the first prediction, rest are all duplicates just to meet the batch size

FLAGS.predict_for_data this is the command line argument that holds the strokes data FLAGS.predict_temp_file is just a name of file i use to create the temporary input data tfrecord file

Note1 : Along with this I also modified some code in get_input_fn() you can find this code change in this PR: https://github.com/tensorflow/models/pull/3440 (has not been merged yet)

Note2: I also had to modify model_fn() and add the below few lines my additions are after the comment #Compute current predictions

  # Build the model.
  inks, lengths, labels = _get_input_tensors(features, labels)
  convolved, lengths = _add_conv_layers(inks, lengths)
  final_state = _add_rnn_layers(convolved, lengths)
  logits = _add_fc_layers(final_state)

  # Compute current predictions.
  predictions = tf.argmax(logits, axis=1)

  if mode == tf.estimator.ModeKeys.PREDICT:
      preds = {
          "class_index": predictions,
          #"class_index": predictions[:, tf.newaxis],
          "probabilities": tf.nn.softmax(logits),
          "logits": logits
      }
      #preds = {"logits": logits, "predictions": predictions}

      return tf.estimator.EstimatorSpec(mode, predictions=preds)
      # Add the loss.
  cross_entropy = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=labels, logits=logits))

  # Add the optimizer.
  train_op = tf.contrib.layers.optimize_loss(
      loss=cross_entropy,
      global_step=tf.train.get_global_step(),
      learning_rate=params.learning_rate,
      optimizer="Adam",
      # some gradient clipping stabilizes training in the beginning.
      clip_gradients=params.gradient_clipping_norm,
      summaries=["learning_rate", "loss", "gradients", "gradient_norm"])

  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions={"logits": logits, "predictions": predictions},
      loss=cross_entropy,
      train_op=train_op,
      eval_metric_ops={"accuracy": tf.metrics.accuracy(labels, predictions)})

The only thing left is then to figure out generating strokes data. For this you can take one of the existing tfrecord file read it and extract a stroke from that read operation or you could write some javascript webpage to generate strokes

Categories

python - How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

python - How to classify a QuickDraw doodle using TensorFlow's sketch RNN tutorial?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags