इस धागे में एक उत्तर के अनुसार ( OpKernel पर NotFoundError tf.nn.embedding_lookup को टेंसरफ़्लो उत्सुक मोड में उपयोग करते समय) कुछ ऑप्स अभी तक GPU पर लागू नहीं किए गए हैं।

मुझे एक ऑप के साथ समस्या है, जहां मुझे एक NotFoundError भी मिलता है, लेकिन त्रुटि-संदेश मुझे भ्रमित करता है। यहाँ मेरा नमूना-कोड Tensorflow 1.10 के साथ है। मुझे पता है कि मैं डिवाइस-फोर्सिंग को छोड़ सकता हूं और टेंसरफ्लो सीपीयू पर ऑपरेशन चलाएगा, लेकिन मैं जितना संभव हो उतना जीपीयू पर करना चाहता हूं।

import tensorflow as tf

tf.enable_eager_execution()
print("Eager execution: {}".format(tf.executing_eagerly()))

device = 'gpu:0'
with tf.device(device):

    x = tf.constant([195330., 195075., 173910., 167535., 167535., 170340., 206040., 175185., 206040.,
                     118575., 214710., 171870., 204765., 202215.,      0.,      0.,      0.,      0.,
                     0.,      0.], dtype=tf.float32)

    print(tf.count_nonzero(x))

मुझे निम्नलिखित त्रुटि मिलती है:

python3 test.py 
Eager execution: True
2018-09-28 14:41:51.186066: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: FMA
2018-09-28 14:41:51.370081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.38GiB
2018-09-28 14:41:51.467475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties: 
name: GeForce GT 730 major: 3 minor: 5 memoryClockRate(GHz): 0.9015
pciBusID: 0000:02:00.0
totalMemory: 1.96GiB freeMemory: 1.93GiB
2018-09-28 14:41:51.467534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1469] Ignoring visible gpu device (device: 1, name: GeForce GT 730, pci bus id: 0000:02:00.0, compute capability: 3.5) with Cuda multiprocessor count: 2. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
2018-09-28 14:41:51.467543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-28 14:41:51.848119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-28 14:41:51.848172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 1 
2018-09-28 14:41:51.848195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N N 
2018-09-28 14:41:51.848206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1:   N N 
2018-09-28 14:41:51.848446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5143 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    print(tf.count_nonzero(x))
  File "/home/joe/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/joe/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1384, in count_nonzero
    reduction_indices=reduction_indices),
  File "/home/joe/.local/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/joe/.local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1307, in reduce_sum
    name=name))
  File "/home/joe/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8283, in _sum
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.NotFoundError: No registered 'Sum' OpKernel for GPU devices compatible with node Sum = Sum[T=DT_INT64, Tidx=DT_INT32, keep_dims=false](dummy_input, dummy_input)
     (OpKernel was found, but attributes didn't match)
    .  Registered:  device='CPU'; T in [DT_COMPLEX128]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_COMPLEX128]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_COMPLEX64]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_COMPLEX64]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_FLOAT]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_BFLOAT16]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_BFLOAT16]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_HALF]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_HALF]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_INT8]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_INT8]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_UINT8]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_UINT8]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_INT16]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_INT16]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_UINT16]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_UINT16]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_INT32]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_INT32]; Tidx in [DT_INT32]
  device='CPU'; T in [DT_INT64]; Tidx in [DT_INT64]
  device='CPU'; T in [DT_INT64]; Tidx in [DT_INT32]
  device='GPU'; T in [DT_INT32]; Tidx in [DT_INT64]
  device='GPU'; T in [DT_INT32]; Tidx in [DT_INT32]
  device='GPU'; T in [DT_COMPLEX128]; Tidx in [DT_INT64]
  device='GPU'; T in [DT_COMPLEX128]; Tidx in [DT_INT32]
  device='GPU'; T in [DT_COMPLEX64]; Tidx in [DT_INT64]
  device='GPU'; T in [DT_COMPLEX64]; Tidx in [DT_INT32]
  device='GPU'; T in [DT_DOUBLE]; Tidx in [DT_INT64]
  device='GPU'; T in [DT_DOUBLE]; Tidx in [DT_INT32]
  device='GPU'; T in [DT_FLOAT]; Tidx in [DT_INT64]
  device='GPU'; T in [DT_FLOAT]; Tidx in [DT_INT32]
  device='GPU'; T in [DT_HALF]; Tidx in [DT_INT64]
  device='GPU'; T in [DT_HALF]; Tidx in [DT_INT32]
 [Op:Sum]

जहाँ तक मैं त्रुटि को समझता हूँ

No registered 'Sum' OpKernel for GPU devices compatible with node Sum = Sum[T=DT_INT64, Tidx=DT_INT32, keep_dims=false](dummy_input, dummy_input)

यह T=DT_INT64, Tidx=DT_INT32 के लिए एक कार्यान्वयन की तलाश में है, लेकिन टेंसर float32 प्रकार का है। क्या मुझे कुछ याद आ रहा है?

0
kielnino 28 सितंबर 2018, 15:10

2 जवाब

सबसे बढ़िया उत्तर

कार्यान्वयन को यहां देखें:

def count_nonzero(input_tensor,
                  axis=None,
                  keepdims=None,
                  dtype=dtypes.int64,
                  name=None,
                  reduction_indices=None,
keep_dims=None):
  keepdims = deprecation.deprecated_argument_lookup("keepdims", keepdims,
                                                    "keep_dims", keep_dims)
  if keepdims is None:
    keepdims = False

  with ops.name_scope(name, "count_nonzero", [input_tensor]):
    input_tensor = ops.convert_to_tensor(input_tensor, name="input_tensor")
    # A scalar of 'zero' is enough as `not_equal` will broadcast.
    zero = array_ops.zeros([], dtype=input_tensor.dtype)
    return cast(
        reduce_sum(
            # int64 reduction happens on GPU
            to_int64(gen_math_ops.not_equal(input_tensor, zero)),
            axis=axis,
            keepdims=keepdims,
            reduction_indices=reduction_indices),
dtype=dtype)

ध्यान दें कि reduce_sum को कॉल करने से पहले int64 में कास्ट कैसे होता है। यही कारण है कि टीएफ आपके मूल फ्लोट 32 के बजाय int64 पर एक ऑपरेशन की खोज करता है।

0
FlyingTeller 28 सितंबर 2018, 16:00

मैंने count_nonzero को अधिक और reduce_sum के संयोजन से प्रतिस्थापित किया है (बूलियन सरणी को ग्रेटर-ऑप से फ़्लोट32 में कास्ट करना)। अब यह GPU पर काम करता है:

print(tf.reduce_sum(tf.cast(tf.greater(x, 0), tf.float32)))
1
kielnino 30 सितंबर 2018, 10:42