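Tensorflow crashes when using tf.train.Saver() on GPU

I'm not sure how to approach this problem or what to search for, but when I run my code on a GPU, an InvalidArgumentError is thrown when I instantiate a tf.train.Saver object to track mutable state. If I comment out the Saver instantiation, or switch to CPU:0, the code runs fine.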
File "entrypoint.py", line 496, in <module>
online_mvrcca_multipie_test3()
File "entrypoint.py", line 490, in online_mvrcca_multipie_test3
gs_res = gridsearch_optimizer_cb(parameter_ranges,exp_f_handle);
File "/homes/sj16/LPLUSS/deps/sjpy_utils/exptools/parameter_search.py", line 48, in gridsearch_optimizer_async
f_handle(parameter_instance);
File "entrypoint.py", line 487, in <lambda>
{}\
File "/homes/sj16/LPLUSS/deps/pyena/src/sessions.py", line 115, in submit_to_local_session
run_metadata_ptr)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'save/Const': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Identity: CPU
Const: CPU
[[Node: save/Const = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: model>, _device="/device:GPU:0"]()]]
Caused by op u'save/Const', defined at:
File "entrypoint.py", line 496, in <module>
online_mvrcca_multipie_test3()
File "entrypoint.py", line 490, in online_mvrcca_multipie_test3
gs_res = gridsearch_optimizer_cb(parameter_ranges,exp_f_handle);
File "/homes/sj16/LPLUSS/deps/sjpy_utils/exptools/parameter_search.py", line 48, in gridsearch_optimizer_async
f_handle(parameter_instance);
File "entrypoint.py", line 487, in <lambda>
{}\
File "/homes/sj16/LPLUSS/deps/pyena/src/sessions.py", line 115, in submit_to_local_session
worker_result=worker_task(*worker_args);
File "/homes/sj16/LPLUSS/src/experiments/matrix_reconstruction/online/mvrcca_online/image_exp/experiment_workers.py", line 41, in batch_mv_recon_test_mc7
saver = tf.train.Saver() #Here is the offending call to Saver(), having set up the graph
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 845, in __init__
restore_sequentially=restore_sequentially)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 504, in build
filename_tensor = constant_op.constant("model")
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 166, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
self._traceback = _extract_stack()
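To me this seems to mean that TF has no way to save a tf.constant to a checkpoint file when you are in GPU mode? There is no GPU implementation of the "kernel" (I'm not sure what that means in this context) needed to run the save/Const node (which saves constants). It gets a little odd that a constant named ... cannot be saved and restored ...
Furthermore, I never use tf.constant() myself, but I'm guessing that constant nodes are created when you call tf.convert_to_tensor on numeric values or numeric variables?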
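One way to check that guess is to look at the op type behind the tensor that tf.convert_to_tensor returns. This is only a minimal sketch: the value 3.0 is an arbitrary illustration, and the graph is a throwaway one rather than the graph from my experiment. If the guess is right, the printed op type should be Const.

```python
import tensorflow as tf

# Build in a throwaway graph so nothing leaks into the default graph.
with tf.Graph().as_default():
    # Convert a plain Python number to a tensor (3.0 is an arbitrary example).
    t = tf.convert_to_tensor(3.0)
    # Every graph tensor is produced by an op; inspect that op's type.
    op_type = t.op.type

print(op_type)
```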
----------- EDIT: minimal example -----
Environment:
CUDA 7.5.18 w/ Tesla K40c; Ubuntu 14.04;
import os,math
import operator as op
import tensorflow as tf

with tf.device('/gpu:0'):
    tf_session=tf.Session()
    exp_model_dir= os.path.join(os.path.expanduser("~"),'tf_scratchpad/saver_failure_dense_only')
    if not os.path.isdir(exp_model_dir):
        os.mkdir(exp_model_dir)
    ranklim=10
    dense_widths=[64,ranklim,64, 128]
    # input to the network
    input_data = tf.placeholder(tf.float32, [1,128], name='input_data')
    current_input = input_data
    for layer_i, n_output in enumerate(dense_widths[0:]):
        n_input = int(current_input.get_shape()[1])
        W = tf.Variable(
            tf.random_uniform([n_input, n_output],
                              -1.0/math.sqrt(n_input),
                              1.0/math.sqrt(n_input)))
        b = tf.Variable(tf.zeros([n_output]))
        output = tf.nn.relu(tf.matmul(current_input, W) + b)
        current_input = output
    # reconstruction through the network
    y = current_input
    cost = tf.reduce_sum(tf.square(y - input_data))
    train_writer = tf.train.SummaryWriter(os.path.join(exp_model_dir,'train'),
                                          tf_session.graph)
    optimizer = tf.train.GradientDescentOptimizer(0.0075).minimize(cost)
    saver = tf.train.Saver()
    tf_session.run(tf.initialize_all_variables())
Running this with GPU TensorFlow 0.9.0rc0 in a Python 2.7 miniconda environment produces:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:05:00.0
Total memory: 11.25GiB
Free memory: 11.15GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x2a95d80
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: Quadro K600
major: 3 minor: 0 memoryClockRate (GHz) 0.8755
pciBusID 0000:04:00.0
Total memory: 1023.31MiB
Free memory: 425.00MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:793] Ignoring gpu device (device: 1, name: Quadro K600, pci bus id: 0000:04:00.0) with Cuda multiprocessor count: 1. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
Traceback (most recent call last):
File "tfcrash.py", line 48, in <module>
tf_session.run(tf.initialize_all_variables())
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 372, in run
run_metadata_ptr)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 636, in _run
feed_dict_string, options, run_metadata)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 708, in _do_run
target_list, options, run_metadata)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 728, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'save/Const': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Identity: CPU
Const: CPU
[[Node: save/Const = Const[dtype=DT_STRING, value=Tensor<type: string shape: [] values: model>, _device="/device:GPU:0"]()]]
Caused by op u'save/Const', defined at:
File "tfcrash.py", line 46, in <module>
saver = tf.train.Saver()
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 845, in __init__
restore_sequentially=restore_sequentially)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 504, in build
filename_tensor = constant_op.constant("model")
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/ops/constant_op.py", line 166, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/homes/sj16/miniconda/envs/tensorflow27/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
self._traceback = _extract_stack()
The error is actually thrown in initialize_all_variables(), but it is attributed to the call to tf.train.Saver(). Commenting out the Saver() call, or using '/cpu:0', prevents the exception.
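Since the error message says save/Const has no GPU kernel, one thing that might be worth trying is to create the Saver outside the GPU device scope and to let the session fall back to the CPU for ops that cannot run on the GPU. The following is only a sketch under assumptions, not a confirmed fix: the helper name build_graph_with_cpu_saver, the single 128x64 layer, and the tf1 shim (which picks tf.compat.v1 on newer installs and the top-level module on 0.9) are illustrative and not part of my original script.

```python
import tensorflow as tf

# Assumption: on newer TensorFlow installs the graph-mode API used in this
# question lives under tf.compat.v1; on 0.9 it is the top-level tf module.
tf1 = getattr(getattr(tf, "compat", None), "v1", tf)


def build_graph_with_cpu_saver(compute_device='/gpu:0'):
    """Build one dense layer on compute_device, but create the Saver on the CPU."""
    graph = tf.Graph()
    with graph.as_default():
        with tf.device(compute_device):
            input_data = tf1.placeholder(tf.float32, [1, 128], name='input_data')
            W = tf.Variable(tf1.random_uniform([128, 64], -0.1, 0.1))
            b = tf.Variable(tf.zeros([64]))
            output = tf.nn.relu(tf.matmul(input_data, W) + b)
        # The Saver's string filename tensor is pinned to the CPU here
        # instead of inheriting the GPU device scope.
        with tf.device('/cpu:0'):
            saver = tf1.train.Saver()
        init_op = tf1.initialize_all_variables()
    return graph, saver, init_op


graph, saver, init_op = build_graph_with_cpu_saver()

# allow_soft_placement lets TensorFlow move an op to another device when
# the requested device has no kernel for it (or does not exist).
config = tf1.ConfigProto(allow_soft_placement=True)
session = tf1.Session(graph=graph, config=config)
session.run(init_op)

# Look up the op behind the Saver's filename tensor and show where it is pinned.
filename_op = graph.get_tensor_by_name(saver.saver_def.filename_tensor_name).op
print(filename_op.name, filename_op.device)
session.close()
```

Either change on its own may be enough; the sketch combines them so that the rest of the graph can stay under '/gpu:0'.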
'Saver' only saves variables, but 'Const' is a node. It is already saved in the GraphDef, and it is restored when you restore the graph with 'import_graph_def'. –
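Thanks, Yaroslav. Okie dokie..... so if I create a simple Session object, build a graph, and call tf.train.Saver(), how do I keep it from crashing? I understand that Saver only saves variables, and the error says Const is a node, so I'm with you that far. And there are threads describing import_graph_def for restoring the graph structure. Buuut my code is crashing at the Saver instantiation, which is what confuses me. Do I need to tell it somehow not to try to do anything with constants? –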
Sorry about the formatting here; I can't edit after 5 minutes or something. –