Thrust inside a CUDA kernel

I have CUDA 8.0 installed on my machine (Linux SL7). I also downloaded Thrust 1.8.1 and replaced the existing Thrust library with the new 1.8.1.

As far as I know, starting with Thrust 1.8, Thrust algorithms are supported in kernels and can be used there. I quote from their website:

Thrust 1.8.0 introduces support for algorithm invocation from CUDA __device__ code, support for CUDA streams, and algorithm performance improvements. Users may now invoke Thrust algorithms from CUDA __device__ code

However, when I build my application using Nsight Eclipse, it shows me this error:

calling a __host__ function("thrust::sort") from a __global__ function("mykernel") is not allowed.

Any advice, please? Here is my code:

#include <iostream> 
#include <numeric> 
#include <stdlib.h> 
#include <stdio.h> 
#include <cuda_runtime.h> 
#include <cuda.h> 
#include <thrust/sort.h> 
#include <thrust/execution_policy.h> 

__global__ void mykernel(int* a, int* b) 
{ 

thrust::sort(a, a + 10); 
} 

int main(void) 
{ 
    int a[10] = { 0, 9, 7, 3, 1, 6, 4, 5, 2, 8 }; 
    int b[10]; 
    int *d_a, *d_c; 

    cudaMalloc((void**)&d_a, 10 * sizeof(int)); 
    cudaMalloc((void**)&d_c, 10 * sizeof(int)); 

    std::cout << "A\n"; 
    for (int i = 0; i < 10; ++i) { 
     std::cout << a[i] << " "; 
    } 

    cudaMemcpy(d_a, a, 10 * sizeof(int), cudaMemcpyHostToDevice); 
    mykernel<<<1, 1>>>(d_a, d_c);
    cudaMemcpy(a, d_c, 10 * sizeof(int), cudaMemcpyDeviceToHost); 
    std::cout << "\nA\n"; 
    for (int i = 0; i < 10; ++i) { 
     std::cout << a[i] << " "; 
    } 

    cudaFree(d_a); 
    cudaFree(d_c); 
    return 0; 
}

Source

2017-02-06 Emad R

Possible duplicate of [Thrust inside user written kernels](http://stackoverflow.com/questions/5510715/thrust-inside-user-written-kernels) – Soeren

You are correct. Thrust 1.8 and newer do support algorithm calls within device code. However, to take advantage of this, you need to use the new execution policies so that the library works correctly in device code.

If you use a version of sort that includes an execution policy, like this:

__global__ void mykernel(int* a, int* b) 
{ 
    thrust::sort(thrust::device, a, a + 10); 
}

you should find that the code compiles correctly.
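
As a fuller illustration, here is a minimal sketch of a complete program along those lines. It makes a few assumptions that are not in the question: it uses `thrust::seq` rather than `thrust::device`, so the sort runs sequentially inside the single calling thread; the names `sort_kernel` and `sort_on_device` are made up for illustration; and it copies the result back from the same buffer the kernel sorted, since the question's `main` reads back from `d_c`, which the kernel never writes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// Hypothetical kernel: sorts n ints in place from a single device thread.
__global__ void sort_kernel(int* data, int n)
{
    // Passing an execution policy selects the device-callable overload.
    // thrust::seq (an assumption here) runs sequentially in this thread.
    thrust::sort(thrust::seq, data, data + n);
}

// Hypothetical helper: copies host_data to the device, sorts it there,
// and copies the sorted values back into host_data.
cudaError_t sort_on_device(int* host_data, int n)
{
    int* d_data = NULL;
    size_t bytes = n * sizeof(int);

    cudaError_t err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_data, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        sort_kernel<<<1, 1>>>(d_data, n);
        err = cudaGetLastError();                        // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize(); // run errors
    }
    if (err == cudaSuccess) {
        // Read back from the buffer that was actually sorted.
        err = cudaMemcpy(host_data, d_data, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_data);
    return err;
}

int main(void)
{
    int a[10] = { 0, 9, 7, 3, 1, 6, 4, 5, 2, 8 };

    cudaError_t err = sort_on_device(a, 10);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < 10; ++i) {
        printf("%d ", a[i]);
    }
    printf("\n");
    return 0;
}
```

Built with something like `nvcc sort_in_kernel.cu -o sort_in_kernel` (the file name is hypothetical), this should print the ten values in ascending order. Checking the return codes is optional, but it makes a failed launch visible instead of silently printing unsorted data.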

Source

2017-02-06 19:23:39 talonmies

Thank you. –
