CUDA Lab 2. Thread configuration

1. Understand how to calculate the global index of a thread in 1D blocks and 1D grids
2. Learn how to organize and specify threads in 2D and 3D blocks
3. Learn how to organize and specify blocks in 2D and 3D grids

4. Understand how to calculate the global index of a thread in 2D and 3D blocks and grids
Continue to work on the CUDA sample on vector addition used in the first lab, where only one 1D block of threads is used.
Now consider using multiple 1D thread blocks, say, two or three thread blocks, with the number of threads per block being 3, 4, 5, 6, ….
addKernel<<<2, 3>>>(dev_c, dev_a, dev_b);
addKernel<<<2, 4>>>(dev_c, dev_a, dev_b);
addKernel<<<2, 5>>>(dev_c, dev_a, dev_b);
addKernel<<<2, 6>>>(dev_c, dev_a, dev_b);
addKernel<<<3, 2>>>(dev_c, dev_a, dev_b);
Observe the results computed with each of the thread configurations shown above. You may find that the addition is not calculated correctly for some of these configurations.
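The reason is that the template's kernel indexes elements with threadIdx.x alone, so when more than one block is launched, threads in different blocks write to the same elements while the remaining elements are never computed. A sketch of the template kernel (assuming the same addKernel signature as in the first lab) with the problem annotated:

```
// Template kernel as shipped: correct only for a single 1D block.
// With <<<2, 3>>>, both blocks contain threads with threadIdx.x = 0, 1, 2,
// so c[0..2] is written twice while c[3] and c[4] are never written.
__global__ void addKernel(int *c, const int *a, const int *b)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}
```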
Exercise 1. Understand the block and thread indices
List the values of the built-in variables threadIdx.x and blockIdx.x for each thread when the kernel addKernel() is executed on the GPU with the following thread configurations:
1) addKernel<<<1, 5>>>(dev_c, dev_a, dev_b);
2) addKernel<<<2, 3>>>(dev_c, dev_a, dev_b);
3) addKernel<<<3, 2>>>(dev_c, dev_a, dev_b);
4) addKernel<<<6, 1>>>(dev_c, dev_a, dev_b);
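One way to check your answers is to have every thread print its own indices. A minimal sketch (showIdx is a hypothetical helper kernel, not part of the template; device-side printf requires compute capability 2.0 or higher, and the order of the printed lines is not deterministic):

```
#include <cstdio>

__global__ void showIdx()
{
    // Each launched thread reports which block it belongs to
    // and its position within that block.
    printf("blockIdx.x = %d, threadIdx.x = %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    showIdx<<<2, 3>>>();       // 2 blocks of 3 threads: 6 lines of output
    cudaDeviceSynchronize();   // wait for the kernel (and its printf) to finish
    return 0;
}
```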
Exercise 2. Use multiple 1D blocks
For the vector addition problem considered in the CUDA template, modify the kernel code so that the solution is computed correctly with each of the following thread configurations:
1) addKernel<<<2, 3>>>(dev_c, dev_a, dev_b);
2) addKernel<<<3, 2>>>(dev_c, dev_a, dev_b);
3) addKernel<<<6, 1>>>(dev_c, dev_a, dev_b);
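For these configurations the kernel must derive a global element index from both blockIdx.x and blockDim.x. A sketch, assuming the template's five-element vectors and an extra size parameter n (an addition to the template's signature) to guard threads that fall outside the vector:

```
__global__ void addKernel(int *c, const int *a, const int *b, int n)
{
    // Global index: the number of threads in all preceding blocks
    // (blockIdx.x * blockDim.x) plus this thread's offset within its block.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // e.g. <<<2, 3>>> launches 6 threads for 5 elements
        c[i] = a[i] + b[i];
}
```

The guard matters whenever the launch creates more threads than elements; without it, out-of-range threads read and write past the end of the device arrays.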
Exercise 3. Understand the thread indices for 2D blocks
List the values of the built-in variables threadIdx.x and threadIdx.y for each thread when the kernel addKernel() is executed on the GPU with the following thread configurations:

1) addKernel<<<1, dim3(2, 3)>>>(dev_c, dev_a, dev_b);
2) addKernel<<<1, dim3(3, 3)>>>(dev_c, dev_a, dev_b);
3) addKernel<<<1, dim3(5, 1)>>>(dev_c, dev_a, dev_b);
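As in Exercise 1, you can verify your listing by printing the indices from the device (showIdx2D is a hypothetical helper kernel; output order is not deterministic):

```
#include <cstdio>

__global__ void showIdx2D()
{
    // In a 2D block, each thread has an (x, y) position.
    printf("threadIdx.x = %d, threadIdx.y = %d\n", threadIdx.x, threadIdx.y);
}

int main()
{
    showIdx2D<<<1, dim3(2, 3)>>>();   // one block of 2 x 3 = 6 threads
    cudaDeviceSynchronize();
    return 0;
}
```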
For the vector addition problem considered in the CUDA template, modify the kernel code so that the solution is computed correctly with each of the following thread configurations:
1) addKernel<<<1, dim3(2, 3)>>>(dev_c, dev_a, dev_b);
2) addKernel<<<1, dim3(3, 2)>>>(dev_c, dev_a, dev_b);
3) addKernel<<<1, dim3(5, 1)>>>(dev_c, dev_a, dev_b);
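With a single 2D block, the thread's (x, y) position must be linearized into a 1D element index. A sketch, again assuming an added size parameter n to guard surplus threads (with dim3(2, 3) there are 6 threads for 5 elements):

```
__global__ void addKernel(int *c, const int *a, const int *b, int n)
{
    // Linearize the 2D position: threads are numbered row by row,
    // with blockDim.x threads per row (x varies fastest).
    int i = threadIdx.y * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}
```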