{"id":190,"date":"2013-11-12T22:43:03","date_gmt":"2013-11-12T14:43:03","guid":{"rendered":"http:\/\/blog.stlover.org\/?p=190"},"modified":"2013-11-14T06:50:02","modified_gmt":"2013-11-13T22:50:02","slug":"cuda%e7%ae%80%e5%8d%95%e7%a7%91%e5%ad%a6%e8%bf%90%e7%ae%97%e5%b0%9d%e8%af%95%ef%bc%9amake-it-quick-and-clean","status":"publish","type":"post","link":"http:\/\/blog.xuhao1.me\/?p=190","title":{"rendered":"A First Try at Simple Scientific Computing with CUDA: Make It Quick and Clean!"},"content":{"rendered":"<p>In Call of Duty 5, when the protagonist fights his way into the jungle and there is nothing but silent moonlight, the squad leader senses that something is off and tells the soldiers:<\/p>\n<blockquote><p>make it quick and clean!<\/p><\/blockquote>\n<p>That is pretty much the essence of writing parallel code: fast, clean, and never leaving even 1 KB of leaked memory behind. After several painful lessons I (Haohao) found that if you do not write your CUDA device code cleanly, your computer will not exactly crash: the CPU keeps running as usual, but do not expect the screen to move ever again.<\/p>\n<p><!--more--><\/p>\n<p><a title=\"CUDA learning index\" href=\"http:\/\/blog.stlover.org\/?p=183\">Contents<\/a><\/p>\n<p>I have put off my computational physics homework for ages and it is high time I wrote it... I still have no idea how to explain the delay to Prof. Ding, so I might as well make it look good and use CUDA to compute a few small things. Without further ado, let's use CUDA to compute integrals.<\/p>\n<p>First of all, Monte Carlo integration gets its accuracy purely from the number of samples. A clever sampling scheme can improve accuracy, but for a 3-dimensional integral, if you cannot even reach 1024 points per dimension, how can you talk about accuracy at all?<\/p>\n<p>Now work out what 1024 points per dimension means: $$2^{30}=1073741824$$ points in total, and on top of that we still want high precision. On a CPU, whether you use MPI or anything else, veteran OIers hoping for an instant answer can give up and go to bed.<\/p>\n<p>So: life is short, Python is too slow, we use CUDA (nvcc).<\/p>\n<p>I sum this up in one sentence:<\/p>\n<blockquote><p>The closer you get to efficiency, the farther you get from the soul.<\/p><\/blockquote>\n<p>But for the sake of computation, I have no choice but to sell my soul to His Majesty the Pointer.<\/p>\n<p>In many respects CUDA and Monte Carlo are a perfect match: massive parallelism, a model that fits the hardware perfectly, and strong random number support.<\/p>\n<p>The Monte Carlo algorithm for a one-dimensional integral could not be simpler:<\/p>\n<blockquote><p>Pick random points, points, points... then evaluate, evaluate, evaluate... then add them up and average, average, average...<\/p><\/blockquote>\n<p>So, enough talk, let's get going with CUDA. To squeeze out as much efficiency as possible, first consider the following question:<\/p>\n<blockquote><p>Suppose we sample N points and want k digits of precision. How much precision does each individual sample need?<\/p><\/blockquote>\n<p>On average, the digits of each sample beyond position $$k-\\log_{10}N$$ carry no information; in other words, any precision we demand of a sample beyond $$k-\\log_{10}N$$ digits is drowned out by the sheer volume of data. A double carries about 15-16 significant decimal digits and a float about 6-7, and our CUDA code plans to use $$2^{30}$$ samples for the integral (a sledgehammer to crack a nut? I am using it anyway), so using float for the samples would be perfectly justified. To keep things simple, though... I just used double everywhere.<\/p>\n<p>The first function we handle is $$\\sqrt{x+\\sqrt{x}}$$. Written in C it is very simple (although it will cause some trouble later):<\/p>\n<ol class=\"linenums\">\n<li class=\"L0\"><span class=\"kwd\">double<\/span><span 
class=\"pln\">\u00a0func0<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0x<\/span><span class=\"pun\">)<\/span><\/li>\n<li class=\"L1\"><span class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">return<\/span><span class=\"pln\">\u00a0sqrt<\/span><span class=\"pun\">(<\/span><span class=\"pln\">x<\/span><span class=\"pun\">+<\/span><span class=\"pln\">sqrt<\/span><span class=\"pun\">(<\/span><span class=\"pln\">x<\/span><span class=\"pun\">));<\/span><\/li>\n<li class=\"L3\"><span class=\"pun\">}<\/span><\/li>\n<\/ol>\n<p>Time is limited, so I will keep this short: passing a function pointer into a CUDA kernel can very easily bring the system down (the CPU keeps running, the display does not). My first guess was that it has something to do with there being only a single entry point; a more likely explanation is that a pointer to a function taken in host code holds a host address, which is not valid on the device. Since my knowledge of function pointers is limited, I will not speculate further here and will come back to this later.<\/p>\n<ol class=\"linenums\">\n<li class=\"L0\"><span class=\"pln\">__global__\u00a0<\/span><span class=\"kwd\">void<\/span><span class=\"pln\">\u00a0inte_cell<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">*<\/span><span class=\"pln\">\u00a0l<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">*<\/span><span class=\"pln\">\u00a0r<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">long<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">time<\/span><span class=\"pun\">,<\/span><span 
class=\"pln\">curandState\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">state<\/span><span class=\"pun\">)<\/span><\/li>\n<li class=\"L1\"><span class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"com\">\/\/do not pass function pointers; the integrand is written inline below<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0i\u00a0<\/span><span class=\"pun\">=<\/span><span class=\"pln\">\u00a0blockIdx<\/span><span class=\"pun\">.<\/span><span class=\"pln\">x<\/span><span class=\"pun\">*<\/span><span class=\"typ\">BlockN<\/span><span class=\"pun\">+<\/span><span class=\"pln\">threadIdx<\/span><span class=\"pun\">.<\/span><span class=\"pln\">x<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">long<\/span><span class=\"pln\">\u00a0seed<\/span><span class=\"pun\">=(*<\/span><span class=\"pln\">time<\/span><span class=\"pun\">)+(<\/span><span class=\"pln\">i<\/span><span class=\"pun\">);<\/span><span class=\"com\">\/\/every thread receives the same time, so the seed is simply the time offset by the thread index<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0offset<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><span class=\"com\">\/\/the sequences are fully independent, so every offset is 0 to save time<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0curand_init\u00a0<\/span><span class=\"pun\">(<\/span><span class=\"pln\">seed<\/span><span class=\"pun\">,<\/span><span class=\"pln\">i<\/span><span class=\"pun\">,<\/span><span class=\"pln\">offset<\/span><span 
class=\"pun\">,&amp;<\/span><span class=\"pln\">state<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]);<\/span><span class=\"com\">\/\/initialize the i-th random sequence<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0x<\/span><span class=\"pun\">=<\/span><span class=\"lit\">1<\/span><span class=\"pun\">,<\/span><span class=\"pln\">sum<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0k<\/span><span class=\"pun\">=<\/span><span class=\"lit\">1<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L9\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">for<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0j<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><span class=\"pln\">j<\/span><span class=\"pun\">&lt;<\/span><span class=\"pln\">k<\/span><span class=\"pun\">*<\/span><span class=\"typ\">Dev_Loop<\/span><span class=\"pun\">;<\/span><span class=\"pln\">j<\/span><span class=\"pun\">++)<\/span><\/li>\n<li class=\"L0\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"pun\">{<\/span><\/li>\n<li class=\"L1\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0x<\/span><span class=\"pun\">=<\/span><span class=\"pln\">l<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]+(<\/span><span class=\"pln\">r<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]-<\/span><span class=\"pln\">l<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">])*<\/span><span class=\"pln\">curand_uniform_double<\/span><span class=\"pun\">(&amp;<\/span><span class=\"pln\">state<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]);<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0sum<\/span><span class=\"pun\">+=<\/span><span class=\"pln\">sqrt<\/span><span 
class=\"pun\">(<\/span><span class=\"pln\">x<\/span><span class=\"pun\">+<\/span><span class=\"pln\">sqrt<\/span><span class=\"pun\">(<\/span><span class=\"pln\">x<\/span><span class=\"pun\">));<\/span><span class=\"com\">\/\/func0(x), written inline<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"pun\">}<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0res<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]=<\/span><span class=\"pln\">sum<\/span><span class=\"pun\">\/<\/span><span class=\"typ\">Dev_Loop<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">k<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0__syncthreads<\/span><span class=\"pun\">();<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L7\"><span class=\"pun\">}<\/span><\/li>\n<\/ol>\n<p>Well, that is simple enough! Just one function, then sample like crazy and sum.<\/p>\n<p>For the random number part, see <a title=\"CUDA: Monte Carlo and random number support\" href=\"http:\/\/blog.stlover.org\/?p=188\">Monte Carlo<\/a>.<\/p>\n<p>A few small points: for safety, l and r do not share the same memory. This is also groundwork for a more sophisticated Monte Carlo sampling scheme later on. Each interval is sampled several times and the samples are summed and averaged.<\/p>\n<p>In my initial tests I used only 256*1024 compute cells, which is not a large amount of data. Still, with future development in mind, it is worth having a fast summation that works for G-scale data (on the order of 10^9 values), so I defined this little thing:<\/p>\n<ol class=\"linenums\">\n<li class=\"L0\"><span class=\"pln\">__global__\u00a0<\/span><span class=\"kwd\">void<\/span><span class=\"pln\">\u00a0big_plus<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">*<\/span><span class=\"pln\">a<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">threadNum<\/span><span class=\"pun\">)<\/span><\/li>\n<li class=\"L1\"><span class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"com\">\/\/to use the parallel hardware as fully as possible, the sum is done as two rounds of tree-style addition, each thread adding addNum values per round<\/span><\/li>\n<li class=\"L3\"><span 
class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"com\">\/\/this way it can handle a fast sum of 2^30 values<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"com\">\/\/admittedly this is pointless here, since this program only adds 2^20 values<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"com\">\/\/but leaving an interface for later use never hurts<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0i<\/span><span class=\"pun\">=<\/span><span class=\"pln\">blockIdx<\/span><span class=\"pun\">.<\/span><span class=\"pln\">x<\/span><span class=\"pun\">*(*<\/span><span class=\"pln\">threadNum<\/span><span class=\"pun\">)+<\/span><span class=\"pln\">threadIdx<\/span><span class=\"pun\">.<\/span><span class=\"pln\">x<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0sum<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L9\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0k<\/span><span class=\"pun\">=<\/span><span class=\"pln\">i<\/span><span class=\"pun\">*<\/span><span class=\"pln\">addNum<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L0\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">for<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0j<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><span 
class=\"pln\">j<\/span><span class=\"pun\">&lt;<\/span><span class=\"pln\">addNum<\/span><span class=\"pun\">;<\/span><span class=\"pln\">j<\/span><span class=\"pun\">++)<\/span><\/li>\n<li class=\"L1\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0sum<\/span><span class=\"pun\">+=<\/span><span class=\"pln\">a<\/span><span class=\"pun\">[<\/span><span class=\"pln\">k<\/span><span class=\"pun\">];<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0k<\/span><span class=\"pun\">++;<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"pun\">}<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0res<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]=<\/span><span class=\"pln\">sum<\/span><span class=\"pun\">\/<\/span><span class=\"pln\">addNum<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0__syncthreads<\/span><span class=\"pun\">();<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><\/li>\n<li class=\"L8\"><span 
class=\"pun\">}<\/span><\/li>\n<\/ol>\n<p>In short, the summation of a large array is itself turned into a highly parallel add-merge-add process.<\/p>\n<p>Anyone who has studied dynamic programming has probably solved the classic stone-merging problem, and in fact it was the first thing that came to mind when I wrote this little function... But let's set it aside for now, because using this function properly relies on a very important CUDA concept, stream management, which we will not get into yet.<\/p>\n<p>Next, we need a few small functions to handle some details. The first is initialization, which is rather tedious: all it amounts to is filling an array in device memory with a single value... so I simply wrote two small functions for it. To keep the names easy to remember, I did not overload them.<\/p>\n<ol class=\"linenums\">\n<li class=\"L0\"><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"typ\">DevValueD<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0v<\/span><span class=\"pun\">,<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0len<\/span><span class=\"pun\">)<\/span><span class=\"com\">\/\/returns a device array of length len with every element set to the host value v<\/span><\/li>\n<li class=\"L1\"><span 
class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">*<\/span><span class=\"pln\">res<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&amp;<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">)*<\/span><span class=\"pln\">len<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">val<\/span><span class=\"pun\">=<\/span><span class=\"kwd\">new<\/span><span class=\"pln\">\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">[<\/span><span class=\"pln\">len<\/span><span class=\"pun\">];<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">for<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0i<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><span class=\"pln\">i<\/span><span class=\"pun\">&lt;<\/span><span class=\"pln\">len<\/span><span class=\"pun\">;<\/span><span class=\"pln\">i<\/span><span class=\"pun\">++)<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0val<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]=<\/span><span class=\"pln\">v<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMemcpy<\/span><span class=\"pun\">(<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"pln\">val<\/span><span class=\"pun\">,<\/span><span 
class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">)*<\/span><span class=\"pln\">len<\/span><span class=\"pun\">,<\/span><span class=\"pln\">cudaMemcpyHostToDevice<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">delete<\/span><span class=\"pun\">[]<\/span><span class=\"pln\">\u00a0val<\/span><span class=\"pun\">;<\/span><span class=\"com\">\/\/free the host staging buffer<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">return<\/span><span class=\"pln\">\u00a0res<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L9\"><span class=\"pun\">}<\/span><\/li>\n<li class=\"L0\"><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"typ\">DevValueI<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0v<\/span><span class=\"pun\">,<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0len<\/span><span class=\"pun\">)<\/span><\/li>\n<li class=\"L1\"><span class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pun\">*<\/span><span class=\"pln\">res<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&amp;<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pun\">)*<\/span><span class=\"pln\">len<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">val<\/span><span class=\"pun\">=<\/span><span class=\"kwd\">new<\/span><span class=\"pln\">\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pun\">[<\/span><span class=\"pln\">len<\/span><span class=\"pun\">];<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span 
class=\"kwd\">for<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0i<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><span class=\"pln\">i<\/span><span class=\"pun\">&lt;<\/span><span class=\"pln\">len<\/span><span class=\"pun\">;<\/span><span class=\"pln\">i<\/span><span class=\"pun\">++)<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0val<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">]=<\/span><span class=\"pln\">v<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMemcpy<\/span><span class=\"pun\">(<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"pln\">val<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pun\">)*<\/span><span class=\"pln\">len<\/span><span class=\"pun\">,<\/span><span class=\"pln\">cudaMemcpyHostToDevice<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">delete<\/span><span class=\"pun\">[]<\/span><span class=\"pln\">\u00a0val<\/span><span class=\"pun\">;<\/span><span class=\"com\">\/\/free the host staging buffer<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">return<\/span><span class=\"pln\">\u00a0res<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L9\"><span class=\"pun\">}<\/span><\/li>\n<\/ol>\n<p>With that, a <a title=\"CUDA: Monte Carlo and random number support\" href=\"http:\/\/blog.stlover.org\/?p=188\">Monte Carlo<\/a> integration program came together quickly (although it actually took me two full days of fiddling because of my limited understanding of concurrency; typical symptoms were big_plus producing lots of zeros, or function pointers crashing the program outright).<\/p>\n<ol class=\"linenums\">\n<li class=\"L0\"><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0work<\/span><span class=\"pun\">()<\/span><\/li>\n<li class=\"L1\"><span class=\"pun\">{<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0threadPerBlock<\/span><span class=\"pun\">=<\/span><span class=\"typ\">BlockN<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0numBlocks<\/span><span class=\"pun\">=<\/span><span class=\"pln\">\u00a0<\/span><span class=\"lit\">256<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">size_t<\/span><span class=\"pln\">\u00a0size\u00a0<\/span><span class=\"pun\">=<\/span><span class=\"pln\">\u00a0<\/span><span class=\"typ\">BlockN<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">*<\/span><span class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">long<\/span><span class=\"pln\">\u00a0st<\/span><span class=\"pun\">=<\/span><span 
class=\"pln\">getCurrentTime<\/span><span class=\"pun\">();<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0curandState\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">state<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L9\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&amp;<\/span><span class=\"pln\">state<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"pln\">curandState<\/span><span class=\"pun\">)*<\/span><span class=\"lit\">1024<\/span><span class=\"pun\">*<\/span><span class=\"lit\">1024<\/span><span class=\"pun\">);<\/span><span class=\"com\">\/\/allocate the array of per-thread RNG states<\/span><\/li>\n<li class=\"L0\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L1\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">*<\/span><span class=\"pln\">\u00a0d_A<\/span><span class=\"pun\">,*<\/span><span class=\"pln\">add_tem0<\/span><span class=\"pun\">,*<\/span><span class=\"pln\">add_tem1<\/span><span class=\"pun\">,*<\/span><span class=\"pln\">res<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&amp;<\/span><span class=\"pln\">d_A<\/span><span class=\"pun\">,<\/span><span class=\"pln\">\u00a0size<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&amp;<\/span><span class=\"pln\">add_tem0<\/span><span class=\"pun\">,<\/span><span class=\"pln\">\u00a0size<\/span><span class=\"pun\">\/<\/span><span class=\"lit\">16<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&amp;<\/span><span 
class=\"pln\">add_tem1<\/span><span class=\"pun\">,<\/span><span class=\"pln\">\u00a0size<\/span><span class=\"pun\">\/<\/span><span class=\"lit\">256<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMalloc<\/span><span class=\"pun\">(&gt;<\/span><span class=\"pln\">res<\/span><span class=\"pun\">,<\/span><span class=\"kwd\">sizeof<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">double<\/span><span class=\"pun\">));<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"typ\">int<\/span><span class=\"pln\">e_cell<\/span><span class=\"pun\">&lt;&lt;&lt;<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">,<\/span><span class=\"pln\">threadPerBlock<\/span><span class=\"pun\">&gt;&gt;&gt;(<\/span><span class=\"typ\">DevValueD<\/span><span class=\"pun\">(<\/span><span class=\"lit\">0.0<\/span><span class=\"pun\">,<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">*<\/span><span class=\"pln\">threadPerBlock<\/span><span class=\"pun\">),<\/span><span class=\"typ\">DevValueD<\/span><span class=\"pun\">(<\/span><span class=\"lit\">1.0<\/span><span class=\"pun\">,<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">*<\/span><span class=\"pln\">threadPerBlock<\/span><span class=\"pun\">),<\/span><span class=\"pln\">d_A<\/span><span class=\"pun\">,<\/span><span class=\"pln\">getCurrentTimeForDev<\/span><span class=\"pun\">(),<\/span><span class=\"pln\">state<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L8\"><span class=\"pln\">\u00a0<\/span><\/li>\n<li class=\"L9\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0<\/span><span class=\"pun\">*<\/span><span class=\"pln\">result<\/span><span class=\"pun\">=<\/span><span class=\"kwd\">new<\/span><span class=\"pln\">\u00a0<\/span><span 
class=\"kwd\">double<\/span><span class=\"pun\">[<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">*<\/span><span class=\"pln\">threadPerBlock<\/span><span class=\"pun\">];<\/span><\/li>\n<li class=\"L0\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><\/li>\n<li class=\"L1\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">double<\/span><span class=\"pln\">\u00a0fin_res<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><\/li>\n<li class=\"L2\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaMemcpy<\/span><span class=\"pun\">(<\/span><span class=\"pln\">result<\/span><span class=\"pun\">,<\/span><span class=\"pln\">d_A<\/span><span class=\"pun\">,<\/span><span class=\"pln\">\u00a0size<\/span><span class=\"pun\">,<\/span><span class=\"pln\">\u00a0cudaMemcpyDeviceToHost<\/span><span class=\"pun\">);<\/span><\/li>\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">for<\/span><span class=\"pun\">(<\/span><span class=\"typ\">int<\/span><span class=\"pln\">\u00a0i<\/span><span class=\"pun\">=<\/span><span class=\"lit\">0<\/span><span class=\"pun\">;<\/span><span class=\"pln\">i<\/span><span class=\"pun\">&lt;<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">*<\/span><span class=\"pln\">threadPerBlock<\/span><span class=\"pun\">;<\/span><span class=\"pln\">i<\/span><span class=\"pun\">++)<\/span><\/li>\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"pun\">{<\/span><\/li>\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0fin_res<\/span><span class=\"pun\">+=<\/span><span class=\"pln\">result<\/span><span class=\"pun\">[<\/span><span class=\"pln\">i<\/span><span class=\"pun\">];<\/span><\/li>\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"pun\">}<\/span><\/li>\n<li class=\"L7\"><span 
class=\"pln\">\u00a0\u00a0\u00a0\u00a0fin_res<\/span><span class=\"pun\">\/=(<\/span><span class=\"pln\">numBlocks<\/span><span class=\"pun\">*<\/span><span class=\"pln\">threadPerBlock<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">long<\/span><span class=\"pln\">\u00a0ed<\/span><span class=\"pun\">=<\/span><span class=\"pln\">getCurrentTime<\/span><span class=\"pun\">();<\/span><\/li>
\n<li class=\"L9\"><span class=\"pln\">\u00a0<\/span><\/li>
\n<li class=\"L0\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0printf<\/span><span class=\"pun\">(<\/span><span class=\"str\">&quot;GPU\u00a0running\u00a0Time:%ld\\n&quot;<\/span><span class=\"pun\">,<\/span><span class=\"pln\">ed<\/span><span class=\"pun\">-<\/span><span class=\"pln\">st<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L1\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0printf<\/span><span class=\"pun\">(<\/span><span class=\"str\">&quot;final:%16.14f\\n&quot;<\/span><span class=\"pun\">,<\/span><span class=\"pln\">fin_res<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L2\"><span class=\"pln\">\u00a0<\/span><\/li>
\n<li class=\"L3\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaFree<\/span><span class=\"pun\">(<\/span><span class=\"pln\">d_A<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L4\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"kwd\">delete<\/span><span class=\"pun\">[]<\/span><span class=\"pln\">\u00a0result<\/span><span class=\"pun\">;<\/span><\/li>
\n<li class=\"L5\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaFree<\/span><span class=\"pun\">(<\/span><span class=\"pln\">add_tem0<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L6\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaFree<\/span><span class=\"pun\">(<\/span><span class=\"pln\">add_tem1<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L7\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaFree<\/span><span 
class=\"pun\">(<\/span><span class=\"pln\">state<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L8\"><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0cudaFree<\/span><span class=\"pun\">(<\/span><span class=\"pln\">res<\/span><span class=\"pun\">);<\/span><\/li>
\n<li class=\"L9\"><span class=\"pun\">}<\/span><\/li>
\n<li class=\"L0\"><span class=\"pln\">\u00a0<\/span><\/li>
\n<\/ol>
\n<p>For more complete code, see my <a title=\"GitHub\" href=\"https:\/\/github.com\/xuhao1\/CUDA_Learning\">GitHub<\/a>, which also contains the code for benchmarking the CPU version.<\/p>
\n<p>Finally, here are a few things I have learned about CUDA so far.<\/p>
\n<p>First, a function that is not marked __device__ or __global__ cannot be called from device code. Going further, launching a __global__ function from device code (dynamic parallelism) requires a GPU of compute capability 3.5 or higher, and the code must be compiled as relocatable device code (<code>-rdc=true<\/code>).<\/p>
\n<p>Second, use function pointers with care. My overall first impression of CUDA is that everything has to be kept very clean.<\/p>
\n<p>Anyone who has learned Haskell may recognize that suffocatingly strict kind of clean programming (which is why so many people hate Haskell's IO: it feels so out of place). In highly parallel CUDA code we have to treat every kernel as a void function that nevertheless behaves like a side-effect-free pure function. Strange, right? A void function can only report its results through side effects, so the discipline lies in confining them: each compute cell reads only its own row of the input array, writes only its own row of the output array, and never carelessly modifies the input data. Extracting a clean programming model from piles of tedious, close-to-the-hardware code is not easy, so a touch of obsessive-compulsiveness sometimes matters a lot.<\/p>
\n<p>CUDA feels as if it were designed for physicists and engineers. For mathematicians and competent programmers, thinking recursively is as natural as breathing, but I have not found in CUDA a good solution for recursion that automatically distributes the parallel sub-tasks. That may become a focus of my exploration of CUDA and of my own projects:<\/p>
\n<blockquote><p>Write a perfect, self-parallelizing recursion model on top of CUDA.<\/p><\/blockquote>
\n<p>CUDA's power also reminds me of another idea I have been turning over for a long time:<\/p>
\n<blockquote><p>A programming language built on pure equations, degree-of-freedom constraints, and a powerful general-purpose probabilistic solver kernel.<\/p><\/blockquote>
\n<p>A language closer to physics than to mathematics, with Monte Carlo, evolutionary computation, and similar frameworks at its core. Of course, that will have to wait until I have caught up on my homework and won over the girl I am chasing; then I will write it in my spare time.<\/p>
\n<h2>QUICK<\/h2>
\n<p>The first thing to be clear about is that, however abstract CUDA's programming model may be, a GPU always has a finite number of cores, and launching threads takes time. So we need some testing and experience to guide our programming.<\/p>
\n<p>Let's run the program written above:<\/p>
\n<ol class=\"linenums\">
\n<li class=\"L0\"><span class=\"lit\">5<\/span><span class=\"pun\">-<\/span><span class=\"lit\">7<\/span><span class=\"pln\">$\u00a0<\/span><span class=\"pun\">.\/<\/span><span class=\"pln\">integration\u00a0<\/span><\/li>
\n<li class=\"L1\"><span class=\"pln\">GPU\u00a0running\u00a0<\/span><span class=\"typ\">Time<\/span><span class=\"pun\">:<\/span><span class=\"lit\">2282524<\/span><\/li>
\n<li class=\"L2\"><span class=\"pln\">final<\/span><span class=\"pun\">:<\/span><span class=\"lit\">66.86583087598933<\/span><\/li>
\n<li class=\"L3\"><span class=\"pln\">cpu<\/span><span class=\"pun\">:<\/span><span class=\"pln\">time<\/span><span class=\"pun\">:<\/span><span class=\"lit\">6758037<\/span><span class=\"pun\">,<\/span><span class=\"pln\">res<\/span><span class=\"pun\">:<\/span><span class=\"pln\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"lit\">0.942827<\/span><\/li>
\n<\/ol>
\n<p>Well, not bad: the GPU run is roughly three times as fast as the CPU run (2282524 vs. 6758037).<\/p>
\n<p>Next, to find out how much the slow startup of 256*1024 threads really affects the program, we multiply the length of the GPU kernel's main loop by a factor from 0.5 up to 16, doubling at each step. The GPU running times were:<\/p>
\n<table>
\n<thead>
\n<tr><th>Loop multiplier<\/th><th>GPU running time<\/th><\/tr>
\n<\/thead>
\n<tbody>
\n<tr><td>0.5<\/td><td>1726350<\/td><\/tr>
\n<tr><td>1<\/td><td>2085732<\/td><\/tr>
\n<tr><td>2<\/td><td>2656851<\/td><\/tr>
\n<tr><td>4<\/td><td>3834121<\/td><\/tr>
\n<tr><td>8<\/td><td>6236095<\/td><\/tr>
\n<\/tbody>
\n<\/table>
\n<p>Plotted, the data form a remarkably clean straight line:<\/p>
\n<p><a href=\"http:\/\/blog.stlover.org\/wp-content\/uploads\/2013\/11\/Screen-Shot-2013-11-14-at-2.29.39.png\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-202\" alt=\"Screen Shot 2013-11-14 at 2.29.39\" src=\"http:\/\/blog.stlover.org\/wp-content\/uploads\/2013\/11\/Screen-Shot-2013-11-14-at-2.29.39.png\" width=\"1864\" height=\"1110\" srcset=\"http:\/\/blog.xuhao1.me\/wp-content\/uploads\/2013\/11\/Screen-Shot-2013-11-14-at-2.29.39.png 1864w, http:\/\/blog.xuhao1.me\/wp-content\/uploads\/2013\/11\/Screen-Shot-2013-11-14-at-2.29.39-300x178.png 300w, http:\/\/blog.xuhao1.me\/wp-content\/uploads\/2013\/11\/Screen-Shot-2013-11-14-at-2.29.39-1024x609.png 1024w\" sizes=\"(max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" 
\/><\/a><\/p>
\n<p><span style=\"line-height: 1.5;\">So the intercept is the startup time (this test ignores the time spent on the final summation). Clearly, at 2*1024 additions the system startup and related overhead is very expensive, whereas at a multiplier of 6 to 8 it can largely be ignored.<\/span><\/p>
\n<p><span style=\"line-height: 1.5;\">Also, when the multiplier is high, the program tends to crash on its own, likely because a long-running kernel monopolizes a GPU that is also driving the display. On a MacBook this is scary: I had to force a shutdown again and again. So use large multipliers with care.<\/span><\/p>
\n<p>What we did above is plain direct integration with the Monte Carlo method; transformation-based methods will follow later, once I get through the theoretical mechanics exam five hours from now...<\/p>
\n<p>All the code has been pushed to <a href=\"https:\/\/github.com\/xuhao1\/CUDA_Learning\">GitHub<\/a>; please refer to it there.<\/p>
\n","protected":false},"excerpt":{"rendered":"<p>In Call of Duty 5, as the protagonist fights his way into the jungle, where all is silent but for the moonlight, the captain senses that something is off and tells the soldiers, make it q &hellip; <\/p>
\n<p class=\"link-more\"><a href=\"http:\/\/blog.xuhao1.me\/?p=190\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> \u201cA Simple Attempt at Scientific Computing with CUDA: Make It Quick and 
Clean!\u201d<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5,7],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=\/wp\/v2\/posts\/190"}],"collection":[{"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=190"}],"version-history":[{"count":8,"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=\/wp\/v2\/posts\/190\/revisions"}],"predecessor-version":[{"id":209,"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=\/wp\/v2\/posts\/190\/revisions\/209"}],"wp:attachment":[{"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=190"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=190"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.xuhao1.me\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=190"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}