Fixed several problems with CUDA 5.0:
* gpu::LUT: uses device memory instead of host memory
* gpu::multiply: rounding mode for CV_8U depth
........
Moved TargetArchs and DeviceInfo to core.
Fixed bug in GpuMat::copy with mask (incorrect index in function table).