vpx/vp9/encoder
A.Mahfoodh 5215b83aea Simplifying and inlining k_cvtlo_epi16 and k_cvthi_epi16
Simplify k_cvtlo_epi16 and k_cvthi_epi16 to only two
instructions each, then inline them.

Quoting from Intel's MMX_App_Compute_16bit_Vector.pdf:
"The PMADDWD instruction multiplies four
pairs of 16-bit numbers and produces partial sums of the results
and can do so once per clock (with a three-clock latency)."
So I am assuming that there will be a three-clock overhead after the
last _mm_madd_pi16 command.
Even with that overhead, the total number of clocks should generally be
smaller. I am not sure, though, because I could not find information
about the number of clocks required for the instructions in k_cvtlo_epi16
and k_cvthi_epi16. I will run a test and compare the execution time.
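
For illustration only, a minimal sketch of how a two-instruction
conversion of this kind could look. It assumes the simplification is an
unpack with zero followed by a PMADDWD against a vector of ones. The
helper names widen_lo_epi16 and widen_hi_epi16 are hypothetical and are
not the names used in the tree, and the sketch uses the SSE2 form
_mm_madd_epi16 of the multiply-add intrinsic.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: widen the low four signed 16-bit lanes of `a`
 * to four signed 32-bit lanes using two instructions. */
static inline __m128i widen_lo_epi16(__m128i a) {
  const __m128i k_zero = _mm_setzero_si128();
  const __m128i k_one = _mm_set1_epi16(1);
  /* Interleave with zero: 16-bit lanes become a0,0,a1,0,a2,0,a3,0. */
  const __m128i pairs = _mm_unpacklo_epi16(a, k_zero);
  /* PMADDWD: each 32-bit lane is a_i * 1 + 0 * 1, computed on signed
   * 16-bit inputs, so the result is a_i sign-extended to 32 bits. */
  return _mm_madd_epi16(pairs, k_one);
}

/* Hypothetical helper: same idea for the high four 16-bit lanes. */
static inline __m128i widen_hi_epi16(__m128i a) {
  const __m128i k_zero = _mm_setzero_si128();
  const __m128i k_one = _mm_set1_epi16(1);
  const __m128i pairs = _mm_unpackhi_epi16(a, k_zero);
  return _mm_madd_epi16(pairs, k_one);
}
```

Multiplying by one cannot overflow, so the sign extension holds for the
whole int16_t range, including -32768.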

Change-Id: Ieda4aa338f69ad3dd196ac6e7892da3cf1b47ea7
2013-10-02 20:02:03 -04:00