Independent horizontal and vertical implementations.
Requires that blocks be built from 4x4 and [xy]_step_q4 == 16
6-10% improvement. CIF improved the least.
Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda
- The vp9 mbfilter C code will branch on flat and mask. This CL
will perform both branches and combine the data. A later CL will
perform a check to see if all patch will take one branch.
- These functions are about 1.75 times faster than the C code on
Nexus 7.
PS #3
- Changed all functions to dub limit, blimit, and thresh from
vld {dx[]}, freeing up r4-r6.
- Changed code to use vbif to reduce one instruction and free
up a d register.
Change-Id: I028dae0e434dc9891c3677bdb182e201ffb04777
- Added vp9_loop_filter_horizontal_edge_neon and
vp9_loop_filter_vertical_edge_neon.
- The functions are based off the vp8 loopfilter
functions.
- Matches x86 md5 checksum.
Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0