Compare commits

361 Commits

Author SHA1 Message Date
John Koleszar
e28e08146e Merge "Update CHANGELOG for Cayuga release" into cayuga 2011-08-04 10:30:15 -07:00
John Koleszar
f3538f2b81 Merge changes Ic7725e27,Ib3d54bfa into cayuga
* changes:
  Update AUTHORS
  Update .mailmap entry for Ralph Giles
2011-08-03 13:45:24 -07:00
John Koleszar
a49b9e0014 Merge changes I585167e1,Ia07602bd into cayuga
* changes:
  Fix building of static libs on universal-darwin
  Fix asm offsets generation for universal-darwin builds
2011-08-03 13:44:32 -07:00
John Koleszar
238dae8604 Fix source buffer selection
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.

Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
2011-08-03 16:13:15 -04:00
John Koleszar
06f58c0df7 Fix building of static libs on universal-darwin
The static libs should not be built from sources during the top level
of a universal build. This regression was introduced in commit
495b241fa6, which made the static
libs selectable under CONFIG_STATIC.

Change-Id: I585167e17459877e0fa7fa19e1046c3703d91c97
2011-08-03 10:38:45 -04:00
John Koleszar
c1bf6ca6cc Fix asm offsets generation for universal-darwin builds
Added BUILD_PFX to correct dependencies.

Change-Id: Ia07602bd98ef2253242b1bd66ef05e3b1e64ba7d
2011-08-03 10:38:33 -04:00
Johann
30e5deae5d update extend frame borders
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676

update the code with new assumptions and guard them with a compile time
assert

Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
2011-08-02 19:26:46 -04:00
John Koleszar
ea8d436f30 Update CHANGELOG for Cayuga release
Change-Id: If6f20553159105c05f9a684cb7c8f3778c7894a1
2011-08-02 14:43:05 -04:00
James Berry
27ee521753 include asm_com/dec_offsets for make dist
Change-Id: Ia1ad66066a24c01915cd9e3ff75c7e070cc984c8
2011-08-02 13:42:03 -04:00
John Koleszar
b956f2ceb2 Update AUTHORS
Change-Id: Ic7725e279d2263515e5312c152c58e1644eb2495
2011-08-02 10:09:59 -04:00
John Koleszar
e6847aa0f0 Update .mailmap entry for Ralph Giles
Change-Id: Ib3d54bfa81720a0b2877837d7149cd12d26e75e4
2011-08-02 10:09:36 -04:00
Lou Quillio
edfed938ba Sync vpxenc --timebase usage wording with docs change.
Change-Id: Ia406272a97806c0194435bb7f24e24d353ef5cc6
2011-08-02 09:57:50 -04:00
John Koleszar
f475f0c1bb Merge "include the arm header files in make dist" into cayuga 2011-08-02 05:21:10 -07:00
John Koleszar
81da41732c Merge "Fix building with --disable-postproc" into cayuga 2011-08-02 05:19:12 -07:00
John Koleszar
06c3d5bb9a Fix building with --disable-postproc
Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9
2011-08-01 17:50:23 -04:00
Johann
3e8c6d3d35 include the arm header files in make dist
Change-Id: Ibcf5b4b14153f65ce1b53c3bfba87ad2feb17bbd
2011-08-01 17:20:21 -04:00
John Koleszar
b8791980b4 Merge "build error fix - obj_int_extract.bat" into cayuga 2011-08-01 13:56:32 -07:00
James Berry
61046b8d7a build error fix - obj_int_extract.bat
obj_int_extract.bat was not being copied
correctly for make dist. It now is.

Change-Id: I976479f90bbfa4798f241db1055e1e3b04ca2830
2011-08-01 16:55:06 -04:00
John Koleszar
7d984d8c38 Disable FORTIFY_SOURCE on glibc targets
Improve binary distributions by defeating longjmp interception. See
http://code.google.com/p/webm/issues/detail?id=166 for more information.

Change-Id: I5ac731ec3f3570088597201d0f411473e2dffa4f
2011-08-01 10:10:43 -04:00
John Koleszar
8ef25de377 install asm_offsets.h
Ensure vpx_ports/asm_offsets.h is installed with make dist

Change-Id: If9f32273fff975d60de1583b039dbbce8a7ccd27
2011-07-29 16:56:43 -04:00
John Koleszar
6f080f9cec Merge "Convert rc_max_intra_bitrate_pct to control" 2011-07-29 11:57:48 -07:00
John Koleszar
1f71d2e2c8 Correctly track sharpness in vp8cx_pick_filter_level_fast
Make sure to update last_sharpness_level from the current
sharpness_level whenever it changes.

Change-Id: I0258d2f5b11a407abf6176a8d4c4994d925943f0
2011-07-29 12:27:03 -04:00
John Koleszar
56b06aef6d Merge "configure: add --enable-static option" 2011-07-28 07:08:35 -07:00
John Koleszar
1654ae9a2a Convert rc_max_intra_bitrate_pct to control
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.

More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.

Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
2011-07-28 09:17:35 -04:00
Yunqing Wang
2f2302f8d5 Preload reference area in sub-pixel motion search (real-time mode)
This change implemented the same idea as the change "Preload reference
area to an intermediate buffer in sub-pixel motion search." The changes
were made to the vp8_find_best_sub_pixel_step() and
vp8_find_best_half_pixel_step() functions, which are called when
speed >= 5. Test result (using tulip clip):

1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain

2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.

Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
2011-07-27 14:19:10 -04:00
Yunqing Wang
f11613b620 Merge "Fix range checks in motion search" 2011-07-27 09:34:13 -07:00
Yunqing Wang
bde2afbe23 Fix range checks in motion search
There were some situations in which the start motion vectors were
out of range. This fix adjusted the range checks to make sure the
start vectors are checked and clamped.

Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
2011-07-27 10:37:33 -04:00
James Zern
3a975d9489 vpxenc: cosmetics: timebase help update / spelling
The timebase update fixes Issue #61.

Change-Id: I425158da7ea639464f61e6dd604ac9e6c72b7266
2011-07-26 17:27:01 -07:00
John Koleszar
db8f0d2ca9 Merge "cosmetics: consistently use [u]int64_t" 2011-07-26 12:57:43 -07:00
James Zern
b45065d38b cosmetics: consistently use [u]int64_t
Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.

Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a
2011-07-26 11:34:36 -07:00
Johann
ca7e346669 Merge ""Eliminated TOKENEXTRABITS" broke the windows build." 2011-07-26 06:34:31 -07:00
Scott LaVarnway
a11624497c "Eliminated TOKENEXTRABITS" broke the windows build.
Fixed.

Change-Id: I3348e8dbcaee6ace263af413701101d77636e5df
2011-07-26 09:33:16 -04:00
James Zern
495b241fa6 configure: add --enable-static option
Fixes issue #62.

Change-Id: I0567cf7897c0942666c19b3231c8c3b8e9c3e7cc
2011-07-25 15:40:36 -07:00
Scott LaVarnway
4894b45ced Merge "Eliminated TOKENEXTRABITS" 2011-07-25 14:35:58 -07:00
Scott LaVarnway
76eb402668 Eliminated TOKENEXTRABITS
Noticed small performance gains, depending on material.

Change-Id: I334369f6312bc19aa73481fc3f790ab181e11867
2011-07-25 17:11:24 -04:00
Yunqing Wang
5b0de48ddd Merge "Use CONFIG_FAST_UNALIGNED consistently in codec" 2011-07-25 12:40:50 -07:00
Yunqing Wang
fe270dd527 Specify size for argument pushed to stack
The change fixes building error on Win64.

Change-Id: I63d25b26220c4da8a98ca2e36530cbb802468e6b
2011-07-25 11:30:45 -04:00
Yunqing Wang
65dfcf4696 Use CONFIG_FAST_UNALIGNED consistently in codec
CONFIG_FAST_UNALIGNED is enabled by default. Disable it if it is
not supported by hardware.

Change-Id: I7d6905ed79fed918bca074bd62820b0c929d81ab
2011-07-25 10:11:24 -04:00
Johann
773bcc300d Merge "fix sharpness bug and clean up" 2011-07-22 09:34:55 -07:00
Johann
a04ed0e8f3 fix sharpness bug and clean up
sharpness was not recalculated in vp8cx_pick_filter_level_fast

remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.

always use cm->sharpness_level. the extra indirection was annoying.

don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init

move function declarations to their proper header

Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
2011-07-22 12:33:57 -04:00
Yunqing Wang
829179e888 Merge "Preload reference area to an intermediate buffer in sub-pixel motion search" 2011-07-22 06:56:15 -07:00
Yunqing Wang
20bd1446c0 Preload reference area to an intermediate buffer in sub-pixel motion search
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
  3.4%   at --rt --cpu-used =-4
  2.8%   at --rt --cpu-used =-3
  2.3%   at --rt --cpu-used =-2
  2.2%   at --rt --cpu-used =-1

Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.

Next, I will apply a similar idea to the other 2 sub-pixel search
functions for encoding speed > 4.

Make this change exclusively for x86 platforms.

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
2011-07-22 09:28:06 -04:00
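
The preload step described in the commit above can be sketched as follows. This is only an illustration, not the libvpx code: the function name, its parameters, and the use of a `margin` argument for the +/- 3 pixel range are assumptions. Aligning the destination buffer to 32 bytes is left to the caller.

```c
#include <string.h>

/* Hypothetical helper, not a libvpx function.
 * Copies the block at `ref` (w x h samples) together with `margin`
 * extra samples on every side into `dst`, so that the sub-pixel search
 * can read from one small contiguous buffer instead of the full
 * reference frame. The caller must make sure the margin lies inside
 * the reference frame (or its border) and that `dst` is large enough. */
void preload_area(const unsigned char *ref, int ref_stride,
                  unsigned char *dst, int dst_stride,
                  int w, int h, int margin)
{
    /* Top-left corner of the search area. */
    const unsigned char *src = ref - margin * ref_stride - margin;
    int rows = h + 2 * margin;
    int cols = w + 2 * margin;

    for (int r = 0; r < rows; ++r)
        memcpy(dst + r * dst_stride, src + r * ref_stride, (size_t)cols);
}
```

After the copy, the sample that was at `ref[0]` sits at `dst[margin * dst_stride + margin]`.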
Johann
52d13777da Merge "Add .size directive to ARM asm functions." 2011-07-21 12:56:59 -07:00
Johann
ddcdbfd71e Merge "Mark ARM asm objects as allowing a non-executable stack." 2011-07-21 12:20:00 -07:00
Timothy B. Terriberry
1647f00c29 Add .size directive to ARM asm functions.
This makes them show up properly in debugging tools like gdb and
 valgrind.

Change-Id: I0c72548a1090de88ba226314e5efe63360b7e07f
2011-07-21 11:46:14 -07:00
Timothy B. Terriberry
0453aca5af Mark ARM asm objects as allowing a non-executable stack.
This adds the magic .note.GNU-stack section at the end of each ARM
 asm file (when built with gas), indicating that a non-executable
 stack is allowed.
Without this section, the linker will assume the object requires an
 executable stack by default, forcing an executable stack for the
 entire program.

Change-Id: Ie86de6a449b52d392b9e5e0479833ed8c508ee65
2011-07-21 11:45:00 -07:00
John Koleszar
2bdda84e37 Merge "Increase chroma row alignment to 16 bytes." 2011-07-21 07:32:39 -07:00
Yunqing Wang
c5fe641179 Merge "Add improvements made in good-quality mode to real-time mode" 2011-07-21 07:27:09 -07:00
Timothy B. Terriberry
7d1b37cdac Increase chroma row alignment to 16 bytes.
This is done by expanding luma row to 32-byte alignment, since
 there is currently a bunch of code that assumes that
 uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
 common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
 encoder/temporal_filter.c, and possibly others; I haven't done a
 full audit).
It also replaces the hardcoded border of 16 in a number of
 encoder buffers with VP8BORDERINPIXELS (currently 32), as the
 chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
 dumping the frame memory as a contiguous blob produces a valid,
 if padded, image.

Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
2011-07-20 10:20:31 -07:00
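
The stride relationship that the commit above relies on can be sketched in a few lines. The macro and function names are hypothetical and the layout (width plus a border on each side) is an assumption for illustration. The point is only that rounding the luma stride up to 32 bytes makes a chroma stride of `y_stride / 2` a multiple of 16.

```c
/* Round v up to the next multiple of n, where n is a power of two.
 * Hypothetical macro, not taken from libvpx. */
#define ALIGN_POW2(v, n) (((v) + (n) - 1) & ~((n) - 1))

/* Assumed layout: `border` samples on each side of a row. */
int luma_stride(int width, int border)
{
    return ALIGN_POW2(width + 2 * border, 32);
}

/* Code that assumes uv_stride == y_stride / 2 keeps working, and the
 * result is a multiple of 16 because the luma stride is a multiple
 * of 32. */
int chroma_stride(int width, int border)
{
    return luma_stride(width, border) / 2;
}
```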
Attila Nagy
0afcc76971 encoder: don't set the fragment bit for the last partition
Change-Id: Icb4e4f0d7c3074a8507852178be87541a1cb5bac
2011-07-20 14:09:42 +03:00
Scott LaVarnway
b2d9700f53 Merge "Moved vp8_encode_bool into boolhuff.h" 2011-07-19 08:15:14 -07:00
John Koleszar
d98a5ed4dd Revert "Disable __longjmp_chk protection"
This reverts commit b73a3693e5.

This version of the check doesn't work with generic-gnu, and figuring
out the correct symbol version at configure time is probably more work
than this is worth. May revisit in the future.

Change-Id: I6c75e88bd3bd82a4b21e09a25780fe53aacb7d70
2011-07-19 10:00:27 -04:00
Johann
6afafc313c remove old armv5 code
armv5 dequantizer is not referenced

Change-Id: Id1cc617dcee35ebd6a406816ec6aaa26e8bbc8ad
2011-07-19 09:20:38 -04:00
Scott LaVarnway
a25f6a9c88 Moved vp8_encode_bool into boolhuff.h
allowing the compiler to inline this function.  For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.

Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
2011-07-19 09:17:25 -04:00
John Koleszar
b5ea2fbc2c Improved 1-pass CBR rate control
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.

Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvements in both.

Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
2011-07-18 11:48:05 -04:00
John Koleszar
74ad25a4c6 Merge "Disable __longjmp_chk protection" 2011-07-18 08:43:59 -07:00
John Koleszar
da39e505dd Merge "Fixed rate histogram calculation" 2011-07-18 06:07:51 -07:00
Tero Rintaluoma
fd41cb8491 Fixed rate histogram calculation
Using small values for --buf-sz= in command line causes
floating point exception due to division by zero.

Change-Id: Ibfe2d44db922993a78ebc9a4a1087d9625de48ae
2011-07-18 10:35:05 +03:00
Scott LaVarnway
e68894fa03 Merge "Tokenize MB optimized" 2011-07-15 07:54:14 -07:00
Yunqing Wang
f676171e52 Merge "Fix vpxenc encoding incorrect webm file header on big endian machines(Issue 331)" 2011-07-15 05:21:35 -07:00
Tero Rintaluoma
4e82f01547 Tokenize MB optimized
Optimized C-code of the following functions:
 - vp8_tokenize_mb
 - tokenize1st_order_b
 - tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.

Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
2011-07-15 11:26:54 +03:00
James Berry
6b6f367c3d bug fix vpx_copy_and_extend_frame size issue
vpx_copy_and_extend_frame could incorrectly
resize uv frames which could result in a crash.

Change-Id: Ie96f7078b1e328b3907a06eebeee44ca39a2e898
2011-07-14 15:58:15 -04:00
John Koleszar
04dce631a2 Remove unused speed features
min_fs_radius, max_fs_radius, full_freq were set but never read.

Change-Id: I82657f4e7f2ba2acc3cbc3faa5ec0de5b9c6ec74
2011-07-14 14:20:25 -04:00
Fritz Koenig
4ab3175b12 Merge "Better allocate yuv buffers." 2011-07-13 14:18:11 -07:00
Yunqing Wang
f1f28535c3 Merge "Fix unnecessary casting of B_PREDICTION_MODE (issue 349)" 2011-07-13 13:32:57 -07:00
John Koleszar
b73a3693e5 Disable __longjmp_chk protection
glibc implements some checking on longjmp() calls by replacing it with
an internal function __longjmp_chk(), when FORTIFY_SOURCE is defined.
This can be problematic when compiling the library under one version of
glibc and running it under another. Work around this issue for the one
symbol affected for now, before taking out the undef hammer.

Fixes http://code.google.com/p/webm/issues/detail?id=166

Change-Id: Ifc5e25cdec17915e394711f2185b3e9214572d10
2011-07-13 16:07:00 -04:00
Yunqing Wang
139577f937 Fix unnecessary casting of B_PREDICTION_MODE (issue 349)
Minor fix.

Change-Id: Iaf93f6e47e882a33c479e57c7a0d0bf321e291c0
2011-07-13 15:52:07 -04:00
Yunqing Wang
0e9a6ed72a Add improvements made in good-quality mode to real-time mode
Several improvements we made in good-quality mode can be added
into real-time mode to speed up encoding in speed 1, 2, and 3
with small quality loss. Tests using tulip clip showed:

--rt --cpu-used=-1
(before change)
PSNR: 38.028
time: 1m33.195s
(after change)
PSNR: 38.014
time: 1m20.851s

--rt --cpu-used=-2
(before change)
PSNR: 37.773
time: 0m57.650s
(after change)
PSNR: 37.759
time: 0m54.594s

--rt --cpu-used=-3
(before change)
PSNR: 37.392
time: 0m42.865s
(after change)
PSNR: 37.375
time: 0m41.949s

Change-Id: I76ab2a38d72bc5efc91f6fe20d332c472f6510c9
2011-07-13 14:51:02 -04:00
Fritz Koenig
e9751d4b74 Better allocate yuv buffers.
Previously allocated more memory than necessary for yuv buffers.
This makes it harder to track bugs with reading uninitialized
data.

Change-Id: I510f7b298d3c647c869be6e5d51608becc63cce9
2011-07-13 10:37:15 -07:00
Fritz Koenig
84c3cd79d1 Merge "Reduce motion vector search on alt-ref frame." 2011-07-13 10:07:30 -07:00
John Koleszar
7f0b11c0ae Merge "Remove rotting NDS_NITRO code." 2011-07-13 05:46:30 -07:00
Johann
211694f67e Merge "update x86 asm for loopfilter" 2011-07-13 04:10:03 -07:00
Johann
8f910594bd Merge "Update armv6 loopfilter to new interface" 2011-07-13 04:09:55 -07:00
Johann
1a219c22b1 Merge "Update armv7 loopfilter to new interface" 2011-07-13 04:09:42 -07:00
Johann
d9b825cff2 Merge "New loop filter interface" 2011-07-13 04:09:26 -07:00
Fritz Koenig
d89eb6ad5a Remove rotting NDS_NITRO code.
Code has not been used and is no longer relevant.

Change-Id: I38590513da7c7a436804ff8a1a3805d9697f575d
2011-07-12 16:29:15 -07:00
Yunqing Wang
c156a68d06 Fix vpxenc encoding incorrect webm file header on big endian machines(Issue 331)
As reported in issue 331, vpxenc encoded incorrect webm file header
on big endian machines. This change fixed that.

Change-Id: I31924ebd476a87f3e88b9b5424540bf781d2b86f
2011-07-12 14:49:57 -04:00
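
The commit above does not say what exactly went wrong. One common source of this class of bug is writing multi-byte integers straight from memory, which yields different bytes on little-endian and big-endian hosts. WebM (EBML) stores integers most significant byte first, so a portable writer can build the bytes with shifts. The helper below is a hypothetical sketch, not the vpxenc code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low `len` bytes of v (len <= 8) in
 * big-endian order. Shifts operate on the value, not on its memory
 * representation, so the output is the same on any host. */
void put_be(unsigned char *out, uint64_t v, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        out[i] = (unsigned char)(v >> (8 * (len - 1 - i)));
}
```

For example, `put_be(buf, 0x1A45DFA3, 4)` writes the four bytes of the EBML header ID in file order.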
Attila Nagy
c231b0175d Update armv6 loopfilter to new interface
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
2011-07-12 12:14:51 +03:00
Attila Nagy
283b0e25ac Update armv7 loopfilter to new interface
Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346
2011-07-12 12:12:25 +03:00
Fritz Koenig
ede0b15c9d Reduce motion vector search on alt-ref frame.
Clamp mv search to accommodate subpixel filtering
of UV mv.

Change-Id: Iab3ed405993ef6bf779ad7cf60863153068fb7d1
2011-07-11 09:05:43 -07:00
Yunqing Wang
587ca06da9 Minor change in pick_inter_mode()
Scott suggested to move vp8_mv_pred() under "case NEWMV" to save
extra checks.

Change-Id: I09e69892f34a08dd425a4d81cfcc83674e344a20
2011-07-08 14:08:45 -04:00
Yunqing Wang
e83d36c053 Merge "Adjust full-pixel clamping and motion vector limit calculation" 2011-07-08 08:39:32 -07:00
Yunqing Wang
40991faeae Adjust full-pixel clamping and motion vector limit calculation
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b748045470

Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
2011-07-08 11:34:28 -04:00
Johann
01433c5043 update x86 asm for loopfilter
Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
2011-07-08 09:23:38 -04:00
Johann
6ae12c415e Merge "clean up warnings when building arm with rtcd" 2011-07-08 05:16:09 -07:00
Attila Nagy
622958449b New loop filter interface
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshold is constant.
ARM targets use scalars and others vectors.

Change works only with --target=generic-gnu
All other targets have to be updated!

Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
2011-07-08 09:31:41 +03:00
John Koleszar
973a9c075d Merge "Set VPX_FRAME_IS_DROPPABLE" 2011-07-07 08:11:05 -07:00
John Koleszar
37de0b8bdf Set VPX_FRAME_IS_DROPPABLE
Allow the encoder to inform the application that the encoded frame will not
be used as a reference.

Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
2011-07-07 10:38:45 -04:00
John Koleszar
b4f70084cc Merge "Properly use GET_GOT/RESTORE_GOT when using GLOBAL()." 2011-07-01 07:14:34 -07:00
Ronald S. Bultje
c8a23ad3f4 Properly use GET_GOT/RESTORE_GOT when using GLOBAL().
This should fix binaries using PIC on x86-32. Also should
fix issue 343.

Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02
2011-06-30 14:04:27 -07:00
Yunqing Wang
ae8aa836d5 Merge "Copy macroblock data to a buffer before encoding it" 2011-06-30 11:14:24 -07:00
Yunqing Wang
80c3bbf657 Merge "Bug fix in motion vector limit calculation" 2011-06-30 09:52:03 -07:00
Yunqing Wang
b748045470 Bug fix in motion vector limit calculation
Motion vector limits are calculated using right shifts, which
could give wrong results for negative numbers. James Berry's
test on one clip showed encoder produced some artifacts. This
change fixed that.

Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
2011-06-30 11:20:13 -04:00
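
The pitfall named in the commit above can be shown with a small sketch. The helpers are hypothetical and are not the libvpx fix. They only contrast the two rounding behaviours: division rounds toward zero, while an arithmetic right shift (what most compilers emit for `>>` on a negative signed value, although C leaves this implementation-defined) rounds toward negative infinity.

```c
/* Hypothetical helpers, not libvpx functions. */

/* 1/8-pel units to full pixels, rounding toward zero. */
int to_full_pel_trunc(int v)
{
    return v / 8;
}

/* 1/8-pel units to full pixels, rounding toward negative infinity.
 * This matches what `v >> 3` yields with an arithmetic shift. */
int to_full_pel_floor(int v)
{
    int q = v / 8;
    if (v < 0 && v % 8 != 0)
        q -= 1;
    return q;
}
```

The two agree for non-negative inputs and for exact multiples of 8, and differ by one for every other negative input, which is enough to move a limit by a full pixel.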
Johann
3e4a80cc35 Merge "remove incorrect initialization" 2011-06-30 07:59:08 -07:00
John Koleszar
034cea5e72 Merge "guard against space/time distortion" 2011-06-29 11:36:51 -07:00
Johann
bb0ca87a0d guard against space/time distortion
and divide by 0 errors

Change-Id: I8af5ca3d0913cb6f278fff754f8772bcb62e674a
2011-06-29 14:34:25 -04:00
Paul Wilkins
eacaabc592 Merge "Change to arf boost calculation." 2011-06-29 10:03:57 -07:00
Paul Wilkins
11694aab66 Change to arf boost calculation.
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.

The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.

The new code can be deactivated using #define NEW_BOOST 0

In its current default state the code searches forwards and backwards
from the proposed position of the next alt ref.

The old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.

I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.

Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
2011-06-29 18:01:25 +01:00
Johann
fe53107fda remove incorrect initialization
Values were set, then reset. Only set them once.

Change-Id: Iaf43c8467129f2f261f04fa9188b603aa46216b5
2011-06-29 11:54:27 -04:00
Johann
6611f66978 clean up warnings when building arm with rtcd
Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df
2011-06-29 10:51:41 -04:00
John Koleszar
05239f0c41 vpxenc: prevent wraparound in the --rate-hist ringbuffer
For clips that are near 60fps and have a lot of alt refs, it's possible
that the ring buffer holding the previous frames sizes/pts could wrap
around, leading to a division by zero.

In addition to checking for this condition in the ring buffer loop,
the buffer size is made dependent on the actual frame rate in use,
rather than defaulting to 60, which should improve accuracy at frame
rates >= ~60.

Change-Id: If5a04d6e847316dc5f7504f25c01164cf9332be8
2011-06-29 10:30:19 -04:00
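
Both parts of the fix described above, sizing the buffer from the frame rate and guarding the division, can be sketched as below. The function names, the millisecond units, and the one extra slot of slack are assumptions made for illustration and are not taken from vpxenc.

```c
#include <stdint.h>

/* Hypothetical helper: number of ring-buffer slots needed to cover
 * `window_ms` milliseconds at fps_num/fps_den frames per second,
 * rounded up, plus one slot of slack. */
int hist_slots(int window_ms, int fps_num, int fps_den)
{
    int64_t num = (int64_t)window_ms * fps_num;
    int64_t den = (int64_t)1000 * fps_den;
    return (int)((num + den - 1) / den) + 1;
}

/* Hypothetical helper: average rate in bits per second over the span
 * [then_ms, now_ms]. A zero or negative span, which is what a wrapped
 * ring buffer can produce, returns 0 instead of dividing by zero. */
int64_t window_rate(int64_t bytes, int64_t then_ms, int64_t now_ms)
{
    int64_t span = now_ms - then_ms;
    if (span <= 0)
        return 0;
    return bytes * 8 * 1000 / span;
}
```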
John Koleszar
f3a13cb236 Merge "Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently" 2011-06-29 07:29:59 -07:00
Johann
dc004e8c17 Merge "Avoid text relocations in ARM vp8 decoder" 2011-06-28 16:34:10 -07:00
Johann
02c30cdeef Merge "utilize preload in ARMv6 MC/LPF/Copy routines" 2011-06-28 16:33:45 -07:00
Johann
2d29457c4d Merge "respect alignment in arm asm files" 2011-06-28 16:32:48 -07:00
John Koleszar
b32da7c3da Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.

Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
2011-06-28 17:03:55 -04:00
John Koleszar
9bcf07ae4a Merge "Simplify decode_macroblock." 2011-06-28 12:54:25 -07:00
John Koleszar
14566125ea Merge "use relative include path" 2011-06-28 12:52:48 -07:00
James Zern
db6ee54353 vpxenc: free resources
Free buffers allocated for y4m input and webm cue list.

Change-Id: I02051baae3b45f692cf5c7f520ea9a2d80c7b470
2011-06-28 12:10:24 -07:00
Johann
4e4f835232 use relative include path
Files are already in vpx/

Change-Id: I67dcbb5d5b6cb55e91b4e4927ab842a1e2c9e284
2011-06-28 14:46:24 -04:00
Gaute Strokkenes
81c0546407 Simplify decode_macroblock.
Change-Id: Ieb2f3827ae7896ae594203b702b3e8fa8fb63d37
2011-06-28 17:01:14 +01:00
Stefan Holmer
7296b3f922 New ways of passing encoded data between encoder and decoder.
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.

At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last have
the VPX_FRAME_IS_FRAGMENT flag set.

At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.

Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.

The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.

Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
2011-06-28 11:10:17 -04:00
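
The decoder-side calling convention from the commit above can be sketched as a loop. To keep the example self-contained it does not call libvpx. The `decode_fn` callback is a hypothetical stand-in for a decode call reduced to (data, data_sz), and `partition`, `feed_frame`, `call_log` and `log_decode` are likewise invented for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the decode call. */
typedef int (*decode_fn)(void *ctx, const unsigned char *data,
                         size_t data_sz);

typedef struct {
    const unsigned char *data; /* NULL if the partition was lost */
    size_t size;
} partition;

/* Feed one frame partition by partition:
 *  - received partition: data != NULL, data_sz > 0
 *  - missing partition:  data != NULL, data_sz == 0
 *  - end of data:        data == NULL, data_sz == 0 */
int feed_frame(decode_fn decode, void *ctx,
               const partition *parts, size_t count)
{
    static const unsigned char placeholder = 0;

    for (size_t i = 0; i < count; ++i) {
        int err;
        if (parts[i].data != NULL && parts[i].size > 0)
            err = decode(ctx, parts[i].data, parts[i].size);
        else
            err = decode(ctx, &placeholder, 0);
        if (err)
            return err;
    }
    return decode(ctx, NULL, 0);
}

/* Fake decoder that only counts which kind of call it received. */
typedef struct { int normal, missing, end; } call_log;

int log_decode(void *ctx, const unsigned char *data, size_t data_sz)
{
    call_log *log = ctx;
    if (data == NULL && data_sz == 0)
        log->end++;
    else if (data_sz == 0)
        log->missing++;
    else
        log->normal++;
    return 0;
}
```

Passing `log_decode` and a zeroed `call_log` to `feed_frame` is an easy way to check that a packetizer signals lost partitions and the end of the frame as intended before wiring in the real decoder.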
Stefan Holmer
b433e12a3d Proposing an extension to the encoder and decoder interfaces.
Adding capabilities with which the encoder can output frames
partition by partition, and the decoder can get input data
partition by partition.

Change-Id: Ieae0801480b8de8cd43c3c57dd3bab2e4c346fe0
2011-06-28 11:10:17 -04:00
Stefan Holmer
4cb0ebe5b2 Adding support for independent partitions
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.

Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
2011-06-28 11:10:17 -04:00
Mike Hommey
e3f850ee05 Avoid text relocations in ARM vp8 decoder
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.

Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.

Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
2011-06-28 09:11:40 +02:00
Fritz Koenig
be99868bd1 Fix after removal of B_MODE_INFO
Change Ieb746989: Removed B_MODE_INFO missed this.

Change-Id: I32202555581cc2a5d45e729c6650ada4d2df55d3
2011-06-27 09:43:21 -07:00
Johann
8a9a11e8dc Merge "configuration, support disabling any subset of ARM arch" 2011-06-27 08:55:18 -07:00
Johann
2007c3bb38 respect alignment in arm asm files
Conversion script was discarding alignment. Also, set default alignment
to 4 bytes.

Change-Id: I1e9cbbb8c142bdf93df4e9aaccf967ed43dff906
https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/789198
2011-06-27 11:38:26 -04:00
Stefan Holmer
ba0822ba96 Adding support for error concealment in multi-threaded decoding
Also includes a couple of error concealment bug fixes:
- the segment_id wasn't properly initialized when missing
- when interpolating and no neighbors are found, set to zero
- clear the qcoef buffer when concealing an MB

Change-Id: Id79c876b41d78b559a2241e9cd0fd2cae6198f49
2011-06-27 09:03:06 -04:00
Adrian Grange
deca8cfc44 Fixed initialization of frame buffer ref counters
Only the first frame buffer ref counter was being initialized
because the index was fixed at 0 rather than using i.

Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4
2011-06-24 08:43:40 -07:00
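
The class of bug fixed in the commit above is easy to reproduce in isolation. The sketch below is not the libvpx code. The array size and function names are hypothetical and only show the difference between indexing with a constant and indexing with the loop variable.

```c
#define NUM_FRAME_BUFFERS 4 /* hypothetical count */

/* Buggy pattern: the index is fixed at 0, so the loop clears the
 * first counter NUM_FRAME_BUFFERS times and leaves the rest alone. */
void init_ref_counts_buggy(int *counts)
{
    for (int i = 0; i < NUM_FRAME_BUFFERS; ++i)
        counts[0] = 0;
}

/* Fixed pattern: every counter is cleared. */
void init_ref_counts_fixed(int *counts)
{
    for (int i = 0; i < NUM_FRAME_BUFFERS; ++i)
        counts[i] = 0;
}
```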
Yunqing Wang
0d87098e08 Copy macroblock data to a buffer before encoding it
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.

Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
2011-06-23 13:54:02 -04:00
John Koleszar
db67dcba6a Revert "Reduce overshoot in 1 pass rate control"
This reverts commit 212f618373.

Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting the damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.

Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5
2011-06-23 11:52:12 -04:00
John Koleszar
259ea23297 Merge "Fix parallel install" 2011-06-23 08:18:10 -07:00
John Koleszar
ac998ec8d8 Merge changes I2807b5a1,I59b020c2
* changes:
  vpxenc: add rate histogram display
  vpxenc: add quantizer histogram display
2011-06-23 08:09:10 -07:00
John Koleszar
c96f8e238d vpxenc: add rate histogram display
Add the --rate-hist=n option, which displays a histogram with n
buckets for the rate over the --buf-sz window.

Change-Id: I2807b5a1525c7972e9ba40839b37e92b23ceffaf
2011-06-23 10:22:44 -04:00
John Koleszar
3fde9964ce vpxenc: add quantizer histogram display
Add the --q-hist=n option, which displays a histogram with n buckets for
the quantizer selected on each frame.

Change-Id: I59b020c26b0acae0b938685081d9932bd98df5c9
2011-06-23 10:22:43 -04:00
James Berry
2bd90c13a0 get/set reference buffer dimension check added
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously guaranteed.
Using an improperly allocated buffer would
cause a crash before.

Change-Id: I904982313ce9352474f80de842013dcd89f48685
2011-06-22 13:36:24 -04:00
Alexis Ballier
653e69e334 Fix parallel install
Require the destination to be present before trying to create the symlink.
See: http://bugs.gentoo.org/show_bug.cgi?id=323805

Change-Id: I14ed4a9792dedc289885a9a43bc5a86cb792206d
2011-06-22 12:53:07 -04:00
Yaowu Xu
76495617e0 Merge "adjusting the calculation of errorperbit" 2011-06-21 09:47:42 -07:00
Scott LaVarnway
55c3963c88 Merge "Improved vp8dx_decode_bool" 2011-06-21 07:45:51 -07:00
Yunqing Wang
109c20299c Merge "Remove unnecessary bounds checking in motion search" 2011-06-21 07:23:24 -07:00
Attila Nagy
6f23f24afe configuration, support disabling any subset of ARM arch
Useful for leaving out any version specific asm files.

Change-Id: I233514410eb9d7ca88d2d2c839673122c507fa99
2011-06-21 10:39:01 +03:00
Yaowu Xu
10ed60dc71 adjusting the calculation of errorperbit
RDMULT/RDDIV defines a bit worth of distortion in term of sum squared
difference. This has also been used as errorperbit in subpixel motion
search, where the distortion is computed as the variance of the difference.
The variance of differences is different from sum squared differences
by amount of DC squared. Typically, for inter predicted MBs, this
difference averages around 10% between the two distortions, so this patch
introduces a 110% constant in deriving errorperbit from RDMULT/RDDIV.

Test on CIF set shows small but positive gain on overall PSNR (.03%)
and SSIM (.07%), overall impact on average PSNR is 0.

Change-Id: I95425f922d037b4d96083064a10c7cdd4948ee62
2011-06-20 16:32:30 -07:00
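
The relationship between the two distortion measures in the commit above is a plain identity: the sum of squared differences equals the variance-style measure plus a DC term, sum squared divided by the sample count. The sketch below uses hypothetical function names and integer arithmetic to show that identity on a block of samples. It does not reproduce how libvpx derives errorperbit.

```c
/* Hypothetical helpers, not libvpx functions. */

/* Sum of the per-sample differences (the DC of the residual). */
long diff_sum(const unsigned char *a, const unsigned char *b, int n)
{
    long sum = 0;
    for (int i = 0; i < n; ++i)
        sum += a[i] - b[i];
    return sum;
}

/* Sum of squared differences. */
long diff_sse(const unsigned char *a, const unsigned char *b, int n)
{
    long sse = 0;
    for (int i = 0; i < n; ++i) {
        long d = a[i] - b[i];
        sse += d * d;
    }
    return sse;
}

/* Variance-style distortion: SSE with the DC term removed. */
long diff_variance(const unsigned char *a, const unsigned char *b, int n)
{
    long sum = diff_sum(a, b, n);
    return diff_sse(a, b, n) - sum * sum / n;
}
```

The gap between `diff_sse` and `diff_variance` is exactly the DC term, which is the quantity the commit message estimates at around 10% on average for inter predicted macroblocks.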
Scott LaVarnway
67a1f98c2c Improved vp8dx_decode_bool
Relocated the vp8dx_bool_decoder_fill() call, allowing
the compiler to produce better assembly code.  Tests
showed a 1 - 2 % performance boost (x86 using gcc)
for the 720p clip used.

Change-Id: Ic5a4eefed8777e6eefa007d4f12dfc7e64482732
2011-06-20 14:44:16 -04:00
Taekhyun Kim
458fb8f491 utilize preload in ARMv6 MC/LPF/Copy routines
About 9~10% decoding perf improvement on non-Neon ARM cpus

Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837
2011-06-17 14:04:53 -07:00
Yunqing Wang
2cd1c2855e Remove unnecessary bounds checking in motion search
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because the starting point is
treated differently when it equals the bounds.

Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
2011-06-17 14:19:51 -04:00
John Koleszar
a60fc419f5 Merge "Use SSE as BPRED distortion metric consistently" 2011-06-17 09:48:32 -07:00
Ronald S. Bultje
87fd66bb0e Assign boost to GF bit allocation if past frame had no ARF.
Modify the second-pass code to provide a full golden-frame (GF) bit
allocation boost if the past GF group (GFG) had no alt-ref frame (ARF),
even if the current GFG does contain an ARF.

This mostly has no effect on clips, since switching ARFs on/off between
GFGs is not very common. Has a positive effect on e.g. cheer (+0.45 SSIM
at 600kbps) and football (+0.25 SSIM at 600kbps), particularly at high
bitrates. Has a negative effect (-0.04 SSIM at 300kbps) at pamphlet,
which appears only marginally related to this patch, and crew (-0.1 SSIM
at 700kbps).

Change-Id: I2e32899638b59f857e26efeac18a82e0c0b77089
2011-06-16 13:01:27 -04:00
John Koleszar
eb645abeac Merge "Disable specialcase for last frames if the sequence contains ARFs." 2011-06-16 09:56:05 -07:00
John Koleszar
d9959e336e Merge "gen_msvs_proj: write boolean for Debug attribute" 2011-06-16 07:19:46 -07:00
James Zern
91b167202d gen_msvs_proj: write boolean for Debug attribute
Replace =1 with =true for yasm tool element. This aids in upgrading
e.g., vs9 project files to vs10.
build/x86-msvs/yasm.xml generated during conversion will require the
Separator attribute to be removed for the build to complete
successfully.

Change-Id: If75c4f9a925529740048882003e9d766c5ac4f0c
2011-06-15 14:56:16 -07:00
John Koleszar
5223016337 Merge "Remove redundant check for KEY_FRAME in multithreaded decoder" 2011-06-15 10:18:06 -07:00
John Koleszar
61599fb59f Use SSE as BPRED distortion metric consistently
The BPRED mode selection uses SSE as a distortion metric, but the early
breakout threshold being used was a variance value.

Change-Id: I42d4602fb9b548bf681a36445701fada5e73aff1
2011-06-15 10:53:37 -04:00
John Koleszar
1ade44b352 Merge "fix --disable-runtime-cpu-detect on x86" 2011-06-15 07:09:09 -07:00
Ronald S. Bultje
299193dd1c Disable specialcase for last frames if the sequence contains ARFs.
firstpass.c contains some rate adjustment code that assures that the
last few frames in a sequence abide by rate limits. If the second-to-
last group of frames contains an alt-ref frame (ARF), the last golden
frame (GF) is zero bytes, and we will thus spend a ridiculously high
number of bits on regular P-frames trying to hit the target rate. This
does slightly enhance the quality of these last few frames, but has
no perceptual value (other than hitting the target rate).

Disabling this code means we consistently (slightly) undershoot the
target rate and consequently do worse on the last few frames of a
clip, which is particularly noticeable for small clips. The quality-
per-bitrate is generally better, ~0.2% better overall on derf-set,
especially on clips such as garden, tennis, foreman at low bitrates.
Has a negative effect on hallmonitor at high bitrates.

Change-Id: I1d63452fef5fee4a0ad2fb2e9af4c9f2e0d86d23
2011-06-15 09:47:00 -04:00
Attila Nagy
e7e5a58d0c Guard vpx_config.h against multiple inclusions
Change-Id: Iabe2be73af2b92c53687755b31b77448fba385d2
2011-06-15 13:36:09 +03:00
Attila Nagy
c7e6aabbca Remove redundant check for KEY_FRAME in multithreaded decoder
For intra blocks it is enough to check ref_frame == INTRA_FRAME.

Change-Id: I3e2d3064c7642658a9e14011a4627de58878e366
2011-06-15 09:01:27 +03:00
Scott LaVarnway
7be5b6dae4 Merge "Populate bmi for B_PRED only" 2011-06-14 12:04:50 -07:00
Johann
92b0e544f3 fix --disable-runtime-cpu-detect on x86
Change-Id: Ib8e429152c9a8b6032be22b5faac802aa8224caa
2011-06-14 11:31:50 -04:00
Paul Wilkins
bf6b314d89 Merge "Fix RT only build" 2011-06-14 06:01:44 -07:00
Tero Rintaluoma
9909047461 Fix RT only build
Moved encode_intra function from firstpass.c to encodeintra.c to
prevent linking problem in real-time only build. Also changed name
of the function to vp8_encode_intra because it is no longer static.

Change-Id: Ibf3c6c1de3152567347e5fbef47d1d39564620a5
2011-06-14 13:39:06 +03:00
Tero Rintaluoma
5405bd9793 Update -linux-rvct targets
- Updated -linux-rvct targets to support RVDS 4.0 and later.
- Changed optimization flag to -Otime because -O3 ruined performance
  for RVCT linux targets.
- Added support for --enable-small for RVCT
- RVCT created library should be able to link with GCC
- Supports building shared linux libraries

Change-Id: Ic62589950d86c3420fd4d908b8efb870806d1233
2011-06-14 11:29:35 +03:00
James Zern
532c30c83e fix corrupt frame leak
If setup_token_decoder reported an internal error the memory allocated
there would not be freed in the resulting call to _remove_decompressor.

Change-Id: Ib459de222d76b1910d6f449cdcd01663447dbdf6
2011-06-13 17:32:19 -07:00
Scott LaVarnway
223d1b54cf Populate bmi for B_PRED only
Small decode performance gain (~1%) on keyframes.  No
noticeable gains on encode.  Also changed pick_intra4x4mby_modes()
to read the above and left block modes for keyframes only.

Change-Id: I1f4885252f5b3e9caf04d4e01e643960f910aba5
2011-06-13 17:14:11 -04:00
Scott LaVarnway
e71a010646 Calc ref_frame_cost once per frame
instead of every macro block.

Change-Id: I2604e94c6b89e3a8457777e21c8c38406d55b165
2011-06-13 09:58:03 -04:00
Tero Rintaluoma
66533b1a8d Fix make clean for asm offset files
Automatically created assembly offset files added to CLEAN-OBJS list
for proper cleanup. This will fix following build error:
1) Build for the workstation
./configure
make
make clean
2) Build for ARM platform
./configure --target=armv7-linux-gcc
make ==> this will fail because it uses old asm_*_offset.asm files

Change-Id: Id5275c470390ca81b8db086a15ad75af39b80703
2011-06-10 14:05:53 +03:00
John Koleszar
f3ba4c6b82 Merge "bug fix mode_info_context not initialized for error-resilient" 2011-06-09 13:39:47 -07:00
Yaowu Xu
361717d2be remove one set of 16x16 variance functions
calls to this set of functions are replaced by var16x16.

Change-Id: I5ff1effc6c1358ea06cda1517b88ec28ef551b0d
2011-06-09 11:23:05 -07:00
James Berry
45feea4cf0 bug fix mode_info_context not initialized for error-resilient
uninitialized xd->mode_info_context would crash
vpxenc for --error-resilient=1.

Change-Id: I31849e40281e3d65ab63257cfec5e93398997f0b
2011-06-09 12:46:31 -04:00
John Koleszar
af49c11250 Update keyframe activity in non-RD mode
Activity update is no longer dependent on being in RD mode, so update
it unconditionally.

Change-Id: Ib617a6fc210dfc045455e3e4467d7ee5e3d1fa0e
2011-06-09 12:05:31 -04:00
Johann
79327be6c7 use GCC inline magic
Better fix for #326. ICC happens to support the inline magic

Change-Id: Ic367eea608c88d89475cb7b05d73500d2a1bc42b
2011-06-08 16:19:37 -04:00
Johann
baa17db184 Merge "Revert "Use shared object files for ELF"" 2011-06-08 11:47:18 -07:00
Johann
abb7c2181e Revert "Use shared object files for ELF"
Broke RVCT. New magic coming for ICC. Stay tuned!

This reverts commit c73eb2ffff
2011-06-08 11:36:04 -07:00
John Koleszar
8767ac3bc7 Merge "vp8_pick_inter_mode: remove best_bmodes" 2011-06-08 10:59:30 -07:00
John Koleszar
9e4df2bcf5 Merge "vp8_pick_intra_mode: correct returned rate" 2011-06-08 10:58:36 -07:00
John Koleszar
254a7483e5 Merge "Move RD intra block mode selection to rdopt.c" 2011-06-08 10:51:50 -07:00
John Koleszar
001bd51ceb vp8_pick_inter_mode: remove best_bmodes
Since BPRED will be tested at most once, and SPLITMV is not enabled,
there's nothing to clobber the subblock modes, so there's no need to
save and restore them.

Change-Id: I7c3615b69190c10bd068a44df5488d6e8b85a364
2011-06-08 13:50:50 -04:00
Scott LaVarnway
dce64343d6 Merge "Removed unused function parameters" 2011-06-08 10:20:28 -07:00
John Koleszar
91907e0bf4 vp8_pick_intra_mode: correct returned rate
The returned rate was always the 4x4 rate, instead of the rate
matching the selected mode.

Change-Id: I51da31f80884f5e37f3bcc77d1047d31e612ded4
2011-06-08 13:19:12 -04:00
Scott LaVarnway
69d8d386ed Removed unused function parameters
Change-Id: Ib641c624faec28ad9eb99e2b5de51ae74bbcb2a2
2011-06-08 13:01:09 -04:00
Yaowu Xu
1fba1e38ea Adjust errorperbit according to RDMULT in activity masking
In activity masking, the RDO constant RDMULT is adjusted on a per MB basis,
adapting to the activity of the MB. errorperbit, which is defined as
RDMULT/RDDIV, is a constant used in motion estimation. Previously, in
activity masking, errorperbit was not changed even when RDMULT was changed.
This commit adjusts errorperbit according to the change in
RDMULT.

Test in cif set showed a very small but consistent gain by all quality
metrics (average, overall psnr and ssim) when activity masking is on.

Change-Id: I07ded3e852919ab76757691939fe435328273823
2011-06-08 09:45:47 -07:00
Yaowu Xu
5fafa2d524 Merge "Further activity masking changes:" 2011-06-08 09:30:31 -07:00
John Koleszar
96a42aaa2d Move RD intra block mode selection to rdopt.c
This change is analogous to I0b67dae1f8a74902378da7bdf565e39ab832dda7,
which made the move for the non-RD path.

Change-Id: If63fc1b0cd1eb7f932e710f83ff24d91454f8ed1
2011-06-08 12:05:05 -04:00
John Koleszar
e90d17d240 Move intra block mode selection to pickinter.c
This commit moves the intra block mode selection from encodeframe.c
to pickinter.c (in the non-RD case). This allowed pick_intra_mbuv_mode
and pick_intra4x4mby_modes to be made static, and is a step towards
refactoring intra mode selection in the main pickinter loop. Gave a
small perf increase (~0.5%).

Change-Id: I0b67dae1f8a74902378da7bdf565e39ab832dda7
2011-06-08 11:44:57 -04:00
Paul Wilkins
4e81a68af7 Further activity masking changes:
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.

Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%

Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
2011-06-08 16:03:37 +01:00
Yaowu Xu
7368dd4f8f Merge "remove redundant functions" 2011-06-07 16:36:37 -07:00
Yaowu Xu
59129afc05 Merge "adjust sad per bit constants" 2011-06-07 12:37:04 -07:00
Yaowu Xu
221e00eaa9 adjust sad per bit constants
While investigating the effect of DC values on SAD and SSE in motion
estimation, a side finding indicated that the two tables of constants need
to be adjusted. The adjustment was done by multiplying the old constants by
90% with rounding. Also absorb the 1/2 scaling constant into the two
tables. Refer to change Ifa285c3e for background of the 1/2 factor.

Cif set test showed a very small gain on all metrics.

Change-Id: I04333527a823371175dd46cb04a817e5b9a8b752
2011-06-07 12:35:03 -07:00
John Koleszar
5c166470a5 Merge "Reduce overshoot in 1 pass rate control" 2011-06-07 12:30:37 -07:00
Scott LaVarnway
346358a5b7 Merge "Wrapped asserts in critical code with CONFIG_DEBUG" 2011-06-07 06:53:51 -07:00
Scott LaVarnway
afb84bb1cc Merge "Removed unused function vp8_treed_read_num" 2011-06-07 06:51:24 -07:00
John Koleszar
8c2ee4273c Merge "Use shared object files for ELF" 2011-06-07 06:44:32 -07:00
Scott LaVarnway
0e3bcc6f32 Wrapped asserts in critical code with CONFIG_DEBUG
Change-Id: I5b0aaca06f2e0f40588cb24fb0642b6865da8970
2011-06-07 09:34:47 -04:00
Scott LaVarnway
1374a4db3b Removed unused function vp8_treed_read_num
Change-Id: Id66e70540ee7345876f099139887c1843093907f
2011-06-07 09:32:51 -04:00
Yaowu Xu
d4700731ca remove redundant functions
The encoder defined about 4 sets of similar functions to calculate sum,
variance or sse or a combination of them. This commit removed one set
of these functions, get8x8var and get16x16var, where calls to the latter
function are replaced with var16x16 by using the fact that on a 16x16 MB:
    variance == sse - sum*sum/256
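
As a rough illustration of the identity above (not the encoder's actual
code), the following Python sketch computes sum and SSE over a 16x16
difference block and derives the variance from them. The helper names are
hypothetical, and exact fractions are used where integer C code would
truncate the division.

```python
from fractions import Fraction


def sum_and_sse(src, ref):
    """Return (sum, sse) of the pixel differences between two 16x16 blocks."""
    total = 0
    sse = 0
    for src_row, ref_row in zip(src, ref):
        for s, r in zip(src_row, ref_row):
            d = s - r
            total += d
            sse += d * d
    return total, sse


def variance_16x16(src, ref):
    """Variance derived from sum and sse: sse - sum*sum/256 (256 pixels)."""
    total, sse = sum_and_sse(src, ref)
    return sse - Fraction(total * total, 256)
```

A constant offset between the two blocks gives a non-zero SSE but a
variance of zero, which is the DC term the identity removes.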

Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
2011-06-06 16:44:05 -07:00
Yunqing Wang
03973017a7 Remove hex search's variance calculation while in real-time mode
In real-time mode motion search, there is no need to calculate
variance. This change improved encoding speed by 1% ~ 2%(speed=-5).

Change-Id: I65b874901eb599ac38fe8cf9cad898c14138d431
2011-06-06 19:11:05 -04:00
Johann
04edde2b11 Merge "neon fast quantize block pair" 2011-06-06 13:42:58 -07:00
Johann
da8eb716e8 Merge "adds preload for armv6 encoder asm" 2011-06-06 13:32:13 -07:00
Johann
c73eb2ffff Use shared object files for ELF
Fixes #326

Change-Id: I5f2a4257430ef62f674190acefd43a0474821288
2011-06-06 16:10:31 -04:00
Scott LaVarnway
d1c0ba8f7a Merge "Removed unnecessary bmi motion vector stores." 2011-06-06 07:57:39 -07:00
John Koleszar
824e9410c6 Merge "Don't allow very short GF groups even when the GF is predicted from an ARF." 2011-06-06 07:02:29 -07:00
John Koleszar
212f618373 Reduce overshoot in 1 pass rate control
This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.

Tested on the CIF set over 200-500kbps using these settings:

  --buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
  --undershoot-pct=100

Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:

  --rt --cpu-used=-4

The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get a metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:

  One pass:
    baseline:   1.29715
    this patch: 1.23664

  Two pass:
    baseline:   1.32702
    this patch: 1.37824
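
The "average peak" metric described above can be sketched as follows. This
is only an illustration of the SUM(peak)/SUM(target) formula; the function
name and the sample numbers in the usage note are hypothetical and are not
taken from the test runs.

```python
def average_peak(peaks, targets):
    """Ratio of summed peak datarates to summed target datarates."""
    if len(peaks) != len(targets):
        raise ValueError("need one peak and one target per encode")
    if not targets or sum(targets) == 0:
        raise ValueError("targets must be non-empty and sum to non-zero")
    return sum(peaks) / sum(targets)
```

For example, two encodes with targets of 200 and 500 kbps and measured
peaks of 260 and 650 kbps give 910/700 = 1.3.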

This change had a positive effect on our quality metrics as well:

  One pass CBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -0.42 / 2.86 / 27.32
    Overall PSNR    -0.90 / 2.00 / 17.27
    SSIM            -0.05 / 3.95 / 37.46

  Two pass CBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -4.47 / 4.35 / 35.99
    Overall PSNR    -3.40 / 4.18 / 36.46
    SSIM            -4.56 / 6.98 / 53.67

  One pass VBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -5.21 /  0.01 / 3.30
    Overall PSNR    -8.10 / -0.38 / 1.21
    SSIM            -7.38 / -0.11 / 3.17
    (note: most values here were close to the mean, there were a few
     outliers on files that were very sensitive to golden frame size)

  Two pass VBR:
                    Min  / Mean / Max (pct)
    Average PSNR    0.00 / 0.00 / 0.00
    Overall PSNR    0.00 / 0.00 / 0.00
    SSIM            0.00 / 0.00 / 0.00

Neither one pass nor two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.

Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.

Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716
2011-06-03 16:38:11 -04:00
Scott LaVarnway
f1d6cc79e4 Removed unnecessary bmi motion vector stores.
left_block_mv and above_block_mv will return the MB
motion vector for non SPLITMV macro blocks.

Change-Id: I58dbd7833b4fdcd44b6b72e98ec732c93c2ce4f4
2011-06-03 13:09:46 -04:00
Scott LaVarnway
8c5b73de2a Merge "Removed B_MODE_INFO" 2011-06-03 08:32:30 -07:00
Yunqing Wang
e5c236c210 Adjust bounds checking for hex search in real-time mode
Currently, hex search cannot guarantee that the motion vector (MV)
found is within the limit of maximum MV. Therefore, very large
motion vectors resulting from big motion in the video could cause
encoding artifacts. This change adjusted hex search bounds
checking to make sure the resulting motion vector won't go out
of the range. James Berry, thank you for finding the bug.

Change-Id: If2c55edd9019e72444ad9b4b8688969eef610c55
2011-06-03 08:53:42 -04:00
Scott LaVarnway
773768ae27 Removed B_MODE_INFO
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.

Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
2011-06-02 13:46:41 -04:00
Ronald S. Bultje
9f002bee53 Don't allow very short GF groups even when the GF is predicted from an ARF.
This is basically a slightly modified version of the previous patch,
and it has a moderately positive effect (SSIM/PSNR both +0.08% avg
on derf-set). Most clips show no change, except waterfall/coastguard,
each ~ +0.8% SSIM/PSNR. You can see similar effects in other clips
by shortening their length to terminate at a very short last group
of frames.

Change-Id: I7a70de99ca1f9fe6a8b6ca7a6e30e8a4b64383e4
2011-06-02 09:14:51 -07:00
Yaowu Xu
4ce6928d5b Merge "further clean up of errorperbit and sadperbit" 2011-06-02 08:58:03 -07:00
Yaowu Xu
5b2fb32961 further clean up of errorperbit and sadperbit
this commit makes the usage of errorperbit and sadperbit consistent for
encoding modes and passes. Removed all different magic weight factors
associated with errorperbit. Now 1/2 is used for both sadperbit16 and
sadperbit4, the /2 operation is merged into initializations of the 2
variables.

Tests on cif set show 0.23%, 0.18% and 0.19% gain by avg psnr, overall
psnr and ssim respectively.

Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
2011-06-01 14:44:06 -07:00
John Koleszar
4101b5c5ed Merge "Bugfix in vp8dx_set_reference" 2011-06-01 13:57:23 -07:00
Henrik Lundin
69ba6bd142 Bugfix in vp8dx_set_reference
The fb_idx_ref_cnt book-keeping was in error. Added an assert to
prevent future errors in the reference count vector. Also fixed a
pointer syntax error.

Change-Id: I563081090c78702d82199e407df4ecc93da6f349
2011-06-01 21:41:12 +02:00
John Koleszar
5610970fe9 Merge "Fix code under #if CONFIG_INTERNAL_STATS." 2011-06-01 11:14:17 -07:00
Ronald S. Bultje
34ba18760f Fix code under #if CONFIG_INTERNAL_STATS.
Change-Id: Iccbd78d91c3071b16fb3b2911523a22092652ecd
2011-06-01 11:10:13 -07:00
Yaowu Xu
50916c6a7d remove some magic weights associated with sad_per_bit
sad_per_bit has been used for a number of motion vector search routines
with different magic weights: 1, 1/2 and 1/4. This commit removes these
magic numbers and uses 1/2 for all motion search routines, and also reformats
a number of source code lines to fit within the 80 column limit.

Test on cif set shows overall effect is neutral on all metrics. <=0.01%

Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
2011-06-01 10:10:44 -07:00
Tero Rintaluoma
61f0c090df neon fast quantize block pair
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
 - Additional 3-6% speedup compared to neon optimized fast
   quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)

Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
2011-06-01 10:48:05 +03:00
Scott LaVarnway
9e4f76c154 Merge "vp8_pick_inter_mode code cleanup" 2011-05-31 12:31:46 -07:00
Scott LaVarnway
1a5a1903ea vp8_pick_inter_mode code cleanup
Small code cleanups before attempting to reduce the size
of bmi found in BLOCKD.

Change-Id: Ie9c14adb53afd847716a75bcce067d0e6c04f225
2011-05-31 14:24:42 -04:00
John Koleszar
0a72f568ec Initialize first_time_stamp_ever
Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.

Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621
2011-05-31 12:37:45 -04:00
Tero Rintaluoma
5305e79eae adds preload for armv6 encoder asm
Added preload instructions to armv6 encoder optimizations.
About 5% average speed-up on Tegra2 for VGA@30fps sequence.

Change-Id: I41d74737720fb71ce7a316f07555357822f3347e
2011-05-30 11:10:03 +03:00
John Koleszar
4a4ade6dc8 Merge "bug fix check frame buffer index before copy" 2011-05-27 12:35:06 -07:00
James Berry
8795b52512 bug fix check frame buffer index before copy
in onyx_if.c update_reference_frames() make
sure that frame buffer indexes are not equal
before performing a buffer copy.  If two frames
share the same buffer the flags will already be
set correctly.

Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e
2011-05-27 14:59:29 -04:00
Yunqing Wang
4fb5ce6a92 Merge "Use hex search for realtime mode speed>4" 2011-05-27 11:12:50 -07:00
Yunqing Wang
4d052bdd91 Use hex search for realtime mode speed>4
Tests showed that using hex search in realtime mode largely speeds up
the encoding process, and still achieves quality similar to the
diamond search we have. Therefore, removed the diamond search
option.

Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c
2011-05-27 14:05:02 -04:00
Scott LaVarnway
ba420f1097 Merge "Broken EC after MODE_INFO size reduction" 2011-05-27 07:52:04 -07:00
Yunqing Wang
5a8cbb8955 Merge "Remove unused code" 2011-05-27 07:25:25 -07:00
Yunqing Wang
2dc24635ec Remove unused code
Hex search is not called in rdopt.c

Change-Id: I67347f03e13684147a7c77fb9e9147e440bb5e8e
2011-05-27 10:20:49 -04:00
Scott LaVarnway
4f586f7bd0 Broken EC after MODE_INFO size reduction
This patch fixes the compiler errors and the seg fault
when running decode_with_partial_drops.

Change-Id: I7c75369e2fef81d53b790d5dabc327218216838b
2011-05-26 15:13:00 -04:00
John Koleszar
1fe5070b76 Merge "Do not copy data between encoder reference buffers." 2011-05-26 09:58:26 -07:00
Yaowu Xu
9a248f1593 Merge "fix the mix use of errorperbit and sadperbit" 2011-05-26 09:39:41 -07:00
Scott LaVarnway
40b850b458 Merge "Use int_mv instead of MV in vp8_mv_cont" 2011-05-26 07:01:38 -07:00
Yaowu Xu
d8c525b8b1 fix the mix use of errorperbit and sadperbit
error_per_bit and sad_per_bit were designed as estimates of a bit worth
of sum squared error and sum absolute difference respectively. Under
this assumption, error_per_bit should be used in combination with 2nd
order errors (variance or sum squared error) while sad_per_bit should
be used in combination with 1st order SADs in motion estimation. There
were a few places where sad_per_bit has been misused with variances,
this commit changes to use error_per_bit for those places, also changes
parameter names to properly indicate which constant is being used.

On cif set, the change has a universal gain by all metrics: 0.13% by
average/overall psnr and 0.1% by ssim.

Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
2011-05-25 16:48:10 -07:00
Yunqing Wang
13b56eeb7a Merge " Use var8x8 instead of get8x8var in VP8_UVSSE" 2011-05-25 11:35:42 -07:00
Yunqing Wang
f299d628f3 Merge "Return sse value in vp8_variance SSE2 functions" 2011-05-25 11:31:07 -07:00
Yaowu Xu
22c05c0575 remove code not in use
Change-Id: I6e5e86235d341cce3b02abda26dbeb71940ed955
2011-05-25 09:46:37 -07:00
Yunqing Wang
b6679879b8 Return sse value in vp8_variance SSE2 functions
Minor modification.

Change-Id: I09511d38fd1451d5c4106a48acdb3f766ce59cb7
2011-05-25 11:55:41 -04:00
Attila Nagy
a615c40499 Use var8x8 instead of get8x8var in VP8_UVSSE
'sum' returned by get8x8var is not used and var8x8 has optimizations
  for more platforms.

Change-Id: I4a907fb1a05f285669fb0b95dc71d42182c980f6
2011-05-25 12:54:34 +03:00
Yunqing Wang
d75eb73653 Fix a bug happening while encoding at profile=3
While profile=3, there is no sub-pixel search. Distortion and SSE
have to be calculated using get_inter_mbpred_error().

Change-Id: Ifb36e17eef7750af93efa7d0e2870142ef540184
2011-05-24 16:28:23 -04:00
Scott LaVarnway
a39321f37e Use int_mv instead of MV in vp8_mv_cont
Fewer operations.

Change-Id: Ibb9cd5ae66b8c7c681c9a654d551c8729c31c3ae
2011-05-24 16:01:12 -04:00
Scott LaVarnway
cfab2caee1 Removed unused variable warnings
Change-Id: I6e5e921f03dc15a72da89a457848d519647677a3
2011-05-24 15:17:03 -04:00
Scott LaVarnway
b5278f38b0 Merge "MODE_INFO size reduction" 2011-05-24 12:08:24 -07:00
Scott LaVarnway
e11f21af9a MODE_INFO size reduction
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions.  The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.
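
One way to arrive at the 518,400 byte figure is sketched below. It assumes
16 sub-blocks per macroblock and 4 bytes saved per sub-block by the union;
both constants are assumptions made for this illustration, and the real
allocation (which rounds dimensions up and adds a border) may differ.

```python
MB_SIZE = 16                   # macroblock width/height in pixels
SUBBLOCKS_PER_MB = 16          # assumed: one bmi entry per 4x4 sub-block
BYTES_SAVED_PER_SUBBLOCK = 4   # assumed saving from the union


def mode_info_savings(width, height):
    """Estimated bytes saved across all macroblocks of one frame."""
    macroblocks = (width * height) // (MB_SIZE * MB_SIZE)
    return macroblocks * SUBBLOCKS_PER_MB * BYTES_SAVED_PER_SUBBLOCK
```

With these assumptions, mode_info_savings(1920, 1080) is 8100 * 64 =
518,400 bytes.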

Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
2011-05-24 13:24:52 -04:00
John Koleszar
fbea372817 Merge "Fixing bug in VP8_SET_REFERENCE decoder control command" 2011-05-24 05:57:44 -07:00
Yunqing Wang
69aad3a720 Merge "Rewrite hex search function" 2011-05-24 05:26:16 -07:00
Henrik Lundin
a126cd1760 Fixing bug in VP8_SET_REFERENCE decoder control command
In vp8dx_set_reference, the new reference image is written to an
unused reference frame buffer.

Change-Id: I9e4f2cef5a011094bb7ce7b2719cbfe096a773e8
2011-05-24 09:03:43 +02:00
Yaowu Xu
99fb568e67 Merge "use get8x8var directly for non-subpixel motion case in VP8_UVSSE" 2011-05-23 14:49:56 -07:00
Yunqing Wang
7838f4cfff Rewrite hex search function
Reduced some bound checks in hex search function.

Change-Id: Ie5f73a6c227590341c960a74dc508cff80f8aa06
2011-05-23 16:18:52 -04:00
Yaowu Xu
ab2dfd22f3 use get8x8var directly for non-subpixel motion case in VP8_UVSSE
VP8_UVSSE mistakenly used subpixvar8x8 to calculate SSE for non-subpixel
motion cases.

Change-Id: I4a5398bb9ef39c211039f6af4540546d4972e6a9
2011-05-23 09:11:28 -07:00
John Koleszar
ad6fe4a88c Merge "bug fix active_worst_quality set below active_best_quality" 2011-05-20 11:23:10 -07:00
John Koleszar
8196cc85f8 Merge "cleanup: collect twopass variables" 2011-05-20 11:20:44 -07:00
Johann
6d82d2d22e Merge "Fixed iwalsh_neon build problems with RVDS4.1" 2011-05-20 07:51:11 -07:00
Yaowu Xu
1fbc81a970 Merge "revise two function definitions with less parameters" 2011-05-20 07:45:42 -07:00
John Koleszar
a0c11928db Merge "Remove unused members of VP8_COMP" 2011-05-20 07:39:03 -07:00
Yaowu Xu
a4c69e9a0f revise two function definitions with less parameters
Change-Id: Ia96e5bf915e4d3c0ac9c1795114bd9e5dd07327a
2011-05-19 19:06:03 -07:00
Yaowu Xu
1f3f18443d Merge "disable trellis optimization for first pass" 2011-05-19 17:25:31 -07:00
Yaowu Xu
d5b8f7860f disable trellis optimization for first pass
also remove 2 #defines and 1 function declaration that are not in use.

Change-Id: I8f743d0e3dd9ebf1de24a8b0c30ff09f29b00c53
2011-05-19 17:22:14 -07:00
James Berry
caa1b28be3 bug fix active_worst_quality set below active_best_quality
fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.
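
The invariant being restored can be sketched as a simple clamp. This is a
hypothetical illustration in Python, not the actual fix; it assumes that a
lower quantizer index means better quality, so the worst-quality bound must
never be numerically below the best-quality bound.

```python
def clamp_active_worst(active_best_quality, active_worst_quality):
    """Return an active_worst_quality that is never below active_best_quality."""
    return max(active_worst_quality, active_best_quality)
```

A search loop that narrows a quantizer range between the two bounds can
then rely on best <= worst and terminate.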

Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039
2011-05-19 18:10:31 -04:00
John Koleszar
63cb1a7ce0 cleanup: collect twopass variables
This patch collects the twopass specific members of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.

This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.

Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d
2011-05-19 17:26:09 -04:00
Scott LaVarnway
dba79821f0 Merge "Using partition_info instead of blockd info for splitmv" 2011-05-19 13:22:59 -07:00
John Koleszar
048497720c Remove unused members of VP8_COMP
Various members that were either completely unreferenced or written
and not read.

Change-Id: Ie41ebac0ff0364a76f287586e4fe09a68907806e
2011-05-19 15:49:09 -04:00
Scott LaVarnway
99b9757685 Using partition_info instead of blockd info for splitmv
The partition_info struct contains info just for SPLITMV,
so it should be used instead of BLOCKD.  Eventually, I want
to reduce the size of B_MODE_INFO struct found in BLOCKD, so
this is the first step toward that goal.
Also, since SPLITMV is not supported in vp8_pick_inter_mode(),
the unnecessary mem copies and checks were removed.  For rt
encodes, this gave a slight performance improvement.

Change-Id: I5585c98fa9d5acbde1c7e0f452a01d9ecc080574
2011-05-19 15:03:36 -04:00
Scott LaVarnway
914f7c36d7 Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3." 2011-05-19 11:22:01 -07:00
John Koleszar
c684d5e5f2 Merge "changed configure option name to reduce confusion" 2011-05-19 11:17:08 -07:00
John Koleszar
ff39958cee Merge "Make activity masking functions static" 2011-05-19 11:12:18 -07:00
John Koleszar
21ca4c4d5d Merge "Fix segv without --enable-error-concealment" 2011-05-19 10:58:24 -07:00
John Koleszar
7def902261 Fix segv without --enable-error-concealment
Missed wrapping one function call in #if CONFIG_ERROR_CONCEALMENT.

Change-Id: I5746b1e6e4531670dbed1130467331fe309bdcae
2011-05-19 13:57:45 -04:00
John Koleszar
e3081b2502 Merge "Adding error-concealment to the decoder." 2011-05-19 10:48:58 -07:00
Stefan Holmer
d04f852368 Adding error-concealment to the decoder.
The error-concealer is plugged in after any motion vectors have been
decoded. It tries to estimate any missing motion vectors from the
motion vectors of the previous frame. Intra blocks with missing
residual are replaced with inter blocks with estimated motion vectors.

This feature was developed in a separate sandbox
(sandbox/holmer/error-concealment).

Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412
2011-05-19 13:46:33 -04:00
John Koleszar
a84177b432 Make activity masking functions static
These don't need extern linkage.

Change-Id: I21220ada926380a75ff654f24df84376ccc49323
2011-05-19 11:14:13 -04:00
John Koleszar
87254e0b7b Move quantizer init functions to quantize.c
Group related functions together.

Change-Id: I92fd779225b75a7204650f1decb713142c655d71
2011-05-19 11:07:41 -04:00
Attila Nagy
f96d56c4aa Fixed iwalsh_neon build problems with RVDS4.1
rvct 4.1 was complaining about vstmia.16; store multiple expects a 64-bit
data type. Also optimized the implementation.

Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
2011-05-19 10:27:26 +03:00
Yunqing Wang
00a1e2f8e4 Merge "Modify MVcount in pick_inter_mode to eliminate calling of vp8_find_near_mvs" 2011-05-18 12:53:27 -07:00
Yunqing Wang
9c62f94129 Fix a bug in vp8_clamp_mv function
Scott fixed the bug in MV clamping function in encoder, which
could cause artifacts.

Change-Id: Id05f2794c43c31cdd45e66179c8811f3ee452cb9
2011-05-18 09:52:56 -04:00
Yunqing Wang
f62b33f140 Modify MVcount in pick_inter_mode to eliminate calling of vp8_find_near_mvs
Moved MVcount modification in pick_inter_mode, and eliminated
calling of vp8_find_near_mvs.

Change-Id: Icd47448a1dfc8fdf526f86757d0e5a7f218cb5e8
2011-05-17 10:59:42 -04:00
John Koleszar
eafdc5e10a Merge "Improve framerate adaptation" 2011-05-13 11:18:42 -07:00
Yaowu Xu
5608c14020 Merge "adjusting rd constant slightly by ~10%" 2011-05-13 09:28:26 -07:00
Paul Wilkins
0e86235265 Merge "Restructure of activity masking code." 2011-05-13 09:23:50 -07:00
Paul Wilkins
ff52bf3691 Restructure of activity masking code.
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages

It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.

Change-Id: Id39b19eace37574dc429f25aae810c203709629b
2011-05-13 10:37:50 +01:00
John Koleszar
5ed116e220 Improve framerate adaptation
This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also adapts more quickly to step changes
in the input frame rate (e.g. 30fps to 15fps).

Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e
2011-05-12 15:07:50 -04:00
Scott LaVarnway
71a7501bcf Removed mv_bits_sadcost
This sad cost is being generated but never used.

Change-Id: I562eebdcb792b743770954feca365b5b37491ecd
2011-05-12 11:20:41 -04:00
Scott LaVarnway
6b25501bf1 Using int_mv instead of MV
The compiler produces better assembly when using int_mv
for assignments.  The compiler shifts and ors the two 16bit
values when assigning MV.

Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
2011-05-12 11:08:16 -04:00
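The idea in the commit above can be sketched as a union that overlays the two 16-bit components with one 32-bit integer, so a copy becomes a single 32-bit move. The names MV and int_mv come from the commit message; the field names, the exact layout, and the copy_mv helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: a motion vector made of two 16-bit components. */
typedef struct {
    int16_t row;
    int16_t col;
} MV;

/* The same 32 bits, viewed as one integer or as a row/col pair. */
typedef union {
    uint32_t as_int;
    MV as_mv;
} int_mv;

/* Hypothetical helper: copy both components with one 32-bit assignment. */
static void copy_mv(int_mv *dst, const int_mv *src) {
    assert(dst != NULL && src != NULL);
    dst->as_int = src->as_int;
}
```

Whether the compiler actually emits better code for the integer assignment depends on the target and optimization level; the commit reports that it did for this codebase.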
Yunqing Wang
6ed81fa5b3 Merge "Modification and issue fix in full-pixel refining search" 2011-05-12 07:20:44 -07:00
Yunqing Wang
b4da1f83e6 Modification and issue fix in full-pixel refining search
Further modification, plus a fix for an incorrect implementation that
caused the results of refining_search and refining_searchx4 to mismatch.

Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
2011-05-12 10:18:40 -04:00
Yaowu Xu
bd9d890605 adjusting rd constant slightly by ~10%
This is to reflect the RD improvement in the encoder. The change has a
small positive impact on quality (0.25% by VPXSSIM and 0.05% by PSNR)

Change-Id: Ic66ffc19b10870645088c0624c85556f009fd210
2011-05-11 23:32:06 -07:00
Yaowu Xu
ba6f60dba7 Merge "remove a variable no longer in use" 2011-05-10 20:20:59 -07:00
Yaowu Xu
1bcf4e66bb Merge "fix a bug related to gf_active_flags in multi-threaded encoder" 2011-05-10 19:59:52 -07:00
Yaowu Xu
f7cf439b34 remove a variable no longer in use
The variable was introduced in commit 2e53e9e53 to make more use of
trellis quantization, but it is no longer necessary now that RDMULT
has been made adaptive in a number of later commits.

Change-Id: I7420522ec7723f38cf77033466c25afb405d52ae
2011-05-10 19:57:51 -07:00
John Koleszar
814532a33c Merge "Use stdint.h for VS2010" 2011-05-10 18:53:04 -07:00
Johann
df2023a6cb set up Global Offset Table in recon
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.

Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
2011-05-10 15:58:56 -04:00
Yunqing Wang
c7a56f677d Merge "Use diamond search to replace full search in full-pixel refining search" 2011-05-10 06:59:38 -07:00
Yunqing Wang
cb7b1fb144 Use diamond search to replace full search in full-pixel refining search
In NEWMV mode, full search is currently used as the refining search
after n-step search. Replacing it with an iterative diamond search
of radius 1 greatly reduces the computational complexity while
maintaining the same encoding quality, since the refining search is
now done for every macroblock instead of only a small percentage of
macroblocks as with full search.

Tests on the test set showed a 3.4% encoding speed increase with no
PSNR or SSIM loss.

Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
2011-05-09 14:07:06 -04:00
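An iterative radius-1 diamond refinement of the kind described above can be sketched as follows. This is a minimal illustration, not the encoder's code: the point type, the cost callback, the iteration cap, and the toy cost function are all assumptions, and the real search works on SAD plus motion vector cost with range clamping.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int row;
    int col;
} point;

typedef unsigned int (*cost_fn)(point p);

/* Move to the cheapest of the four neighbors at distance 1 until the
 * center is already the best candidate or max_iters steps were taken. */
static point refine_diamond(point center, cost_fn cost, int max_iters) {
    static const point offsets[4] = { { -1, 0 }, { 0, -1 }, { 0, 1 }, { 1, 0 } };
    unsigned int best;
    int i, k;

    assert(cost != NULL);
    best = cost(center);
    for (i = 0; i < max_iters; i++) {
        int best_k = -1;
        for (k = 0; k < 4; k++) {
            point cand;
            unsigned int c;
            cand.row = center.row + offsets[k].row;
            cand.col = center.col + offsets[k].col;
            c = cost(cand);
            if (c < best) {
                best = c;
                best_k = k;
            }
        }
        if (best_k < 0)
            break; /* center is the local minimum */
        center.row += offsets[best_k].row;
        center.col += offsets[best_k].col;
    }
    return center;
}

/* Toy cost for demonstration: squared distance to (3, -2). */
static unsigned int toy_cost(point p) {
    int dr = p.row - 3;
    int dc = p.col + 2;
    return (unsigned int)(dr * dr + dc * dc);
}
```

Each step evaluates only four candidates, which is why running it for every macroblock can still be cheaper than an exhaustive window search on a few of them.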
Johann
a7d4d3c550 clean up unused variable warnings
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-05-09 12:56:20 -04:00
Yaowu Xu
89c6017cc0 fix a bug related to gf_active_flags in multi-threaded encoder
Paul pointed out that the pointer to the gf_active_flags is not being
properly incremented in the multithreaded encoder. This commit fixes
the issue by making sure the gf_active_ptr points to the start of the
next group of mb rows.

Change-Id: I3246e657d23beabb614dfb880733a68a5fd7e34c
2011-05-06 09:00:44 -07:00
John Koleszar
5c756005aa Merge "Don't override active_worst_quality in 2 pass" 2011-05-06 08:59:05 -07:00
Johann
52490354f3 Merge "neon fast quantizer updated" 2011-05-06 08:54:14 -07:00
John Koleszar
abc9958c52 Don't override active_worst_quality in 2 pass
Commit db5057c introduced a bug in that the active_worst_quality
selected by the 2 pass rate controller was being overridden for key
frames, causing a severe quality loss.

Change-Id: I4865a6fbe3e94e9b4fb9271c7dd68b455d7b371d
2011-05-06 11:48:53 -04:00
John Koleszar
4ead98fa84 Use stdint.h for VS2010
VS2010 has included stdint.h, but not inttypes.h. Prefer the compiler's
version of these types. Fixes issue 327.

Change-Id: Ica71600e06b8e94e3bbb4f12988b4a9817d5e5e4
2011-05-06 08:02:39 -04:00
Tero Rintaluoma
33fa7c4ebe neon fast quantizer updated
vp8_fast_quantize_b_neon function updated and further optimized.
 - match current C implementation of fast quantizer
 - updated to use asm_enc_offsets for structure members
 - updated ads2gas scripts to handle alignment issues

Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
2011-05-06 08:59:52 +03:00
Aron Rosenberg
eeb8117303 Fix semaphore emulation on Windows
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.

This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.

Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
2011-05-06 00:13:59 -04:00
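The difference between the two semantics described above can be shown with a small single-threaded model. It deliberately avoids the Win32 and POSIX calls named in the commit; event_model, semaphore_model, and their functions are hypothetical stand-ins that only track state.

```c
#include <assert.h>
#include <stddef.h>

/* Models an auto-reset event: it remembers at most one signal. */
typedef struct {
    int signaled;
} event_model;

/* Models a counting semaphore: it remembers every post. */
typedef struct {
    int count;
} semaphore_model;

static void event_post(event_model *e) {
    assert(e != NULL);
    e->signaled = 1; /* a second post before any wait is lost */
}

/* Returns 1 if the wait would succeed without blocking, else 0. */
static int event_try_wait(event_model *e) {
    assert(e != NULL);
    if (e->signaled) {
        e->signaled = 0;
        return 1;
    }
    return 0;
}

static void semaphore_post(semaphore_model *s) {
    assert(s != NULL);
    s->count++;
}

/* Returns 1 if the wait would succeed without blocking, else 0. */
static int semaphore_try_wait(semaphore_model *s) {
    assert(s != NULL);
    if (s->count > 0) {
        s->count--;
        return 1;
    }
    return 0;
}
```

After two posts, the event model satisfies only one wait while the semaphore model satisfies two. A thread that relies on the second wait succeeding would block forever under the event-based emulation, which matches the deadlock the commit describes.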
Yunqing Wang
eb16f00cf2 Fix rare hang in multi-thread encoder on Windows
This patch fixes a rare hang in the multi-thread encoder that was
only seen on Windows. Thanks to John for his help in debugging the
problem. More testing is needed.

Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
2011-05-05 10:42:29 -04:00
Johann
ca5c1b17a2 Merge "Loopfilter NEON: Use VMOV for constant vectors instead of VLD." 2011-05-05 06:16:21 -07:00
Yunqing Wang
aeb86d615c Merge "Runtime detection of available processor cores." 2011-05-05 04:59:54 -07:00
Attila Nagy
a6aa389d2f Loopfilter NEON: Use VMOV for constant vectors instead of VLD.
Change-Id: I562b6e01c32bb51d00f3b95faf757fc7dc29a3a3
2011-05-04 11:29:23 +03:00
Yunqing Wang
3fbade23a2 Merge "Modify HEX search" 2011-05-03 11:59:32 -07:00
Yunqing Wang
04ec930abc Modify HEX search
Changed 8-neighbor searching to 4-neighbor searching, and continued
searching until the center point is the best match.

Test on test set showed 1.3% encoding speed improvement as well as
0.1% PSNR and SSIM improvement at speed=-5 (rt mode).

Will continue to improve it.

Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
2011-05-03 14:26:33 -04:00
Yaowu Xu
e9465daee3 Merge "change to use fast ssim code for internal ssim calculations" 2011-05-03 11:20:52 -07:00
Yaowu Xu
6c565fada0 change to use fast ssim code for internal ssim calculations
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged

Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
2011-05-03 08:36:17 -07:00
Ronald S. Bultje
bbf890fe27 build: change LDFLAGS/CFLAGS ordering.
Always use CFLAGS/LDFLAGS that point to headers and libvpx.a inside our
build tree before ones from the environment, which could reference
headers or libs outside the build tree.

This fixes issue 307.

Change-Id: I34d176b8c21098f6da5ea71f0147d3c49283cc45
2011-05-02 13:56:41 -04:00
John Koleszar
c09d8c1419 Merge "Fix documentation typos" 2011-05-02 06:50:22 -07:00
John Koleszar
a66d8d33dd Fix compile error with --enable-postproc-visualizer
Typo.

Change-Id: I9cc6a4587c3d93c9f0da5e101d376741fc9622a4
2011-05-02 09:28:37 -04:00
Thijs Vermeir
8942f70cdf Fix documentation typos
Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce
2011-04-30 09:34:59 +02:00
Ronald S. Bultje
5a23352c03 Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035
2011-04-29 11:52:09 -07:00
Yaowu Xu
57ad189129 changed configure option name to reduce confusion
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35

Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-29 09:39:05 -07:00
Yunqing Wang
dfa9e2c5ea Merge "Use insertion sort instead of quick sort" 2011-04-29 08:27:58 -07:00
Scott LaVarnway
1b2abc5f49 Merge "Consolidated build inter predictors" 2011-04-29 07:13:49 -07:00
James Berry
f10732554b bug fix removed inline from recon_wrapper_sse2.c
Removed inline from recon_wrapper_sse2.c so that it builds
with Visual Studio.

Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22
2011-04-28 15:12:00 -04:00
James Berry
5db296dd70 bug fix 32 bit matches 64 bit
included vpx_config.h in vpx_encoder.c
to properly define FLOATING_POINT_INIT()

Change-Id: Ie518bf5c087622658e37fca90aa4ddfe79d053f6
2011-04-28 14:11:32 -04:00
Scott LaVarnway
219ba87a93 Merge "Use psadbw to get the sum of bytes in a line." 2011-04-28 07:58:20 -07:00
Scott LaVarnway
ccd6f7ed77 Consolidated build inter predictors
Code cleanup.

Change-Id: Ic8b0167851116c64ddf08e8a3d302fb09ab61146
2011-04-28 10:53:59 -04:00
Ronald S. Bultje
1e7ded69cf Use psadbw to get the sum of bytes in a line.
Thanks Jason for pointing that out on #vp8. ;-).

Change-Id: I5330a753e752a8704b78a409597472628e0b26a5
2011-04-27 13:49:21 -07:00
Scott LaVarnway
2e102855f4 Removed unused code in reconinter
The skip flag is never set by the encoder for SPLITMV.

Change-Id: I5ae6457edb3a1193cb5b05a6d61772c13b1dc506
2011-04-27 15:25:32 -04:00
John Koleszar
085fb4b737 Merge "SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}()." 2011-04-27 12:02:55 -07:00
Ronald S. Bultje
1083fe4999 SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
decoding

before
10.425
10.432
10.423
=10.426

after:
10.405
10.416
10.398
=10.406, 0.2% faster

encoding

before
14.252
14.331
14.250
14.223
14.241
14.220
14.221
=14.248

after
14.095
14.090
14.085
14.095
14.064
14.081
14.089
=14.086, 1.1% faster

Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-27 11:31:27 -07:00
Fritz Koenig
00fdb135a7 vpxenc: remove duplicate --fps from vpxenc usage message
Fixes issue #323

Change-Id: I41c297df37afe186a8425ed2e2a95032069dcb9a
2011-04-27 11:27:59 -07:00
Yunqing Wang
5abafcc381 Use insertion sort instead of quick sort
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), tests on the test set showed a 1.7%
performance gain with 0% PSNR change on average.

Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
2011-04-27 13:53:28 -04:00
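For reference, a plain insertion sort over a small integer array looks like the sketch below. The element type and function name are assumptions; the commit does not say which array is sorted or how its elements are compared.

```c
#include <assert.h>
#include <stddef.h>

/* Sort a[0..n-1] in ascending order. For very small n this tends to
 * beat quick sort because it has no recursion or partitioning overhead. */
static void insertion_sort(int *a, size_t n) {
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift larger elements one slot right to open a gap for key. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```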
John Koleszar
4226f0ce64 vpxdec: test for frame corruption
This change simply exercises the VP8D_GET_FRAME_CORRUPTED control,
outputting a warning message at the end if the bit was set for any
frames. Should never produce any output for good input.

Change-Id: Idaf6ba8f53660f47763cd563fa1485938580a37d
2011-04-27 12:04:48 -04:00
John Koleszar
64355ecad3 Merge "Speed up VP8DX_BOOL_DECODER_FILL" 2011-04-27 09:03:45 -07:00
John Koleszar
f8ffecb176 Merge "Update VP8DX_BOOL_DECODER_FILL to better detect EOS" 2011-04-27 09:03:24 -07:00
John Koleszar
5e1fd41357 Speed up VP8DX_BOOL_DECODER_FILL
The end-of-buffer check is hoisted out of the inner loop. Gives
about 0.5% improvement on x86_64.

Change-Id: I8e3ed08af7d33468c5c749af36c2dfa19677f971
2011-04-27 10:25:03 -04:00
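Hoisting a bounds check out of an inner loop, as the commit above describes, generally means computing once how many bytes may be consumed and then looping without a per-byte test. The function below is a generic sketch of that pattern under assumed names and types; it is not the VP8DX_BOOL_DECODER_FILL macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift up to `want` bytes from buf into *value. The end-of-buffer test
 * happens once, before the loop, rather than on every iteration.
 * Returns the number of bytes actually consumed. */
static size_t fill_value(uint64_t *value, const uint8_t *buf,
                         const uint8_t *buf_end, size_t want) {
    size_t avail, n, i;

    assert(value != NULL && buf != NULL && buf <= buf_end && want <= 8);
    avail = (size_t)(buf_end - buf);
    n = want < avail ? want : avail; /* single bounds decision */
    for (i = 0; i < n; i++)
        *value = (*value << 8) | buf[i];
    return n;
}
```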
John Koleszar
9594370e0c Update VP8DX_BOOL_DECODER_FILL to better detect EOS
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.

Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
2011-04-27 10:24:39 -04:00
John Koleszar
db5057c742 Refactor calc_iframe_target_size
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.

Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
2011-04-26 16:55:35 -04:00
John Koleszar
81d2206ff8 Move pick_frame_size() to ratectrl.c
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.

Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
2011-04-26 16:49:54 -04:00
Scott LaVarnway
0da77a840b Merge "Test vector mismatch fix" 2011-04-26 10:12:37 -07:00
Scott LaVarnway
7a2b9c50a3 Test vector mismatch fix
Fixed test vector mismatch that was introduced
in the "Removed dc_diff from MB_MODE_INFO"
(Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931)

Change-Id: I98fa509b418e757b5cdc4baa71202f4168dc14ec
2011-04-26 09:37:19 -04:00
Johann
d5c46bdfc0 Merge "remove simpler_lpf" 2011-04-25 14:51:07 -07:00
Johann
01527e743f remove simpler_lpf
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers

stop tracking the option in two places. use filter_type exclusively

Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25 17:37:41 -04:00
John Koleszar
fd6da3b2e7 Fix duplicate vp8_compute_frame_size_bounds
Likely introduced by a bad automatic merge from gerrit.

Change-Id: I0c6dd6ec18809cf9492f524d283fa4a3a8f4088b
2011-04-25 14:30:57 -04:00
John Koleszar
1f32b1489c Merge "Remove unused functions" 2011-04-25 11:05:00 -07:00
John Koleszar
47bc1c7013 Remove unused functions
Remove estimate_min_frame_size() and calc_low_ss_err(), as they are
never referenced.

Change-Id: I3293363c14ef70b79c4678ca27aa65b345077726
2011-04-25 13:54:23 -04:00
John Koleszar
cfbfd39de8 Merge "Change rc undershoot/overshoot semantics" 2011-04-25 10:49:32 -07:00
John Koleszar
ef86bad0d1 Merge "Stereo 3D format support for vpxenc" 2011-04-25 10:48:44 -07:00
John Koleszar
76557e34d2 Merge "Limit size of initial keyframe in one-pass." 2011-04-25 10:48:13 -07:00
John Koleszar
d9f898ab6d Merge "Add rc_max_intra_bitrate_pct control" 2011-04-25 10:47:57 -07:00
John Koleszar
454cbc96b7 Limit size of initial keyframe in one-pass.
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.

This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.

Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
2011-04-25 13:47:20 -04:00
John Koleszar
aa926fbd27 Add rc_max_intra_bitrate_pct control
Adds a control to limit the maximum size of a keyframe, as a function of
the per-frame bitrate. See this thread[1] for more detailed discussion:

[1]: http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/271b944a5e47ca38

Change-Id: I7337707642eb8041d1e593efc2edfdf66db02a94
2011-04-25 13:47:14 -04:00
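As a worked example of a keyframe cap expressed relative to the per-frame bitrate, the sketch below assumes the control value is a percentage of the average bits per frame. That interpretation, the function name, and the units are assumptions for illustration; the linked thread is the authority on the actual semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: maximum keyframe size in bits, given a target
 * bitrate in bits per second, a frame rate in frames per second, and a
 * cap expressed as a percentage of the average per-frame size. */
static int64_t max_keyframe_bits(int64_t bitrate_bps, int framerate,
                                 int max_intra_pct) {
    int64_t per_frame_bits;

    assert(bitrate_bps >= 0 && framerate > 0 && max_intra_pct >= 0);
    per_frame_bits = bitrate_bps / framerate;
    return per_frame_bits * max_intra_pct / 100;
}
```

Under these assumptions, a 600000 bit/s stream at 30 fps averages 20000 bits per frame, so a cap of 300 would allow keyframes of up to 60000 bits.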
John Koleszar
2089b2cee5 Merge "bug fix possible keyframe context divide by zero" 2011-04-25 09:35:12 -07:00
James Berry
8d5ce819dd bug fix possible keyframe context divide by zero
vp8_adjust_key_frame_context() divides by
estimate_keyframe_frequency() which can
return 0 in the case where --kf-max-dist=0.

Change-Id: Idfc59653478a0073187cd2aa420e98a321103daa
2011-04-25 12:16:36 -04:00
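One common way to guard a division like the one described above is to clamp the divisor before use. The sketch below shows that pattern with a hypothetical helper; the commit does not state which guard it actually applies.

```c
#include <assert.h>

/* Hypothetical helper: divide a total by an estimated keyframe
 * frequency, treating a zero or negative estimate as 1 so the
 * division is always defined. */
static int per_keyframe_share(int total, int kf_frequency) {
    assert(total >= 0);
    if (kf_frequency < 1)
        kf_frequency = 1;
    return total / kf_frequency;
}
```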
Alok Ahuja
72c76ca256 Stereo 3D format support for vpxenc
Create a new input parameter to allow specifying
the packed frame stereo 3d format. A default value
of mono will be written in the absence of user
specified input

Change-Id: I576d9952ab5d7e2076fbf1b282016a9a1baaa103
2011-04-25 07:21:51 -07:00
Johann
aeca599087 Merge "keep values in registers during quantization" 2011-04-25 06:52:38 -07:00
Scott LaVarnway
c36b6d4d01 Merge "Removed unnecessary frame type checks" 2011-04-25 06:45:43 -07:00
Scott LaVarnway
5b67329747 Merge "Removed dc_diff from MB_MODE_INFO" 2011-04-25 06:45:32 -07:00
Yaowu Xu
373dcec57a Merge "make two compiler options explicit for Visual Studio projects" 2011-04-22 14:08:08 -07:00
Ronald S. Bultje
496bcbb0de Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.

Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
2011-04-22 10:00:38 -04:00
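The overflow fixed above comes from adding with 16-bit lanes into what should be 32-bit accumulators. The scalar sketch below illustrates the effect of the narrower width; it is an analogy in plain C, not the SSE2 code, and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate with 16-bit arithmetic: the sum wraps modulo 65536. */
static uint16_t accumulate_16(const uint16_t *v, size_t n) {
    uint16_t sum = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum = (uint16_t)(sum + v[i]);
    return sum;
}

/* Accumulate with 32-bit arithmetic: no wrap for these magnitudes. */
static uint32_t accumulate_32(const uint16_t *v, size_t n) {
    uint32_t sum = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

With ten inputs of 7000 each, the 32-bit sum is 70000 while the 16-bit sum wraps to 4464. More frames contribute more terms per accumulator, which is consistent with the problem appearing only at larger --arnr-maxframes values.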
John Koleszar
73c3d32705 Merge "Remove unused kf rate variables" 2011-04-21 16:54:14 -07:00
Adrian Grange
d2a6eb4b1e Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.

Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
2011-04-21 15:45:57 -07:00
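The rule behind the fix above is that the conversion specifier must match the argument type: "%d" for int, "%ld" for long int. A minimal sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format an int with the matching "%d" specifier. Passing an int where
 * "%ld" is expected is undefined behavior and can print garbage on
 * platforms where long is wider than int. */
static int format_count(char *out, size_t out_size, int count) {
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "%d", count);
}
```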
Yaowu Xu
ddb6edd831 make two compiler options explicit for Visual Studio projects
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code(/Ot)".

Change-Id: I0bf8343d9ca195851332b91ec69c69ee4e31ce2a
2011-04-21 13:27:42 -07:00
Johann
508ae1b3d5 keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.

Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
2011-04-21 15:47:55 -04:00
Scott LaVarnway
6f6cd3abb9 Removed unnecessary frame type checks
ref_frame is set to INTRA_FRAME for keyframes.  The B_PRED
mode is only used in intra frames.

Change-Id: I9bac8bec7c736300d47994f3cb570329edf11ec0
2011-04-21 14:59:42 -04:00
Scott LaVarnway
3698c1f620 Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering.  Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.

Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
2011-04-21 14:38:36 -04:00
Scott LaVarnway
7a49accd0b Removed force_no_skip
force_no_skip is always set to zero.

Change-Id: I89b61c5e0bee34627a9c07c05f3517e1db76af77
2011-04-20 15:45:12 -04:00
Scott LaVarnway
09c933ea80 Removed redundant checks of the mode_info_context flags
Code cleanup.  The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.

Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
2011-04-20 14:06:40 -04:00
Attila Nagy
43464e94ed Do not copy data between encoder reference buffers.
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.

Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
2011-04-20 15:26:55 +03:00
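Refreshing a reference by index manipulation instead of copying can be sketched as below. The structure, field names, buffer count, and the single update function are assumptions for illustration; the real encoder tracks more state than this.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BUFFERS = 4 };

/* Hypothetical bookkeeping: each role is an index into a buffer pool. */
typedef struct {
    int new_idx;
    int last_idx;
    int golden_idx;
    int alt_idx;
} ref_indices;

/* Return the lowest pool index that no reference currently uses. */
static int find_free_buffer(const ref_indices *r) {
    int i;
    for (i = 0; i < NUM_BUFFERS; i++) {
        if (i != r->last_idx && i != r->golden_idx && i != r->alt_idx)
            return i;
    }
    return -1;
}

/* Point the refreshed references at the new frame, then pick an unused
 * buffer for the next frame. No pixel data is copied. */
static void update_references(ref_indices *r, int refresh_last,
                              int refresh_golden, int refresh_alt) {
    assert(r != NULL);
    if (refresh_last)
        r->last_idx = r->new_idx;
    if (refresh_golden)
        r->golden_idx = r->new_idx;
    if (refresh_alt)
        r->alt_idx = r->new_idx;
    r->new_idx = find_free_buffer(r);
    assert(r->new_idx >= 0); /* 4 buffers, 3 roles: one is always free */
}
```

Keeping all updates in one function, as the commit does, makes it easier to check that the new-frame index never aliases a live reference.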
John Koleszar
ad6a8ca58b Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.

Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.

Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
2011-04-19 16:14:57 -04:00
Johann
4a2b684ef4 modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE_XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.

Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19 10:42:45 -04:00
Johann
a9b465c5c9 Merge "Add save/restore xmm registers in x86 assembly code" 2011-04-19 06:32:10 -07:00
Johann
c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yunqing Wang
48438d6016 Merge "Use sub-pixel search's SSE in mode selection" 2011-04-18 13:20:04 -07:00
Yunqing Wang
b8f0b59985 Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.

Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
2011-04-18 16:12:28 -04:00
Yunqing Wang
d5069b5af0 Merge "Handle long delay between video frames in multi-thread decoder(issue 312)" 2011-04-18 10:11:41 -07:00
Yunqing Wang
8ba58951e9 Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.

This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."

Reproduced the crash and verified the changes on the Windows platform.
This brings the behavior in line with the other platforms using sem_wait().

Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
2011-04-15 17:27:26 -04:00
John Koleszar
c99f9d7abf Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.

This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.

Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
2011-04-12 20:49:33 -04:00
Attila Nagy
297b27655e Runtime detection of available processor cores.
Detect the number of available cores and limit the thread allocation
accordingly. On the decoder side, limit the number of threads to the
maximum number of token partitions.

Core detection works on Windows and on POSIX platforms that define
_SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.

Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
2011-03-31 10:23:01 +03:00
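On POSIX systems the detection described above can be sketched with sysconf(). The helper names and the fallback to one core are assumptions, and the Windows path mentioned in the commit is not shown.

```c
#include <assert.h>
#include <unistd.h>

/* Return the number of online cores, or 1 if it cannot be determined. */
static int available_cores(void) {
    long n = -1;
#if defined(_SC_NPROCESSORS_ONLN)
    n = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROC_ONLN)
    n = sysconf(_SC_NPROC_ONLN);
#endif
    return n < 1 ? 1 : (int)n;
}

/* Hypothetical helper: never allocate more threads than cores. */
static int limit_threads(int requested, int cores) {
    assert(cores >= 1);
    if (requested < 1)
        return 1;
    return requested > cores ? cores : requested;
}
```

A caller would typically compute limit_threads(requested, available_cores()) once at initialization; a decoder would additionally cap the result at its number of token partitions.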
249 changed files with 11900 additions and 15966 deletions

6
.gitignore vendored

@@ -60,9 +60,3 @@
/vpx_config.h
/vpx_version.h
TAGS
vpxdec
vpxenc
.project
.cproject
*.csv
*.oclpj

View File

@@ -2,3 +2,4 @@ Adrian Grange <agrange@google.com>
Johann Koenig <johannkoenig@google.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Tom Finegan <tomfinegan@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>

12
AUTHORS

@@ -4,8 +4,11 @@
Aaron Watry <awatry@gmail.com>
Adrian Grange <agrange@google.com>
Alex Converse <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org>
Alok Ahuja <waveletcoeff@gmail.com>
Andoni Morales Alastruey <ylatuya@gmail.com>
Andres Mejia <mcitadel@gmail.com>
Aron Rosenberg <arosenberg@logitech.com>
Attila Nagy <attilanagy@google.com>
Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
@@ -22,20 +25,29 @@ Jeff Muizelaar <jmuizelaar@mozilla.com>
Jim Bankoski <jimbankoski@google.com>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
Joshua Bleecher Snyder <josh@treelinelabs.com>
Justin Clift <justin@salasaga.org>
Justin Lebar <justin.lebar@gmail.com>
Lou Quillio <louquillio@google.com>
Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
Martin Ettl <ettl.martin78@googlemail.com>
Michael Kohler <michaelkohler@live.com>
Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Pascal Massimino <pascal.massimino@gmail.com>
Patrik Westin <patrik.westin@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Philip Jägenstedt <philipj@opera.com>
Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
Ralph Giles <giles@xiph.org>
Ronald S. Bultje <rbultje@google.com>
Scott LaVarnway <slavarnway@google.com>
Stefan Holmer <holmer@google.com>
Taekhyun Kim <takim@nvidia.com>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Yaowu Xu <yaowu@google.com>

View File

@@ -1,3 +1,85 @@
2011-08-02 v0.9.7 "Cayuga"
Our third named release, focused on a faster, higher quality, encoder.
- Upgrading:
This release is backwards compatible with Aylesbury (v0.9.5) and
Bali (v0.9.6). Users of older releases should refer to the Upgrading
notes in this document for that release.
- Enhancements:
Stereo 3D format support for vpxenc
Runtime detection of available processor cores.
Allow specifying --end-usage by enum name
vpxdec: test for frame corruption
vpxenc: add quantizer histogram display
vpxenc: add rate histogram display
Set VPX_FRAME_IS_DROPPABLE
update configure for ios sdk 4.3
Avoid text relocations in ARM vp8 decoder
Generate a vpx.pc file for pkg-config.
New ways of passing encoded data between encoder and decoder.
- Speed:
This release includes across-the-board speed improvements to the
encoder. On x86, these measure at approximately 11.5% in Best mode,
21.5% in Good mode (speed 0), and 22.5% in Realtime mode (speed 6).
On ARM Cortex A9 with Neon extensions, real-time encoding of video
telephony content is 35% faster than Bali on single core and 48%
faster on multi-core. On the NVidia Tegra2 platform, real time
encoding is 40% faster than Bali.
Decoder speed was not a priority for this release, but improved
approximately 8.4% on x86.
Reduce motion vector search on alt-ref frame.
Encoder loopfilter running in its own thread
Reworked loopfilter to precalculate more parameters
SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Removed redundant checks
Reduced structure sizes
utilize preload in ARMv6 MC/LPF/Copy routines
ARM optimized quantization, dfct, variance, subtract
Increase chrow row alignment to 16 bytes.
disable trellis optimization for first pass
Write SSSE3 sub-pixel filter function
Improve SSE2 half-pixel filter funtions
Add vp8_sub_pixel_variance16x8_ssse3 function
Reduce unnecessary distortion computation
Use diamond search to replace full search
Preload reference area in sub-pixel motion search (real-time mode)
- Quality:
This release focused primarily on one-pass use cases, including
video conferencing. Low latency data rate control was significantly
improved, improving streamability over bandwidth constrained links.
Added support for error concealment, allowing frames to maintain
visual quality in the presence of substantial packet loss.
Add rc_max_intra_bitrate_pct control
Limit size of initial keyframe in one-pass.
Improve framerate adaptation
Improved 1-pass CBR rate control
Improved KF insertion after fades to still.
Improved key frame detection.
Improved activity masking (lower PSNR impact for same SSIM boost)
Improved interaction between GF and ARFs
Adding error-concealment to the decoder.
Adding support for independent partitions
Adjusted rate-distortion constants
- Bug Fixes:
Removed firstpass motion map
Fix parallel make install
Fix multithreaded encoding for 1 MB wide frame
Fixed iwalsh_neon build problems with RVDS4.1
Fix semaphore emulation, spin-wait intrinsics on Windows
Fix build with xcode4 and simplify GLOBAL.
Mark ARM asm objects as allowing a non-executable stack.
Fix vpxenc encoding incorrect webm file header on big endian
2011-03-07 v0.9.6 "Bali"
Our second named release, focused on a faster, higher quality, encoder.

View File

@@ -98,11 +98,11 @@ install::
$(BUILD_PFX)%.c.d: %.c
$(if $(quiet),@echo " [DEP] $@")
$(qexec)mkdir -p $(dir $@)
$(qexec)$(CC) $(CFLAGS) -M $< | $(fmt_deps) > $@
$(qexec)$(CC) $(INTERNAL_CFLAGS) $(CFLAGS) -M $< | $(fmt_deps) > $@
$(BUILD_PFX)%.c.o: %.c
$(if $(quiet),@echo " [CC] $@")
$(qexec)$(CC) $(CFLAGS) -c -o $@ $<
$(qexec)$(CC) $(INTERNAL_CFLAGS) $(CFLAGS) -c -o $@ $<
$(BUILD_PFX)%.asm.d: %.asm
$(if $(quiet),@echo " [DEP] $@")
@@ -124,6 +124,12 @@ $(BUILD_PFX)%.s.o: %.s
$(if $(quiet),@echo " [AS] $@")
$(qexec)$(AS) $(ASFLAGS) -o $@ $<
.PRECIOUS: %.c.S
%.c.S: CFLAGS += -DINLINE_ASM
$(BUILD_PFX)%.c.S: %.c
$(if $(quiet),@echo " [GEN] $@")
$(qexec)$(CC) -S $(CFLAGS) -o $@ $<
.PRECIOUS: %.asm.s
$(BUILD_PFX)%.asm.s: %.asm
$(if $(quiet),@echo " [ASM CONVERSION] $@")
@@ -188,7 +194,7 @@ define linker_template
$(1): $(filter-out -%,$(2))
$(1):
$(if $(quiet),@echo " [LD] $$@")
$(qexec)$$(LD) $$(strip $$(LDFLAGS) -o $$@ $(2) $(3) $$(extralibs))
$(qexec)$$(LD) $$(strip $$(INTERNAL_LDFLAGS) $$(LDFLAGS) -o $$@ $(2) $(3) $$(extralibs))
endef
# make-3.80 has a bug with expanding large input strings to the eval function,
# which was triggered in some cases by the following component of
@@ -330,6 +336,7 @@ ifneq ($(call enabled,DIST-SRCS),)
DIST-SRCS-$(CONFIG_MSVS) += build/make/gen_msvs_proj.sh
DIST-SRCS-$(CONFIG_MSVS) += build/make/gen_msvs_sln.sh
DIST-SRCS-$(CONFIG_MSVS) += build/x86-msvs/yasm.rules
DIST-SRCS-$(CONFIG_MSVS) += build/x86-msvs/obj_int_extract.bat
DIST-SRCS-$(CONFIG_RVCT) += build/make/armlink_adapter.sh
# Include obj_int_extract if we use offsets from asm_*_offsets
DIST-SRCS-$(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64) += build/make/obj_int_extract.c

View File

@@ -21,8 +21,14 @@ print "@ This file was created from a .asm file\n";
print "@ using the ads2gas.pl script.\n";
print "\t.equ DO1STROUNDING, 0\n";
# Stack of procedure names.
@proc_stack = ();
while (<STDIN>)
{
# Load and store alignment
s/@/,:/g;
# Comment character
s/;/@/g;
@@ -79,7 +85,10 @@ while (<STDIN>)
s/CODE([0-9][0-9])/.code $1/;
# No AREA required
s/^\s*AREA.*$/.text/;
# But ALIGNs in AREA must be obeyed
s/^\s*AREA.*ALIGN=([0-9])$/.text\n.p2align $1/;
# If no ALIGN, strip the AREA and align to 4 bytes
s/^\s*AREA.*$/.text\n.p2align 2/;
# DCD to .word
# This one is for incoming symbols
@@ -114,8 +123,8 @@ while (<STDIN>)
# put the colon at the end of the line in the macro
s/^([a-zA-Z_0-9\$]+)/$1:/ if !/EQU/;
# Strip ALIGN
s/\sALIGN/@ ALIGN/g;
# ALIGN directive
s/ALIGN/.balign/g;
# Strip ARM
s/\sARM/@ ARM/g;
@@ -127,9 +136,23 @@ while (<STDIN>)
# Strip PRESERVE8
s/\sPRESERVE8/@ PRESERVE8/g;
# Strip PROC and ENDPROC
s/\sPROC/@/g;
s/\sENDP/@/g;
# Use PROC and ENDP to give the symbols a .size directive.
# This makes them show up properly in debugging tools like gdb and valgrind.
if (/\bPROC\b/)
{
my $proc;
/^_([\.0-9A-Z_a-z]\w+)\b/;
$proc = $1;
push(@proc_stack, $proc) if ($proc);
s/\bPROC\b/@ $&/;
}
if (/\bENDP\b/)
{
my $proc;
s/\bENDP\b/@ $&/;
$proc = pop(@proc_stack);
$_ = "\t.size $proc, .-$proc".$_ if ($proc);
}
# EQU directive
s/(.*)EQU(.*)/.equ $1, $2/;
@@ -148,3 +171,6 @@ while (<STDIN>)
next if /^\s*END\s*$/;
print;
}
# Mark that this object doesn't need an executable stack.
printf ("\t.section\t.note.GNU-stack,\"\",\%\%progbits\n");

View File

@@ -41,6 +41,9 @@ sub trim($)
while (<STDIN>)
{
# Load and store alignment
s/@/,:/g;
# Comment character
s/;/@/g;
@@ -97,7 +100,10 @@ while (<STDIN>)
s/CODE([0-9][0-9])/.code $1/;
# No AREA required
s/^\s*AREA.*$/.text/;
# But ALIGNs in AREA must be obeyed
s/^\s*AREA.*ALIGN=([0-9])$/.text\n.p2align $1/;
# If no ALIGN, strip the AREA and align to 4 bytes
s/^\s*AREA.*$/.text\n.p2align 2/;
# DCD to .word
# This one is for incoming symbols
@@ -137,8 +143,8 @@ while (<STDIN>)
# put the colon at the end of the line in the macro
s/^([a-zA-Z_0-9\$]+)/$1:/ if !/EQU/;
# Strip ALIGN
s/\sALIGN/@ ALIGN/g;
# ALIGN directive
s/ALIGN/.balign/g;
# Strip ARM
s/\sARM/@ ARM/g;


@@ -412,11 +412,14 @@ EOF
write_common_target_config_h() {
cat > ${TMP_H} << EOF
/* This file automatically generated by configure. Do not edit! */
#ifndef VPX_CONFIG_H
#define VPX_CONFIG_H
#define RESTRICT ${RESTRICT}
EOF
print_config_h ARCH "${TMP_H}" ${ARCH_LIST}
print_config_h HAVE "${TMP_H}" ${HAVE_LIST}
print_config_h CONFIG "${TMP_H}" ${CONFIG_LIST}
echo "#endif /* VPX_CONFIG_H */" >> ${TMP_H}
mkdir -p `dirname "$1"`
cmp "$1" ${TMP_H} >/dev/null 2>&1 || mv ${TMP_H} "$1"
}
@@ -626,7 +629,7 @@ process_common_toolchain() {
case ${toolchain} in
sparc-solaris-*)
add_extralibs -lposix4
add_cflags "-DMUST_BE_ALIGNED"
disable fast_unaligned
;;
*-solaris-*)
add_extralibs -lposix4
@@ -639,8 +642,8 @@ process_common_toolchain() {
# on arm, isa versions are supersets
enabled armv7a && soft_enable armv7 ### DEBUG
enabled armv7 && soft_enable armv6
enabled armv6 && soft_enable armv5te
enabled armv6 && soft_enable fast_unaligned
enabled armv7 || enabled armv6 && soft_enable armv5te
enabled armv7 || enabled armv6 && soft_enable fast_unaligned
enabled iwmmxt2 && soft_enable iwmmxt
enabled iwmmxt && soft_enable armv5te
@@ -689,7 +692,7 @@ process_common_toolchain() {
if enabled armv7
then
check_add_cflags --cpu=Cortex-A8 --fpu=softvfp+vfpv3
check_add_asflags --cpu=Cortex-A8 --fpu=none
check_add_asflags --cpu=Cortex-A8 --fpu=softvfp+vfpv3
else
check_add_cflags --cpu=${tgt_isa##armv}
check_add_asflags --cpu=${tgt_isa##armv}
@@ -751,41 +754,24 @@ process_common_toolchain() {
linux*)
enable linux
if enabled rvct; then
# Compiling with RVCT requires an alternate libc (glibc) when
# targetting linux.
disabled builtin_libc \
|| die "Must supply --libc when targetting *-linux-rvct"
# Check if we have CodeSourcery GCC in PATH. Needed for
# libraries
hash arm-none-linux-gnueabi-gcc 2>&- || \
die "Couldn't find CodeSourcery GCC from PATH"
# Set up compiler
add_cflags --library_interface=aeabi_glibc
add_cflags --no_hide_all
add_cflags --dwarf2
# Use armcc as a linker to enable translation of
# some gcc specific options such as -lm and -lpthread.
LD="armcc --translate_gcc"
# Set up linker
add_ldflags --sysv --no_startup --no_ref_cpp_init
add_ldflags --entry=_start
add_ldflags --keep '"*(.init)"' --keep '"*(.fini)"'
add_ldflags --keep '"*(.init_array)"' --keep '"*(.fini_array)"'
add_ldflags --dynamiclinker=/lib/ld-linux.so.3
add_extralibs libc.so.6 -lc_nonshared crt1.o crti.o crtn.o
# create configuration file (uses path to CodeSourcery GCC)
armcc --arm_linux_configure --arm_linux_config_file=arm_linux.cfg
# Add the paths for the alternate libc
for d in usr/include; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_cflags -J"${try_dir}"
done
add_cflags -J"${RVCT31INC}"
for d in lib usr/lib; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_ldflags -L"${try_dir}"
done
# glibc has some struct members named __align, which is a
# storage modifier in RVCT. If we need to use this modifier,
# we'll have to #undef it in our code. Note that this must
# happen AFTER all libc inclues.
add_cflags -D__align=x_align_x
add_cflags --arm_linux_paths --arm_linux_config_file=arm_linux.cfg
add_asflags --no_hide_all --apcs=/interwork
add_ldflags --arm_linux_paths --arm_linux_config_file=arm_linux.cfg
enabled pic && add_cflags --apcs=/fpic
enabled pic && add_asflags --apcs=/fpic
enabled shared && add_cflags --shared
fi
;;
@@ -953,47 +939,23 @@ process_common_toolchain() {
enabled gcov &&
check_add_cflags -fprofile-arcs -ftest-coverage &&
check_add_ldflags -fprofile-arcs -ftest-coverage
if enabled optimizations; then
enabled rvct && check_add_cflags -Otime
enabled small && check_add_cflags -O2 || check_add_cflags -O3
fi
if enabled opencl; then
disable multithread
echo " disabling multithread"
soft_enable opencl #Provide output to make user comfortable
enable runtime_cpu_detect
#Use dlopen() to load OpenCL when possible.
case ${toolchain} in
*darwin10*)
check_add_cflags -D__APPLE__
add_extralibs -framework OpenCL
;;
*-win32-gcc)
if check_header dlfcn.h; then
add_extralibs -ldl
enable dlopen
else
#This shouldn't be a hard-coded path in the long term
add_extralibs -L/cygdrive/c/Windows/System32 -lOpenCL
fi
;;
*)
if check_header dlfcn.h; then
add_extralibs -ldl
enable dlopen
else
add_extralibs -lOpenCL
fi
;;
esac
if enabled rvct; then
enabled small && check_add_cflags -Ospace || check_add_cflags -Otime
else
enabled small && check_add_cflags -O2 || check_add_cflags -O3
fi
fi
# Position Independent Code (PIC) support, for building relocatable
# shared objects
enabled gcc && enabled pic && check_add_cflags -fPIC
# Work around longjmp interception on glibc >= 2.11, to improve binary
# compatibility. See http://code.google.com/p/webm/issues/detail?id=166
enabled linux && check_add_cflags -D_FORTIFY_SOURCE=0
# Check for strip utility variant
${STRIP} -V 2>/dev/null | grep GNU >/dev/null && enable gnu_strip
@@ -1012,6 +974,9 @@ EOF
esac
fi
# for sysconf(3) and friends.
check_header unistd.h
# glibc needs these
if enabled linux; then
add_cflags -D_LARGEFILE_SOURCE


@@ -365,7 +365,7 @@ generate_vcproj() {
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="1"
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="true"
;;
*)
tag Tool \
@@ -379,7 +379,7 @@ generate_vcproj() {
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="1"
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="true"
;;
esac
;;
@@ -447,6 +447,8 @@ generate_vcproj() {
obj_int_extract)
tag Tool \
Name="VCCLCompilerTool" \
Optimization="2" \
FavorSizeorSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
RuntimeLibrary="$release_runtime" \
@@ -462,6 +464,8 @@ generate_vcproj() {
tag Tool \
Name="VCCLCompilerTool" \
Optimization="2" \
FavorSizeorSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
@@ -476,6 +480,8 @@ generate_vcproj() {
tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
Optimization="2" \
FavorSizeorSpeed="1" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \

21
configure vendored

@@ -31,16 +31,17 @@ Advanced options:
${toggle_md5} support for output of checksum data
${toggle_static_msvcrt} use static MSVCRT (VS builds only)
${toggle_vp8} VP8 codec support
${toggle_psnr} output of PSNR data, if supported (encoders)
${toggle_internal_stats} output of encoder internal stats for debug, if supported (encoders)
${toggle_mem_tracker} track memory usage
${toggle_postproc} postprocessing
${toggle_multithread} multithreaded encoding and decoding.
${toggle_spatial_resampling} spatial sampling (scaling) support
${toggle_realtime_only} enable this option while building for real-time encoding
${toggle_error_concealment} enable this option to get a decoder which is able to conceal losses
${toggle_runtime_cpu_detect} runtime cpu detection
${toggle_shared} shared library support
${toggle_static} static library support
${toggle_small} favor smaller size over speed
${toggle_opencl} support for OpenCL-assisted VP8 decoding (experimental)
${toggle_postproc_visualizer} macro block / block level visualizers
Codecs:
@@ -106,7 +107,6 @@ all_platforms="${all_platforms} x86-darwin8-gcc"
all_platforms="${all_platforms} x86-darwin8-icc"
all_platforms="${all_platforms} x86-darwin9-gcc"
all_platforms="${all_platforms} x86-darwin9-icc"
all_platforms="${all_platforms} x86-darwin10-gcc"
all_platforms="${all_platforms} x86-linux-gcc"
all_platforms="${all_platforms} x86-linux-icc"
all_platforms="${all_platforms} x86-solaris-gcc"
@@ -154,6 +154,7 @@ enabled doxygen && php -v >/dev/null 2>&1 && enable install_docs
enable install_bins
enable install_libs
enable static
enable optimizations
enable fast_unaligned #allow unaligned accesses, if supported by hw
enable md5
@@ -213,7 +214,7 @@ HAVE_LIST="
alt_tree_layout
pthread_h
sys_mman_h
dlopen
unistd_h
"
CONFIG_LIST="
external_build
@@ -243,7 +244,7 @@ CONFIG_LIST="
runtime_cpu_detect
postproc
multithread
psnr
internal_stats
${CODECS}
${CODEC_FAMILIES}
encoders
@@ -251,9 +252,10 @@ CONFIG_LIST="
static_msvcrt
spatial_resampling
realtime_only
error_concealment
shared
static
small
opencl
postproc_visualizer
os_support
"
@@ -285,16 +287,17 @@ CMDLINE_SELECT="
dc_recon
postproc
multithread
psnr
internal_stats
${CODECS}
${CODEC_FAMILIES}
static_msvcrt
mem_tracker
spatial_resampling
realtime_only
error_concealment
shared
static
small
opencl
postproc_visualizer
"
@@ -561,6 +564,4 @@ process "$@"
cat <<EOF > ${BUILD_PFX}vpx_config.c
static const char* const cfg = "$CONFIGURE_ARGS";
const char *vpx_codec_build_config(void) {return cfg;}
static const char* const libdir = "$libdir";
const char *vpx_codec_lib_dir(void) {return libdir;}
EOF


@@ -77,6 +77,11 @@ GEN_EXAMPLES-$(CONFIG_ENCODERS) += decode_with_drops.c
endif
decode_with_drops.GUID = CE5C53C4-8DDA-438A-86ED-0DDD3CDB8D26
decode_with_drops.DESCRIPTION = Drops frames while decoding
ifeq ($(CONFIG_DECODERS),yes)
GEN_EXAMPLES-$(CONFIG_ERROR_CONCEALMENT) += decode_with_partial_drops.c
endif
decode_with_partial_drops.GUID = 61C2D026-5754-46AC-916F-1343ECC5537E
decode_with_partial_drops.DESCRIPTION = Drops parts of frames while decoding
GEN_EXAMPLES-$(CONFIG_ENCODERS) += error_resilient.c
error_resilient.GUID = DF5837B9-4145-4F92-A031-44E4F832E00C
error_resilient.DESCRIPTION = Error Resiliency Feature
@@ -122,8 +127,8 @@ else
LIB_PATH := $(call enabled,LIB_PATH)
INC_PATH := $(call enabled,INC_PATH)
endif
CFLAGS += $(addprefix -I,$(INC_PATH))
LDFLAGS += $(addprefix -L,$(LIB_PATH))
INTERNAL_CFLAGS = $(addprefix -I,$(INC_PATH))
INTERNAL_LDFLAGS += $(addprefix -L,$(LIB_PATH))
# Expand list of selected examples to build (as specified above)
@@ -162,8 +167,10 @@ BINS-$(NOT_MSVS) += $(addprefix $(BUILD_PFX),$(ALL_EXAMPLES:.c=))
# Instantiate linker template for all examples.
CODEC_LIB=$(if $(CONFIG_DEBUG_LIBS),vpx_g,vpx)
CODEC_LIB_SUF=$(if $(CONFIG_SHARED),.so,.a)
$(foreach bin,$(BINS-yes),\
$(if $(BUILD_OBJS),$(eval $(bin): $(LIB_PATH)/lib$(CODEC_LIB).a))\
$(if $(BUILD_OBJS),$(eval $(bin):\
$(LIB_PATH)/lib$(CODEC_LIB)$(CODEC_LIB_SUF)))\
$(if $(BUILD_OBJS),$(eval $(call linker_template,$(bin),\
$(call objs,$($(notdir $(bin)).SRCS)) \
-l$(CODEC_LIB) $(addprefix -l,$(CODEC_EXTRA_LIBS))\
@@ -214,7 +221,8 @@ $(1): $($(1:.vcproj=).SRCS)
--ver=$$(CONFIG_VS_VERSION)\
--proj-guid=$$($$(@:.vcproj=).GUID)\
$$(if $$(CONFIG_STATIC_MSVCRT),--static-crt) \
--out=$$@ $$(CFLAGS) $$(LDFLAGS) -l$$(CODEC_LIB) -lwinmm $$^
--out=$$@ $$(INTERNAL_CFLAGS) $$(CFLAGS) \
$$(INTERNAL_LDFLAGS) $$(LDFLAGS) -l$$(CODEC_LIB) -lwinmm $$^
endef
PROJECTS-$(CONFIG_MSVS) += $(ALL_EXAMPLES:.c=.vcproj)
INSTALL-BINS-$(CONFIG_MSVS) += $(foreach p,$(VS_PLATFORMS),\


@@ -0,0 +1,238 @@
@TEMPLATE decoder_tmpl.c
Decode With Partial Drops Example
=================================
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
This is an example utility which drops a series of frames (or parts of frames),
as specified on the command line. This is useful for observing the error
recovery features of the codec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
#include <time.h>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
struct parsed_header
{
char key_frame;
int version;
char show_frame;
int first_part_size;
};
int next_packet(struct parsed_header* hdr, int pos, int length, int mtu)
{
int size = 0;
int remaining = length - pos;
/* Uncompressed part is 3 bytes for P frames and 10 bytes for I frames */
int uncomp_part_size = (hdr->key_frame ? 10 : 3);
/* number of bytes yet to send from header and the first partition */
int remainFirst = uncomp_part_size + hdr->first_part_size - pos;
if (remainFirst > 0)
{
if (remainFirst <= mtu)
{
size = remainFirst;
}
else
{
size = mtu;
}
return size;
}
/* second partition; just slice it up according to the MTU */
if (remaining <= mtu)
{
size = remaining;
return size;
}
return mtu;
}
void throw_packets(unsigned char* frame, int* size, int loss_rate,
int* thrown, int* kept)
{
unsigned char loss_frame[256*1024];
int pkg_size = 1;
int pos = 0;
int loss_pos = 0;
struct parsed_header hdr;
unsigned int tmp;
int mtu = 1500;
if (*size < 3)
{
return;
}
putc('|', stdout);
/* parse uncompressed 3 bytes */
tmp = (frame[2] << 16) | (frame[1] << 8) | frame[0];
hdr.key_frame = !(tmp & 0x1); /* inverse logic */
hdr.version = (tmp >> 1) & 0x7;
hdr.show_frame = (tmp >> 4) & 0x1;
hdr.first_part_size = (tmp >> 5) & 0x7FFFF;
/* don't drop key frames */
if (hdr.key_frame)
{
int i;
*kept = *size/mtu + ((*size % mtu > 0) ? 1 : 0); /* approximate */
for (i=0; i < *kept; i++)
putc('.', stdout);
return;
}
while ((pkg_size = next_packet(&hdr, pos, *size, mtu)) > 0)
{
int loss_event = ((rand() + 1.0)/(RAND_MAX + 1.0) < loss_rate/100.0);
if (*thrown == 0 && !loss_event)
{
memcpy(loss_frame + loss_pos, frame + pos, pkg_size);
loss_pos += pkg_size;
(*kept)++;
putc('.', stdout);
}
else
{
(*thrown)++;
putc('X', stdout);
}
pos += pkg_size;
}
memcpy(frame, loss_frame, loss_pos);
memset(frame + loss_pos, 0, *size - loss_pos);
*size = loss_pos;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INIT
/* Initialize codec */
flags = VPX_CODEC_USE_ERROR_CONCEALMENT;
res = vpx_codec_dec_init(&codec, interface, &dec_cfg, flags);
if(res)
die_codec(&codec, "Failed to initialize decoder");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INIT
Usage
-----
This example adds a single required argument to the `simple_decoder` example,
which specifies the range or pattern of frames to drop, plus an optional
`-t <num threads>` argument. The parameters are parsed as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
if(argc < 4 || argc > 6)
die("Usage: %s <infile> <outfile> [-t <num threads>] <N-M|N/M|L,S>\n",
argv[0]);
{
char *nptr;
int arg_num = 3;
if (argc == 6 && strncmp(argv[arg_num++], "-t", 2) == 0)
dec_cfg.threads = strtol(argv[arg_num++], NULL, 0);
n = strtol(argv[arg_num], &nptr, 0);
mode = (*nptr == '\0' || *nptr == ',') ? 2 : (*nptr == '-') ? 1 : 0;
/* only read a second number if a separator follows the first one */
m = (*nptr != '\0') ? strtol(nptr+1, NULL, 0) : 0;
if((!n && !m) || (*nptr != '-' && *nptr != '/' &&
*nptr != '\0' && *nptr != ','))
die("Couldn't parse pattern %s\n", argv[arg_num]);
}
seed = (m > 0) ? m : (unsigned int)time(NULL);
srand(seed);
thrown_frame = 0;
printf("Seed: %u\n", seed);
printf("Threads: %d\n", dec_cfg.threads);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
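The three argument forms (`N-M`, `N/M`, `L,S`) can be told apart by the character that follows the first number. The following is a minimal standalone sketch of that idea, not the template's own code: `parse_drop_spec` and `enum drop_mode` are hypothetical names introduced here, and only the numeric mode values (0, 1, 2) are taken from the parsing code above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names for this sketch; the values mirror `mode` above. */
enum drop_mode { DROP_PATTERN = 0, DROP_RANGE = 1, DROP_PACKETS = 2 };

/* Parse "N-M", "N/M", "L,S" or a bare "L" into *n and *m.
 * Returns the drop mode, or -1 if the separator is not recognised. */
static int parse_drop_spec(const char *spec, int *n, int *m)
{
    char *end;

    assert(spec && n && m);
    *n = (int)strtol(spec, &end, 0);
    /* Only read a second number when a separator is actually present. */
    *m = (*end != '\0') ? (int)strtol(end + 1, NULL, 0) : 0;

    switch (*end)
    {
    case '/':  return DROP_PATTERN;
    case '-':  return DROP_RANGE;
    case ',':
    case '\0': return DROP_PACKETS;
    default:   return -1;
    }
}
```

Calling `parse_drop_spec("5-10", &n, &m)` would select range mode with `n == 5` and `m == 10`; a bare `"5"` selects the random mode with no explicit seed (`m == 0`).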
Dropping A Range Of Frames
--------------------------
To drop a range of frames, specify the starting frame and the ending
frame to drop, separated by a dash. The following command will drop
frames 5 through 10 (counting from 1).
$ ./decode_with_partial_drops in.ivf out.i420 5-10
Dropping A Pattern Of Frames
----------------------------
To drop a pattern of frames, specify the number of frames to drop and
the number of frames after which to repeat the pattern, separated by
a forward-slash. The following command will drop 3 of 7 frames.
Specifically, it will decode 4 frames, then drop 3 frames, and then
repeat.
$ ./decode_with_partial_drops in.ivf out.i420 3/7
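The range and pattern rules can be written as two small predicates on the 1-based frame number. This is an illustrative sketch only: the function names `drop_in_range` and `drop_in_pattern` are hypothetical, while the arithmetic follows the drop decision used by this example.

```c
#include <assert.h>

/* Range mode "N-M": drop frames first..last inclusive, counting from 1. */
static int drop_in_range(int frame_cnt, int first, int last)
{
    assert(frame_cnt >= 1);
    return frame_cnt >= first && frame_cnt <= last;
}

/* Pattern mode "N/M": in every group of `period` frames, keep the first
 * (period - drop) frames and drop the last `drop` frames. */
static int drop_in_pattern(int frame_cnt, int drop, int period)
{
    assert(frame_cnt >= 1 && period > 0);
    return period - (frame_cnt - 1) % period <= drop;
}
```

With `3/7`, frames 1 to 4 are decoded, frames 5 to 7 are dropped, and the cycle restarts at frame 8.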
Dropping Random Parts Of Frames
-------------------------------
A third form, `L,S`, splits each frame into pieces of at most 1500 bytes
and randomly drops pieces rather than whole frames. The frame is split at
partition boundaries where possible. The following example seeds the RNG
with 123 and drops approximately 5% of the pieces. Pieces that depend on
an already dropped piece are also dropped. Key frames are never dropped.
$ ./decode_with_partial_drops in.ivf out.i420 5,123
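To see how a frame is cut into pieces, the splitting rule can be isolated from the loss simulation. The sketch below is a simplified illustration under stated assumptions: `next_piece` and `split_frame` are hypothetical names, `first_len` stands for the uncompressed header plus the first partition, and `first_len` is clamped to the frame length, which is a safety choice made here rather than something the example does.

```c
#include <assert.h>

/* Size of the next piece starting at byte offset pos, or 0 when done.
 * first_len: bytes of uncompressed header plus first partition. */
static int next_piece(int pos, int length, int first_len, int mtu)
{
    int remaining, remain_first;

    assert(mtu > 0 && pos >= 0 && pos <= length);
    if (first_len > length)
        first_len = length;          /* assumption: clamp corrupt sizes */
    remaining = length - pos;
    remain_first = first_len - pos;

    /* Never let a piece straddle the end of the first partition. */
    if (remain_first > 0)
        return remain_first < mtu ? remain_first : mtu;
    return remaining < mtu ? remaining : mtu;
}

/* Fill sizes[] with at most max piece sizes; returns the piece count. */
static int split_frame(int length, int first_len, int mtu,
                       int *sizes, int max)
{
    int pos = 0, count = 0, size;

    while (count < max &&
           (size = next_piece(pos, length, first_len, mtu)) > 0)
    {
        sizes[count++] = size;
        pos += size;
    }
    return count;
}
```

For a 6000-byte inter frame whose header and first partition occupy 2003 bytes, this yields pieces of 1500, 503, 1500, 1500 and 997 bytes, so the boundary after the first partition always coincides with a piece boundary.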
Extra Variables
---------------
This example maintains the pattern passed on the command line in the
`n`, `m`, and `mode` variables, along with the RNG seed and counters for
thrown and kept pieces:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
int n, m, mode;
unsigned int seed;
int thrown=0, kept=0;
int thrown_frame=0, kept_frame=0;
vpx_codec_dec_cfg_t dec_cfg = {0};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
Making The Drop Decision
------------------------
The example decides whether to drop the frame, or parts of it, immediately
before decoding the frame. The range and pattern modes use the current
frame number; the random mode draws a loss event for each piece.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE
/* Decide whether to throw parts of the frame or the whole frame
depending on the drop mode */
thrown_frame = 0;
kept_frame = 0;
switch (mode)
{
case 0:
if (m - (frame_cnt-1)%m <= n)
{
frame_sz = 0;
}
break;
case 1:
if (frame_cnt >= n && frame_cnt <= m)
{
frame_sz = 0;
}
break;
case 2:
throw_packets(frame, &frame_sz, n, &thrown_frame, &kept_frame);
break;
default: break;
}
if (mode < 2)
{
if (frame_sz == 0)
{
putc('X', stdout);
thrown_frame++;
}
else
{
putc('.', stdout);
kept_frame++;
}
}
thrown += thrown_frame;
kept += kept_frame;
fflush(stdout);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE


@@ -42,6 +42,8 @@ static void die(const char *fmt, ...) {
@DIE_CODEC
@HELPERS
int main(int argc, char **argv) {
FILE *infile, *outfile;
vpx_codec_ctx_t codec;


@@ -111,8 +111,6 @@ int main(int argc, char **argv) {
vpx_codec_ctx_t codec;
vpx_codec_enc_cfg_t cfg;
int frame_cnt = 0;
unsigned char file_hdr[IVF_FILE_HDR_SZ];
unsigned char frame_hdr[IVF_FRAME_HDR_SZ];
vpx_image_t raw;
vpx_codec_err_t res;
long width;


@@ -21,7 +21,7 @@ res = vpx_codec_dec_init(&codec, interface, NULL,
if(res == VPX_CODEC_INCAPABLE) {
printf("NOTICE: Postproc not supported by %s\n",
vpx_codec_iface_name(interface));
res = vpx_codec_dec_init(&codec, interface, NULL, 0);
res = vpx_codec_dec_init(&codec, interface, NULL, flags);
}
if(res)
die_codec(&codec, "Failed to initialize decoder");


@@ -120,7 +120,7 @@ enum mkv
//video
Video = 0xE0,
FlagInterlaced = 0x9A,
// StereoMode = 0x53B8,
StereoMode = 0x53B8,
PixelWidth = 0xB0,
PixelHeight = 0xBA,
PixelCropBottom = 0x54AA,


@@ -11,6 +11,7 @@
#include <stdlib.h>
#include <wchar.h>
#include <string.h>
#include <limits.h>
#if defined(_MSC_VER)
#define LITERALU64(n) n
#else
@@ -33,7 +34,7 @@ void Ebml_WriteLen(EbmlGlobal *glob, long long val)
val |= (LITERALU64(0x000000000000080) << ((size - 1) * 7));
Ebml_Serialize(glob, (void *) &val, size);
Ebml_Serialize(glob, (void *) &val, sizeof(val), size);
}
void Ebml_WriteString(EbmlGlobal *glob, const char *str)
@@ -60,21 +61,26 @@ void Ebml_WriteUTF8(EbmlGlobal *glob, const wchar_t *wstr)
void Ebml_WriteID(EbmlGlobal *glob, unsigned long class_id)
{
int len;
if (class_id >= 0x01000000)
Ebml_Serialize(glob, (void *)&class_id, 4);
len = 4;
else if (class_id >= 0x00010000)
Ebml_Serialize(glob, (void *)&class_id, 3);
len = 3;
else if (class_id >= 0x00000100)
Ebml_Serialize(glob, (void *)&class_id, 2);
len = 2;
else
Ebml_Serialize(glob, (void *)&class_id, 1);
len = 1;
Ebml_Serialize(glob, (void *)&class_id, sizeof(class_id), len);
}
void Ebml_SerializeUnsigned64(EbmlGlobal *glob, unsigned long class_id, uint64_t ui)
{
unsigned char sizeSerialized = 8 | 0x80;
Ebml_WriteID(glob, class_id);
Ebml_Serialize(glob, &sizeSerialized, 1);
Ebml_Serialize(glob, &ui, 8);
Ebml_Serialize(glob, &sizeSerialized, sizeof(sizeSerialized), 1);
Ebml_Serialize(glob, &ui, sizeof(ui), 8);
}
void Ebml_SerializeUnsigned(EbmlGlobal *glob, unsigned long class_id, unsigned long ui)
@@ -97,8 +103,8 @@ void Ebml_SerializeUnsigned(EbmlGlobal *glob, unsigned long class_id, unsigned l
}
sizeSerialized = 0x80 | size;
Ebml_Serialize(glob, &sizeSerialized, 1);
Ebml_Serialize(glob, &ui, size);
Ebml_Serialize(glob, &sizeSerialized, sizeof(sizeSerialized), 1);
Ebml_Serialize(glob, &ui, sizeof(ui), size);
}
//TODO: perhaps this is a poor name for this id serializer helper function
void Ebml_SerializeBinary(EbmlGlobal *glob, unsigned long class_id, unsigned long bin)
@@ -119,14 +125,14 @@ void Ebml_SerializeFloat(EbmlGlobal *glob, unsigned long class_id, double d)
unsigned char len = 0x88;
Ebml_WriteID(glob, class_id);
Ebml_Serialize(glob, &len, 1);
Ebml_Serialize(glob, &d, 8);
Ebml_Serialize(glob, &len, sizeof(len), 1);
Ebml_Serialize(glob, &d, sizeof(d), 8);
}
void Ebml_WriteSigned16(EbmlGlobal *glob, short val)
{
signed long out = ((val & 0x003FFFFF) | 0x00200000) << 8;
Ebml_Serialize(glob, &out, 3);
Ebml_Serialize(glob, &out, sizeof(out), 3);
}
void Ebml_SerializeString(EbmlGlobal *glob, unsigned long class_id, const char *s)
@@ -143,7 +149,6 @@ void Ebml_SerializeUTF8(EbmlGlobal *glob, unsigned long class_id, wchar_t *s)
void Ebml_SerializeData(EbmlGlobal *glob, unsigned long class_id, unsigned char *data, unsigned long data_length)
{
unsigned char size = 4;
Ebml_WriteID(glob, class_id);
Ebml_WriteLen(glob, data_length);
Ebml_Write(glob, data, data_length);


@@ -15,7 +15,7 @@
#include "vpx/vpx_integer.h"
typedef struct EbmlGlobal EbmlGlobal;
void Ebml_Serialize(EbmlGlobal *glob, const void *, unsigned long);
void Ebml_Serialize(EbmlGlobal *glob, const void *, int, unsigned long);
void Ebml_Write(EbmlGlobal *glob, const void *, unsigned long);
/////


@@ -35,11 +35,11 @@ void writeSimpleBlock(EbmlGlobal *glob, unsigned char trackNumber, short timeCod
Ebml_WriteID(glob, SimpleBlock);
unsigned long blockLength = 4 + dataLength;
blockLength |= 0x10000000; //TODO check length < 0x0FFFFFFFF
Ebml_Serialize(glob, &blockLength, 4);
Ebml_Serialize(glob, &blockLength, sizeof(blockLength), 4);
trackNumber |= 0x80; //TODO check track nubmer < 128
Ebml_Write(glob, &trackNumber, 1);
//Ebml_WriteSigned16(glob, timeCode,2); //this is 3 bytes
Ebml_Serialize(glob, &timeCode, 2);
Ebml_Serialize(glob, &timeCode, sizeof(timeCode), 2);
unsigned char flags = 0x00 | (isKeyframe ? 0x80 : 0x00) | (lacingFlag << 1) | discardable;
Ebml_Write(glob, &flags, 1);
Ebml_Write(glob, data, dataLength);

89
libs.mk

@@ -35,6 +35,7 @@ ifeq ($(CONFIG_VP8_ENCODER),yes)
CODEC_SRCS-yes += $(addprefix $(VP8_PREFIX),$(call enabled,VP8_CX_SRCS))
CODEC_EXPORTS-yes += $(addprefix $(VP8_PREFIX),$(VP8_CX_EXPORTS))
CODEC_SRCS-yes += $(VP8_PREFIX)vp8cx.mk vpx/vp8.h vpx/vp8cx.h vpx/vp8e.h
CODEC_SRCS-$(ARCH_ARM) += $(VP8_PREFIX)vp8cx_arm.mk
INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8e.h include/vpx/vp8cx.h
INSTALL_MAPS += include/vpx/% $(SRC_PATH_BARE)/$(VP8_PREFIX)/%
CODEC_DOC_SRCS += vpx/vp8.h vpx/vp8cx.h
@@ -47,6 +48,7 @@ ifeq ($(CONFIG_VP8_DECODER),yes)
CODEC_SRCS-yes += $(addprefix $(VP8_PREFIX),$(call enabled,VP8_DX_SRCS))
CODEC_EXPORTS-yes += $(addprefix $(VP8_PREFIX),$(VP8_DX_EXPORTS))
CODEC_SRCS-yes += $(VP8_PREFIX)vp8dx.mk vpx/vp8.h vpx/vp8dx.h
CODEC_SRCS-$(ARCH_ARM) += $(VP8_PREFIX)vp8dx_arm.mk
INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8dx.h
INSTALL_MAPS += include/vpx/% $(SRC_PATH_BARE)/$(VP8_PREFIX)/%
CODEC_DOC_SRCS += vpx/vp8.h vpx/vp8dx.h
@@ -89,6 +91,7 @@ $(eval $(if $(filter universal%,$(TOOLCHAIN)),LIPO_LIBVPX,BUILD_LIBVPX):=yes)
CODEC_SRCS-$(BUILD_LIBVPX) += build/make/version.sh
CODEC_SRCS-$(BUILD_LIBVPX) += vpx/vpx_integer.h
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/asm_offsets.h
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/vpx_timer.h
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/mem.h
CODEC_SRCS-$(BUILD_LIBVPX) += $(BUILD_PFX)vpx_config.c
@@ -100,7 +103,7 @@ CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/x86_abi_support.asm
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/x86_cpuid.c
endif
CODEC_SRCS-$(ARCH_ARM) += vpx_ports/arm_cpudetect.c
CODEC_SRCS-$(ARCH_ARM) += $(BUILD_PFX)vpx_config.asm
CODEC_SRCS-$(ARCH_ARM) += vpx_ports/arm.h
CODEC_EXPORTS-$(BUILD_LIBVPX) += vpx/exports_com
CODEC_EXPORTS-$(CONFIG_ENCODERS) += vpx/exports_enc
CODEC_EXPORTS-$(CONFIG_DECODERS) += vpx/exports_dec
@@ -121,20 +124,8 @@ INSTALL-LIBS-$(CONFIG_SHARED) += $(foreach p,$(VS_PLATFORMS),$(LIBSUBDIR)/$(p)/v
INSTALL-LIBS-$(CONFIG_SHARED) += $(foreach p,$(VS_PLATFORMS),$(LIBSUBDIR)/$(p)/vpx.exp)
endif
else
INSTALL-LIBS-yes += $(LIBSUBDIR)/libvpx.a
INSTALL-LIBS-$(CONFIG_STATIC) += $(LIBSUBDIR)/libvpx.a
INSTALL-LIBS-$(CONFIG_DEBUG_LIBS) += $(LIBSUBDIR)/libvpx_g.a
#Install the OpenCL kernels if CL enabled.
ifeq ($(CONFIG_OPENCL),yes)
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/filter_cl.cl
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/idctllm_cl.cl
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/common/opencl/loopfilter.cl
#only install decoder CL files if VP8 decoder enabled
ifeq ($(CONFIG_VP8_DECODER),yes)
INSTALL-LIBS-yes += $(LIBSUBDIR)/vp8/decoder/opencl/dequantize_cl.cl
endif
endif #CONFIG_OPENCL=yes
endif
CODEC_SRCS=$(call enabled,CODEC_SRCS)
@@ -189,14 +180,15 @@ endif
else
LIBVPX_OBJS=$(call objs,$(CODEC_SRCS))
OBJS-$(BUILD_LIBVPX) += $(LIBVPX_OBJS)
LIBS-$(BUILD_LIBVPX) += $(BUILD_PFX)libvpx.a $(BUILD_PFX)libvpx_g.a
LIBS-$(if $(BUILD_LIBVPX),$(CONFIG_STATIC)) += $(BUILD_PFX)libvpx.a $(BUILD_PFX)libvpx_g.a
$(BUILD_PFX)libvpx_g.a: $(LIBVPX_OBJS)
BUILD_LIBVPX_SO := $(if $(BUILD_LIBVPX),$(CONFIG_SHARED))
LIBVPX_SO := libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)\
$(notdir $(LIBVPX_SO_SYMLINKS))
$(BUILD_PFX)$(LIBVPX_SO): $(LIBVPX_OBJS) libvpx.ver
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm -pthread
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): SO_VERSION_SCRIPT = libvpx.ver
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
@@ -210,9 +202,18 @@ libvpx.ver: $(call enabled,CODEC_EXPORTS)
$(qexec)echo "local: *; };" >> $@
CLEAN-OBJS += libvpx.ver
$(addprefix $(DIST_DIR)/,$(LIBVPX_SO_SYMLINKS)):
@echo " [LN] $@"
$(qexec)ln -sf $(LIBVPX_SO) $@
define libvpx_symlink_template
$(1): $(2)
@echo " [LN] $$@"
$(qexec)ln -sf $(LIBVPX_SO) $$@
endef
$(eval $(call libvpx_symlink_template,\
$(addprefix $(BUILD_PFX),$(notdir $(LIBVPX_SO_SYMLINKS))),\
$(BUILD_PFX)$(LIBVPX_SO)))
$(eval $(call libvpx_symlink_template,\
$(addprefix $(DIST_DIR)/,$(LIBVPX_SO_SYMLINKS)),\
$(DIST_DIR)/$(LIBSUBDIR)/$(LIBVPX_SO)))
INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBVPX_SO_SYMLINKS)
INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBSUBDIR)/$(LIBVPX_SO)
@@ -269,36 +270,44 @@ $(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
#
# Calculate platform- and compiler-specific offsets for hand coded assembly
#
ifeq ($(CONFIG_EXTERNAL_BUILD),) # Visual Studio uses obj_int_extract.bat
ifeq ($(ARCH_ARM), yes)
ifeq ($(filter icc gcc,$(TGT_CC)), $(TGT_CC))
$(BUILD_PFX)asm_com_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S
grep EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
$(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S: $(VP8_PREFIX)common/asm_com_offsets.c
CLEAN-OBJS += $(BUILD_PFX)asm_com_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S
$(BUILD_PFX)asm_enc_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S
grep EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
$(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S: $(VP8_PREFIX)encoder/asm_enc_offsets.c
CLEAN-OBJS += $(BUILD_PFX)asm_enc_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S
$(BUILD_PFX)asm_dec_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S
grep EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
$(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S: $(VP8_PREFIX)decoder/asm_dec_offsets.c
CLEAN-OBJS += $(BUILD_PFX)asm_dec_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S
else
ifeq ($(filter rvct,$(TGT_CC)), $(TGT_CC))
asm_com_offsets.asm: obj_int_extract
asm_com_offsets.asm: $(VP8_PREFIX)common/asm_com_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)common/asm_com_offsets.c.o
CLEAN-OBJS += asm_com_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_com_offsets.asm
endif
ifeq ($(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64), yes)
ifeq ($(CONFIG_VP8_ENCODER), yes)
asm_enc_offsets.asm: obj_int_extract
asm_enc_offsets.asm: $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
asm_enc_offsets.asm: obj_int_extract
asm_enc_offsets.asm: $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
CLEAN-OBJS += asm_enc_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_enc_offsets.asm
endif
endif
OBJS-yes += $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
CLEAN-OBJS += asm_enc_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_enc_offsets.asm
ifeq ($(ARCH_ARM), yes)
ifeq ($(CONFIG_VP8_DECODER), yes)
asm_dec_offsets.asm: obj_int_extract
asm_dec_offsets.asm: $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
asm_dec_offsets.asm: obj_int_extract
asm_dec_offsets.asm: $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
CLEAN-OBJS += asm_dec_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_dec_offsets.asm
endif
OBJS-yes += $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
CLEAN-OBJS += asm_dec_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_dec_offsets.asm
endif
endif


@@ -27,6 +27,9 @@ static void update_mode_info_border(MODE_INFO *mi, int rows, int cols)
for (i = 0; i < rows; i++)
{
/* TODO(holmer): Bug? This updates the last element of each row
* rather than the border element!
*/
vpx_memset(&mi[i*cols-1], 0, sizeof(MODE_INFO));
}
}
@@ -43,9 +46,11 @@ void vp8_de_alloc_frame_buffers(VP8_COMMON *oci)
vpx_free(oci->above_context);
vpx_free(oci->mip);
vpx_free(oci->prev_mip);
oci->above_context = 0;
oci->mip = 0;
oci->prev_mip = 0;
}
@@ -65,9 +70,9 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
for (i = 0; i < NUM_YV12_BUFFERS; i++)
{
oci->fb_idx_ref_cnt[0] = 0;
if (vp8_yv12_alloc_frame_buffer(&oci->yv12_fb[i], width, height, VP8BORDERINPIXELS) < 0)
oci->fb_idx_ref_cnt[i] = 0;
oci->yv12_fb[i].flags = 0;
if (vp8_yv12_alloc_frame_buffer(&oci->yv12_fb[i], width, height, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
@@ -110,6 +115,21 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
oci->mi = oci->mip + oci->mode_info_stride + 1;
/* allocate memory for last frame MODE_INFO array */
#if CONFIG_ERROR_CONCEALMENT
oci->prev_mip = vpx_calloc((oci->mb_cols + 1) * (oci->mb_rows + 1), sizeof(MODE_INFO));
if (!oci->prev_mip)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
}
oci->prev_mi = oci->prev_mip + oci->mode_info_stride + 1;
#else
oci->prev_mip = NULL;
oci->prev_mi = NULL;
#endif
oci->above_context = vpx_calloc(sizeof(ENTROPY_CONTEXT_PLANES) * oci->mb_cols, 1);
@@ -120,6 +140,9 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
}
update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);
#if CONFIG_ERROR_CONCEALMENT
update_mode_info_border(oci->prev_mi, oci->mb_rows, oci->mb_cols);
#endif
return 0;
}
@@ -129,33 +152,33 @@ void vp8_setup_version(VP8_COMMON *cm)
{
case 0:
cm->no_lpf = 0;
cm->simpler_lpf = 0;
cm->mcomp_filter_type = SIXTAP;
cm->filter_type = NORMAL_LOOPFILTER;
cm->use_bilinear_mc_filter = 0;
cm->full_pixel = 0;
break;
case 1:
cm->no_lpf = 0;
cm->simpler_lpf = 1;
cm->mcomp_filter_type = BILINEAR;
cm->filter_type = SIMPLE_LOOPFILTER;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 0;
break;
case 2:
cm->no_lpf = 1;
cm->simpler_lpf = 0;
cm->mcomp_filter_type = BILINEAR;
cm->filter_type = NORMAL_LOOPFILTER;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 0;
break;
case 3:
cm->no_lpf = 1;
cm->simpler_lpf = 1;
cm->mcomp_filter_type = BILINEAR;
cm->filter_type = SIMPLE_LOOPFILTER;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 1;
break;
default:
/*4,5,6,7 are reserved for future use*/
cm->no_lpf = 0;
cm->simpler_lpf = 0;
cm->mcomp_filter_type = SIXTAP;
cm->filter_type = NORMAL_LOOPFILTER;
cm->use_bilinear_mc_filter = 0;
cm->full_pixel = 0;
break;
}
@@ -169,8 +192,8 @@ void vp8_create_common(VP8_COMMON *oci)
oci->mb_no_coeff_skip = 1;
oci->no_lpf = 0;
oci->simpler_lpf = 0;
oci->mcomp_filter_type = SIXTAP;
oci->filter_type = NORMAL_LOOPFILTER;
oci->use_bilinear_mc_filter = 0;
oci->full_pixel = 0;
oci->multi_token_partition = ONE_PARTITION;
oci->clr_type = REG_YUV;
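
Review note: the `vp8_setup_version` hunk above maps the bitstream version onto a loop filter type and motion-compensation settings. The sketch below restates that mapping as a small standalone function so the four cases can be compared side by side. The enum values, the `VersionSetup` struct and the `setup_for_version` name are assumptions made for illustration and are not taken from the libvpx headers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the enums referenced in the diff. */
typedef enum { NORMAL_LOOPFILTER = 0, SIMPLE_LOOPFILTER = 1 } LoopFilterType;

/* Illustrative subset of the VP8_COMMON fields touched by the hunk. */
typedef struct {
    int no_lpf;
    LoopFilterType filter_type;
    int use_bilinear_mc_filter;
    int full_pixel;
} VersionSetup;

/* Mirrors the switch in the hunk: versions 0-3 are defined,
 * anything else falls back to the version 0 settings. */
static VersionSetup setup_for_version(int version)
{
    VersionSetup s = { 0, NORMAL_LOOPFILTER, 0, 0 };

    switch (version) {
    case 1:
        s.filter_type = SIMPLE_LOOPFILTER;
        s.use_bilinear_mc_filter = 1;
        break;
    case 2:
        s.no_lpf = 1;
        s.use_bilinear_mc_filter = 1;
        break;
    case 3:
        s.no_lpf = 1;
        s.filter_type = SIMPLE_LOOPFILTER;
        s.use_bilinear_mc_filter = 1;
        s.full_pixel = 1;
        break;
    default: /* version 0 and the reserved values 4..7 */
        break;
    }
    return s;
}
```

Only version 3 enables `full_pixel`, and the reserved versions behave like version 0, which matches the `default:` branch in the hunk.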


@@ -24,14 +24,17 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
#if CONFIG_RUNTIME_CPU_DETECT
VP8_COMMON_RTCD *rtcd = &ctx->rtcd;
int flags = arm_cpu_caps();
int has_edsp = flags & HAS_EDSP;
int has_media = flags & HAS_MEDIA;
int has_neon = flags & HAS_NEON;
rtcd->flags = flags;
/* Override default functions with fastest ones for this CPU. */
#if HAVE_ARMV5TE
if (flags & HAS_EDSP)
{
}
#endif
#if HAVE_ARMV6
if (has_media)
if (flags & HAS_MEDIA)
{
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_armv6;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_armv6;
@@ -51,9 +54,11 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
rtcd->loopfilter.normal_b_v = vp8_loop_filter_bv_armv6;
rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_armv6;
rtcd->loopfilter.normal_b_h = vp8_loop_filter_bh_armv6;
rtcd->loopfilter.simple_mb_v = vp8_loop_filter_mbvs_armv6;
rtcd->loopfilter.simple_mb_v =
vp8_loop_filter_simple_vertical_edge_armv6;
rtcd->loopfilter.simple_b_v = vp8_loop_filter_bvs_armv6;
rtcd->loopfilter.simple_mb_h = vp8_loop_filter_mbhs_armv6;
rtcd->loopfilter.simple_mb_h =
vp8_loop_filter_simple_horizontal_edge_armv6;
rtcd->loopfilter.simple_b_h = vp8_loop_filter_bhs_armv6;
rtcd->recon.copy16x16 = vp8_copy_mem16x16_v6;
@@ -66,7 +71,7 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
#endif
#if HAVE_ARMV7
if (has_neon)
if (flags & HAS_NEON)
{
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_neon;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_neon;
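
Review note: the hunk above fills a table of function pointers according to the CPU capability flags, with the NEON block placed after the ARMv6 block so that it takes precedence when both are available. A minimal sketch of that dispatch pattern follows. The flag values, the `Rtcd` struct and the three stub functions are hypothetical, and the plain C fallback is an assumption about how defaults are installed.

```c
#include <assert.h>

/* Hypothetical capability bits; the real values live in the ARM port headers. */
enum { HAS_EDSP = 1, HAS_MEDIA = 2, HAS_NEON = 4 };

typedef int (*sixtap_fn)(void);

/* Stub implementations that only report which variant ran. */
static int sixtap_c(void)     { return 0; }
static int sixtap_armv6(void) { return 6; }
static int sixtap_neon(void)  { return 7; }

typedef struct {
    int flags;
    sixtap_fn sixtap16x16;
} Rtcd;

/* Start from the generic version, then override with faster ones.
 * Later overrides win, so NEON is checked last. */
static void rtcd_init(Rtcd *r, int flags)
{
    r->flags = flags;
    r->sixtap16x16 = sixtap_c;
    if (flags & HAS_MEDIA)
        r->sixtap16x16 = sixtap_armv6;
    if (flags & HAS_NEON)
        r->sixtap16x16 = sixtap_neon;
}
```

Caching `flags & HAS_NEON` in a local such as `has_neon` and testing the flag inline are equivalent in this pattern; the choice affects readability only.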


@@ -30,11 +30,11 @@
ldr r4, [sp, #36] ; width
mov r12, r3 ; outer-loop counter
sub r2, r2, r4 ; src increment for height loop
;;IF ARCHITECTURE=6
pld [r0]
;;ENDIF
add r7, r2, r4 ; preload next row
pld [r0, r7]
sub r2, r2, r4 ; src increment for height loop
ldr r5, [r11] ; load up filter coefficients
@@ -96,9 +96,8 @@
add r0, r0, r2 ; move to next input row
subs r12, r12, #1
;;IF ARCHITECTURE=6
pld [r0]
;;ENDIF
add r9, r2, r4, lsl #1 ; adding back block width
pld [r0, r9] ; preload next row
add r11, r11, #2 ; move over to next column
mov r1, r11


@@ -22,9 +22,7 @@
;push {r4-r7}
;preload
pld [r0]
pld [r0, r1]
pld [r0, r1, lsl #1]
pld [r0, #31] ; preload for next 16x16 block
ands r4, r0, #15
beq copy_mem16x16_fast
@@ -90,6 +88,8 @@ copy_mem16x16_1_loop
ldrneb r6, [r0, #2]
ldrneb r7, [r0, #3]
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_1_loop
ldmia sp!, {r4 - r7}
@@ -121,6 +121,8 @@ copy_mem16x16_4_loop
ldrne r6, [r0, #8]
ldrne r7, [r0, #12]
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_4_loop
ldmia sp!, {r4 - r7}
@@ -148,6 +150,7 @@ copy_mem16x16_8_loop
add r2, r2, r3
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_8_loop
ldmia sp!, {r4 - r7}
@@ -171,6 +174,7 @@ copy_mem16x16_fast_loop
;stm r2, {r4-r7}
add r2, r2, r3
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_fast_loop
ldmia sp!, {r4 - r7}


@@ -10,6 +10,8 @@
EXPORT |vp8_filter_block2d_first_pass_armv6|
EXPORT |vp8_filter_block2d_first_pass_16x16_armv6|
EXPORT |vp8_filter_block2d_first_pass_8x8_armv6|
EXPORT |vp8_filter_block2d_second_pass_armv6|
EXPORT |vp8_filter4_block2d_second_pass_armv6|
EXPORT |vp8_filter_block2d_first_pass_only_armv6|
@@ -40,11 +42,6 @@
add r12, r3, #16 ; square off the output
sub sp, sp, #4
;;IF ARCHITECTURE=6
;pld [r0, #-2]
;;pld [r0, #30]
;;ENDIF
ldr r4, [r11] ; load up packed filter coefficients
ldr r5, [r11, #4]
ldr r6, [r11, #8]
@@ -101,15 +98,10 @@
bne width_loop_1st_6
;;add r9, r2, #30 ; attempt to load 2 adjacent cache lines
;;IF ARCHITECTURE=6
;pld [r0, r2]
;;pld [r0, r9]
;;ENDIF
ldr r1, [sp] ; load and update dst address
subs r7, r7, #0x10000
add r0, r0, r2 ; move to next input line
add r1, r1, #2 ; move over to next column
str r1, [sp]
@@ -120,6 +112,192 @@
ENDP
; --------------------------
; 16x16 version
; -----------------------------
|vp8_filter_block2d_first_pass_16x16_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r7, [sp, #36] ; output height
add r4, r2, #18 ; preload next row
pld [r0, r4]
sub r2, r2, r3 ; inside loop increments input array,
; so the height loop only needs to add
; r2 - width to the input pointer
mov r3, r3, lsl #1 ; multiply width by 2 because using shorts
add r12, r3, #16 ; square off the output
sub sp, sp, #4
ldr r4, [r11] ; load up packed filter coefficients
ldr r5, [r11, #4]
ldr r6, [r11, #8]
str r1, [sp] ; push destination to stack
mov r7, r7, lsl #16 ; height is top part of counter
; six tap filter
|height_loop_1st_16_6|
ldrb r8, [r0, #-2] ; load source data
ldrb r9, [r0, #-1]
ldrb r10, [r0], #2
orr r7, r7, r3, lsr #2 ; construct loop counter
|width_loop_1st_16_6|
ldrb r11, [r0, #-1]
pkhbt lr, r8, r9, lsl #16 ; r9 | r8
pkhbt r8, r9, r10, lsl #16 ; r10 | r9
ldrb r9, [r0]
smuad lr, lr, r4 ; apply the filter
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smuad r8, r8, r4
pkhbt r11, r11, r9, lsl #16 ; r9 | r11
smlad lr, r10, r5, lr
ldrb r10, [r0, #1]
smlad r8, r11, r5, r8
ldrb r11, [r0, #2]
sub r7, r7, #1
pkhbt r9, r9, r10, lsl #16 ; r10 | r9
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smlad lr, r9, r6, lr
smlad r11, r10, r6, r8
ands r10, r7, #0xff ; test loop counter
add lr, lr, #0x40 ; round_shift_and_clamp
ldrneb r8, [r0, #-2] ; load data for next loop
usat lr, #8, lr, asr #7
add r11, r11, #0x40
ldrneb r9, [r0, #-1]
usat r11, #8, r11, asr #7
strh lr, [r1], r12 ; result is transposed and stored, which
; will make second pass filtering easier.
ldrneb r10, [r0], #2
strh r11, [r1], r12
bne width_loop_1st_16_6
ldr r1, [sp] ; load and update dst address
subs r7, r7, #0x10000
add r0, r0, r2 ; move to next input line
add r11, r2, #34 ; adding back block width(=16)
pld [r0, r11] ; preload next row
add r1, r1, #2 ; move over to next column
str r1, [sp]
bne height_loop_1st_16_6
add sp, sp, #4
ldmia sp!, {r4 - r11, pc}
ENDP
; --------------------------
; 8x8 version
; -----------------------------
|vp8_filter_block2d_first_pass_8x8_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r7, [sp, #36] ; output height
add r4, r2, #10 ; preload next row
pld [r0, r4]
sub r2, r2, r3 ; inside loop increments input array,
; so the height loop only needs to add
; r2 - width to the input pointer
mov r3, r3, lsl #1 ; multiply width by 2 because using shorts
add r12, r3, #16 ; square off the output
sub sp, sp, #4
ldr r4, [r11] ; load up packed filter coefficients
ldr r5, [r11, #4]
ldr r6, [r11, #8]
str r1, [sp] ; push destination to stack
mov r7, r7, lsl #16 ; height is top part of counter
; six tap filter
|height_loop_1st_8_6|
ldrb r8, [r0, #-2] ; load source data
ldrb r9, [r0, #-1]
ldrb r10, [r0], #2
orr r7, r7, r3, lsr #2 ; construct loop counter
|width_loop_1st_8_6|
ldrb r11, [r0, #-1]
pkhbt lr, r8, r9, lsl #16 ; r9 | r8
pkhbt r8, r9, r10, lsl #16 ; r10 | r9
ldrb r9, [r0]
smuad lr, lr, r4 ; apply the filter
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smuad r8, r8, r4
pkhbt r11, r11, r9, lsl #16 ; r9 | r11
smlad lr, r10, r5, lr
ldrb r10, [r0, #1]
smlad r8, r11, r5, r8
ldrb r11, [r0, #2]
sub r7, r7, #1
pkhbt r9, r9, r10, lsl #16 ; r10 | r9
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smlad lr, r9, r6, lr
smlad r11, r10, r6, r8
ands r10, r7, #0xff ; test loop counter
add lr, lr, #0x40 ; round_shift_and_clamp
ldrneb r8, [r0, #-2] ; load data for next loop
usat lr, #8, lr, asr #7
add r11, r11, #0x40
ldrneb r9, [r0, #-1]
usat r11, #8, r11, asr #7
strh lr, [r1], r12 ; result is transposed and stored, which
; will make second pass filtering easier.
ldrneb r10, [r0], #2
strh r11, [r1], r12
bne width_loop_1st_8_6
ldr r1, [sp] ; load and update dst address
subs r7, r7, #0x10000
add r0, r0, r2 ; move to next input line
add r11, r2, #18 ; adding back block width(=8)
pld [r0, r11] ; preload next row
add r1, r1, #2 ; move over to next column
str r1, [sp]
bne height_loop_1st_8_6
add sp, sp, #4
ldmia sp!, {r4 - r11, pc}
ENDP
;---------------------------------
; r0 short *src_ptr,
; r1 unsigned char *output_ptr,
@@ -262,6 +440,10 @@
|vp8_filter_block2d_first_pass_only_armv6| PROC
stmdb sp!, {r4 - r11, lr}
add r7, r2, r3 ; preload next row
add r7, r7, #2
pld [r0, r7]
ldr r4, [sp, #36] ; output pitch
ldr r11, [sp, #40] ; HFilter address
sub sp, sp, #8
@@ -330,16 +512,15 @@
bne width_loop_1st_only_6
;;add r9, r2, #30 ; attempt to load 2 adjacent cache lines
;;IF ARCHITECTURE=6
;pld [r0, r2]
;;pld [r0, r9]
;;ENDIF
ldr lr, [sp] ; load back output pitch
ldr r12, [sp, #4] ; load back output pitch
subs r7, r7, #1
add r0, r0, r12 ; update src for next loop
add r11, r12, r3 ; preload next row
add r11, r11, #2
pld [r0, r11]
add r1, r1, lr ; update dst for next loop
bne height_loop_1st_only_6
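
Review note: the new 16x16 and 8x8 first-pass routines above keep both loop counts in one register (height in the upper 16 bits, the per-row count in the low byte) and finish each output with `add #0x40` followed by `usat #8, asr #7`. The sketch below models both steps in C so the arithmetic can be checked. The function names are hypothetical, and the model assumes each inner iteration produces two outputs, as the paired `strh` stores suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Counts inner-loop iterations for a width x height first pass
 * using the same packed counter layout as the assembly. */
static unsigned inner_iterations(unsigned width, unsigned height)
{
    uint32_t counter;
    unsigned iterations = 0;

    /* The low byte must start non-zero and fit in 8 bits. */
    assert(width >= 2 && width <= 16 && height >= 1);

    counter = (uint32_t)height << 16;      /* mov  r7, r7, lsl #16    */
    do {
        counter |= (width * 2u) >> 2;      /* orr  r7, r7, r3, lsr #2 */
        do {
            iterations++;
            counter -= 1;                  /* sub  r7, r7, #1         */
        } while (counter & 0xff);          /* ands r10, r7, #0xff     */
        counter -= 0x10000;                /* subs r7, r7, #0x10000   */
    } while (counter != 0);
    return iterations;
}

/* Rounds a 7-bit fixed-point filter sum and saturates it to 0..255,
 * matching add #0x40 followed by usat #8 with asr #7. */
static uint8_t round_shift_clamp(int32_t sum)
{
    int32_t v = sum + 0x40;

    if (v < 0)
        return 0;        /* a negative value saturates to 0 */
    v >>= 7;
    return (uint8_t)(v > 255 ? 255 : v);
}
```

For the 8x8 case with 13 rows this gives 4 inner iterations per row, and for the 16x16 case with 21 rows it gives 8 per row.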


@@ -53,14 +53,11 @@ count RN r5
;r0 unsigned char *src_ptr,
;r1 int src_pixel_step,
;r2 const char *flimit,
;r2 const char *blimit,
;r3 const char *limit,
;stack const char *thresh,
;stack int count
;Note: All 16 elements in flimit are equal. So, in the code, only one load is needed
;for flimit. Same way applies to limit and thresh.
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
|vp8_loop_filter_horizontal_edge_armv6| PROC
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
@@ -72,14 +69,18 @@ count RN r5
sub sp, sp, #16 ; create temp buffer
ldr r9, [src], pstep ; p3
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
ldr r10, [src], pstep ; p2
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
ldr r11, [src], pstep ; p1
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r6], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r6] ; thresh
orr r2, r2, r2, lsl #8
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|Hnext8|
; vp8_filter_mask() function
@@ -253,12 +254,6 @@ count RN r5
subs count, count, #1
;pld [src]
;pld [src, pstep]
;pld [src, pstep, lsl #1]
;pld [src, pstep, lsl #2]
;pld [src, pstep, lsl #3]
ldrne r9, [src], pstep ; p3
ldrne r10, [src], pstep ; p2
ldrne r11, [src], pstep ; p1
@@ -281,14 +276,18 @@ count RN r5
sub sp, sp, #16 ; create temp buffer
ldr r9, [src], pstep ; p3
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
ldr r10, [src], pstep ; p2
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
ldr r11, [src], pstep ; p1
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r6], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r6] ; thresh
orr r2, r2, r2, lsl #8
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|MBHnext8|
@@ -590,15 +589,19 @@ count RN r5
sub sp, sp, #16 ; create temp buffer
ldr r6, [src], pstep ; load source data
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
ldr r7, [src], pstep
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
ldr r8, [src], pstep
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r12], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r12] ; thresh
orr r2, r2, r2, lsl #8
ldr lr, [src], pstep
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|Vnext8|
@@ -857,18 +860,26 @@ count RN r5
sub src, src, #4 ; move src pointer down by 4
ldr count, [sp, #40] ; count for 8-in-parallel
ldr r12, [sp, #36] ; load thresh address
pld [src, #23] ; preload for next block
sub sp, sp, #16 ; create temp buffer
ldr r6, [src], pstep ; load source data
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
pld [src, #23]
ldr r7, [src], pstep
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
pld [src, #23]
ldr r8, [src], pstep
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r12], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r12] ; thresh
orr r2, r2, r2, lsl #8
pld [src, #23]
ldr lr, [src], pstep
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|MBVnext8|
; vp8_filter_mask() function
@@ -908,6 +919,7 @@ count RN r5
str lr, [sp, #8]
ldr lr, [src], pstep
TRANSPOSE_MATRIX r6, r7, r8, lr, r9, r10, r11, r12
ldr lr, [sp, #8] ; load back (f)limit accumulator
@@ -956,6 +968,7 @@ count RN r5
beq mbvskip_filter ; skip filtering
;vp8_hevmask() function
;calculate high edge variance
@@ -1123,6 +1136,7 @@ count RN r5
smlabb r8, r6, lr, r7
smlatb r6, r6, lr, r7
smlabb r9, r10, lr, r7
smlatb r10, r10, lr, r7
ssat r8, #8, r8, asr #7
ssat r6, #8, r6, asr #7
@@ -1242,9 +1256,13 @@ count RN r5
sub src, src, #4
subs count, count, #1
pld [src, #23] ; preload for next block
ldrne r6, [src], pstep ; load source data
pld [src, #23]
ldrne r7, [src], pstep
pld [src, #23]
ldrne r8, [src], pstep
pld [src, #23]
ldrne lr, [src], pstep
bne MBVnext8
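
Review note: in the hunks above, the word loads of `flimit`, `limit` and `thresh` give way to single-byte loads (`ldrb`) followed by two `orr` instructions that replicate the byte across the register, so that the packed byte-wise instructions see the same threshold in all four lanes. A minimal C model of that replication follows; `splat_byte` is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Replicates one byte into all four byte lanes of a 32-bit word,
 * mirroring:  orr r, r, r, lsl #8  followed by  orr r, r, r, lsl #16 */
static uint32_t splat_byte(uint8_t b)
{
    uint32_t w = b;

    w |= w << 8;    /* 0x000000bb -> 0x0000bbbb */
    w |= w << 16;   /* 0x0000bbbb -> 0xbbbbbbbb */
    return w;
}
```

With this layout the caller passes a pointer to a single precomputed limit byte rather than a 16-element array of identical values.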


@@ -45,35 +45,28 @@
MEND
src RN r0
pstep RN r1
;r0 unsigned char *src_ptr,
;r1 int src_pixel_step,
;r2 const char *flimit,
;r3 const char *limit,
;stack const char *thresh,
;stack int count
; All 16 elements in flimit are equal. So, in the code, only one load is needed
; for flimit. Same applies to limit. thresh is not used in simple loopfilter
;r2 const char *blimit
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
|vp8_loop_filter_simple_horizontal_edge_armv6| PROC
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
stmdb sp!, {r4 - r11, lr}
ldr r12, [r3] ; limit
ldrb r12, [r2] ; blimit
ldr r3, [src, -pstep, lsl #1] ; p1
ldr r4, [src, -pstep] ; p0
ldr r5, [src] ; q0
ldr r6, [src, pstep] ; q1
ldr r7, [r2] ; flimit
orr r12, r12, r12, lsl #8 ; blimit
ldr r2, c0x80808080
ldr r9, [sp, #40] ; count for 8-in-parallel
uadd8 r7, r7, r7 ; flimit * 2
mov r9, r9, lsl #1 ; double the count. we're doing 4 at a time
uadd8 r12, r7, r12 ; flimit * 2 + limit
orr r12, r12, r12, lsl #16 ; blimit
mov r9, #4 ; fixed count of 4 iterations. we're doing 4 at a time
mov lr, #0 ; need 0 in a couple places
|simple_hnext8|
@@ -148,30 +141,32 @@ pstep RN r1
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
stmdb sp!, {r4 - r11, lr}
ldr r12, [r2] ; r12: flimit
ldrb r12, [r2] ; r12: blimit
ldr r2, c0x80808080
ldr r7, [r3] ; limit
orr r12, r12, r12, lsl #8
; load source data to r7, r8, r9, r10
ldrh r3, [src, #-2]
pld [src, #23] ; preload for next block
ldrh r4, [src], pstep
uadd8 r12, r12, r12 ; flimit * 2
orr r12, r12, r12, lsl #16
ldrh r5, [src, #-2]
pld [src, #23]
ldrh r6, [src], pstep
uadd8 r12, r12, r7 ; flimit * 2 + limit
pkhbt r7, r3, r4, lsl #16
ldrh r3, [src, #-2]
pld [src, #23]
ldrh r4, [src], pstep
ldr r11, [sp, #40] ; count (r11) for 8-in-parallel
pkhbt r8, r5, r6, lsl #16
ldrh r5, [src, #-2]
pld [src, #23]
ldrh r6, [src], pstep
mov r11, r11, lsl #1 ; 4-in-parallel
mov r11, #4 ; fixed count of 4 iterations. we're doing 4 at a time
|simple_vnext8|
; vp8_simple_filter_mask() function
@@ -259,19 +254,23 @@ pstep RN r1
; load source data to r7, r8, r9, r10
ldrneh r3, [src, #-2]
pld [src, #23] ; preload for next block
ldrneh r4, [src], pstep
ldrneh r5, [src, #-2]
pld [src, #23]
ldrneh r6, [src], pstep
pkhbt r7, r3, r4, lsl #16
ldrneh r3, [src, #-2]
pld [src, #23]
ldrneh r4, [src], pstep
pkhbt r8, r5, r6, lsl #16
ldrneh r5, [src, #-2]
pld [src, #23]
ldrneh r6, [src], pstep
bne simple_vnext8


@@ -32,9 +32,12 @@
beq skip_firstpass_filter
;first-pass filter
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
sub r0, r0, r1, lsl #1
add r3, r1, #10 ; preload next row
pld [r0, r3]
add r2, r12, r2, lsl #4 ;calculate filter location
add r0, r0, #3 ;adjust src only for loading convenience
@@ -110,6 +113,9 @@
add r0, r0, r1 ; move to next input line
add r11, r1, #18 ; preload next row. adding back block width(=8), which is subtracted earlier
pld [r0, r11]
bne first_pass_hloop_v6
;second pass filter
@@ -121,7 +127,7 @@ secondpass_filter
cmp r3, #0
beq skip_secondpass_filter
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
add lr, r12, r3, lsl #4 ;calculate filter location
mov r2, #0x00080000
@@ -245,8 +251,6 @@ skip_secondpass_hloop
;-----------------
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_
DCD filter8_coeff
filter8_coeff
DCD 0x00000000, 0x00000080, 0x00000000, 0x00000000
DCD 0xfffa0000, 0x000c007b, 0x0000ffff, 0x00000000


@@ -25,6 +25,28 @@ extern void vp8_filter_block2d_first_pass_armv6
const short *vp8_filter
);
// 8x8
extern void vp8_filter_block2d_first_pass_8x8_armv6
(
unsigned char *src_ptr,
short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_width,
unsigned int output_height,
const short *vp8_filter
);
// 16x16
extern void vp8_filter_block2d_first_pass_16x16_armv6
(
unsigned char *src_ptr,
short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_width,
unsigned int output_height,
const short *vp8_filter
);
extern void vp8_filter_block2d_second_pass_armv6
(
short *src_ptr,
@@ -143,12 +165,12 @@ void vp8_sixtap_predict8x8_armv6
{
if (yoffset & 0x1)
{
vp8_filter_block2d_first_pass_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 8, 11, HFilter);
vp8_filter_block2d_first_pass_8x8_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 8, 11, HFilter);
vp8_filter4_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 8, VFilter);
}
else
{
vp8_filter_block2d_first_pass_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 8, 13, HFilter);
vp8_filter_block2d_first_pass_8x8_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 8, 13, HFilter);
vp8_filter_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 8, VFilter);
}
}
@@ -185,12 +207,12 @@ void vp8_sixtap_predict16x16_armv6
{
if (yoffset & 0x1)
{
vp8_filter_block2d_first_pass_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 16, 19, HFilter);
vp8_filter_block2d_first_pass_16x16_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 16, 19, HFilter);
vp8_filter4_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 16, VFilter);
}
else
{
vp8_filter_block2d_first_pass_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 16, 21, HFilter);
vp8_filter_block2d_first_pass_16x16_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 16, 21, HFilter);
vp8_filter_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 16, VFilter);
}
}
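
Review note: the call sites above pick the first-pass row count and source offset from the parity of `yoffset`. Odd offsets use the `vp8_filter4_...` second pass and need 3 extra rows starting one row above the block; even offsets use the full second pass and need 5 extra rows starting two rows above. The sketch below captures that selection. `FirstPassPlan` and `plan_first_pass` are hypothetical names, and the 4-tap reading is inferred from the function name rather than stated in the diff.

```c
#include <assert.h>

/* How many rows the horizontal pass must produce, and how many rows
 * above the block it starts (negative means above). */
typedef struct {
    int rows;
    int src_row_offset;
} FirstPassPlan;

static FirstPassPlan plan_first_pass(int block_size, int yoffset)
{
    FirstPassPlan p;

    assert(block_size > 0);
    if (yoffset & 0x1) {
        /* shorter vertical filter: one row above, two below */
        p.rows = block_size + 3;
        p.src_row_offset = -1;
    } else {
        /* full six-tap vertical filter: two rows above, three below */
        p.rows = block_size + 5;
        p.src_row_offset = -2;
    }
    return p;
}
```

This reproduces the literal row counts in the hunks: 11 and 13 for 8x8, 19 and 21 for 16x16.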


@@ -9,135 +9,107 @@
*/
#include "vpx_ports/config.h"
#include <math.h>
#include "vpx_config.h"
#include "vp8/common/loopfilter.h"
#include "vp8/common/onyxc_int.h"
#if HAVE_ARMV6
extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_vertical_edge_armv6);
extern prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_mbloop_filter_vertical_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_simple_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_simple_vertical_edge_armv6);
#endif
extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_y_neon);
extern prototype_loopfilter(vp8_loop_filter_vertical_edge_y_neon);
extern prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_y_neon);
extern prototype_loopfilter(vp8_mbloop_filter_vertical_edge_y_neon);
extern prototype_loopfilter(vp8_loop_filter_simple_horizontal_edge_neon);
extern prototype_loopfilter(vp8_loop_filter_simple_vertical_edge_neon);
#if HAVE_ARMV7
typedef void loopfilter_y_neon(unsigned char *src, int pitch,
unsigned char blimit, unsigned char limit, unsigned char thresh);
typedef void loopfilter_uv_neon(unsigned char *u, int pitch,
unsigned char blimit, unsigned char limit, unsigned char thresh,
unsigned char *v);
extern loop_filter_uvfunction vp8_loop_filter_horizontal_edge_uv_neon;
extern loop_filter_uvfunction vp8_loop_filter_vertical_edge_uv_neon;
extern loop_filter_uvfunction vp8_mbloop_filter_horizontal_edge_uv_neon;
extern loop_filter_uvfunction vp8_mbloop_filter_vertical_edge_uv_neon;
extern loopfilter_y_neon vp8_loop_filter_horizontal_edge_y_neon;
extern loopfilter_y_neon vp8_loop_filter_vertical_edge_y_neon;
extern loopfilter_y_neon vp8_mbloop_filter_horizontal_edge_y_neon;
extern loopfilter_y_neon vp8_mbloop_filter_vertical_edge_y_neon;
extern loopfilter_uv_neon vp8_loop_filter_horizontal_edge_uv_neon;
extern loopfilter_uv_neon vp8_loop_filter_vertical_edge_uv_neon;
extern loopfilter_uv_neon vp8_mbloop_filter_horizontal_edge_uv_neon;
extern loopfilter_uv_neon vp8_mbloop_filter_vertical_edge_uv_neon;
#endif
#if HAVE_ARMV6
/*ARMV6 loopfilter functions*/
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
}
void vp8_loop_filter_mbhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
}
void vp8_loop_filter_mbvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bhs_armv6(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, blimit);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_armv6(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 4, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 8, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 12, y_stride, blimit);
}
#endif
@@ -145,93 +117,60 @@ void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsig
/* NEON loopfilter functions */
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
unsigned char mblim = *lfi->mblim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, mblim, lim, hev_thr);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, v_ptr);
}
void vp8_loop_filter_mbhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, mblim, lim, hev_thr, v_ptr);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
unsigned char mblim = *lfi->mblim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, mblim, lim, hev_thr);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, v_ptr);
}
void vp8_loop_filter_mbvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, mblim, lim, hev_thr, v_ptr);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
unsigned char blim = *lfi->blim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 4 * y_stride, y_stride, blim, lim, hev_thr);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 8 * y_stride, y_stride, blim, lim, hev_thr);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 12 * y_stride, y_stride, blim, lim, hev_thr);
if (u_ptr)
vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, v_ptr + 4 * uv_stride);
}
void vp8_loop_filter_bhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, blim, lim, hev_thr, v_ptr + 4 * uv_stride);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
unsigned char blim = *lfi->blim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 4, y_stride, blim, lim, hev_thr);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 8, y_stride, blim, lim, hev_thr);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 12, y_stride, blim, lim, hev_thr);
if (u_ptr)
vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, v_ptr + 4);
}
void vp8_loop_filter_bvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_neon(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_neon(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_neon(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, blim, lim, hev_thr, v_ptr + 4);
}
#endif

@@ -12,15 +12,17 @@
#ifndef LOOPFILTER_ARM_H
#define LOOPFILTER_ARM_H
#include "vpx_config.h"
#if HAVE_ARMV6
extern prototype_loopfilter_block(vp8_loop_filter_mbv_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bv_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_mbh_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bh_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_mbvs_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bvs_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_mbhs_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bhs_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_bvs_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_bhs_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_armv6);
#if !CONFIG_RUNTIME_CPU_DETECT
#undef vp8_lf_normal_mb_v
@@ -36,28 +38,29 @@ extern prototype_loopfilter_block(vp8_loop_filter_bhs_armv6);
#define vp8_lf_normal_b_h vp8_loop_filter_bh_armv6
#undef vp8_lf_simple_mb_v
#define vp8_lf_simple_mb_v vp8_loop_filter_mbvs_armv6
#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_armv6
#undef vp8_lf_simple_b_v
#define vp8_lf_simple_b_v vp8_loop_filter_bvs_armv6
#undef vp8_lf_simple_mb_h
#define vp8_lf_simple_mb_h vp8_loop_filter_mbhs_armv6
#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_armv6
#undef vp8_lf_simple_b_h
#define vp8_lf_simple_b_h vp8_loop_filter_bhs_armv6
#endif
#endif
#endif /* !CONFIG_RUNTIME_CPU_DETECT */
#endif /* HAVE_ARMV6 */
#if HAVE_ARMV7
extern prototype_loopfilter_block(vp8_loop_filter_mbv_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bv_neon);
extern prototype_loopfilter_block(vp8_loop_filter_mbh_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bh_neon);
extern prototype_loopfilter_block(vp8_loop_filter_mbvs_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bvs_neon);
extern prototype_loopfilter_block(vp8_loop_filter_mbhs_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bhs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_mbvs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_bvs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_mbhs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_bhs_neon);
#if !CONFIG_RUNTIME_CPU_DETECT
#undef vp8_lf_normal_mb_v
@@ -83,7 +86,8 @@ extern prototype_loopfilter_block(vp8_loop_filter_bhs_neon);
#undef vp8_lf_simple_b_h
#define vp8_lf_simple_b_h vp8_loop_filter_bhs_neon
#endif
#endif
#endif /* !CONFIG_RUNTIME_CPU_DETECT */
#endif
#endif /* HAVE_ARMV7 */
#endif /* LOOPFILTER_ARM_H */

@@ -25,7 +25,7 @@
|vp8_bilinear_predict16x16_neon| PROC
push {r4-r5, lr}
ldr r12, _bifilter16_coeff_
adr r12, bifilter16_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -351,8 +351,6 @@ filt_blk2d_spo16x16_loop_neon
;-----------------
_bifilter16_coeff_
DCD bifilter16_coeff
bifilter16_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112
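For readers unfamiliar with the coefficient table above: its sixteen words read naturally as eight two-tap pairs, one per 1/8-pel offset, and each pair sums to 128. The following is only a scalar sketch of how such a pair could blend two neighbouring pixels. The helper name `bilinear_blend_model`, the 7-bit shift, and the rounding term of 64 are assumptions for illustration; none of them appear in this diff.

```c
#include <assert.h>
#include <stddef.h>

/* The eight (tap0, tap1) pairs from the DCD table; each pair sums to 128. */
static const int bilinear_taps[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80}, {32, 96}, {16, 112}
};

/* Hypothetical helper: blend two neighbouring pixels at a 1/8-pel offset.
 * ASSUMPTION: a 7-bit shift with a rounding term of 64, chosen because the
 * taps sum to 128; the diff itself does not show the rounding step. */
unsigned char bilinear_blend_model(unsigned char a, unsigned char b,
                                   size_t offset)
{
    assert(offset < 8);
    const int *taps = bilinear_taps[offset];
    return (unsigned char)((a * taps[0] + b * taps[1] + 64) >> 7);
}
```

With offset 0 the first pixel passes through unchanged, and with offset 4 the two pixels are weighted equally.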

@@ -25,7 +25,7 @@
|vp8_bilinear_predict4x4_neon| PROC
push {r4, lr}
ldr r12, _bifilter4_coeff_
adr r12, bifilter4_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -124,8 +124,6 @@ skip_secondpass_filter
;-----------------
_bifilter4_coeff_
DCD bifilter4_coeff
bifilter4_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

@@ -25,7 +25,7 @@
|vp8_bilinear_predict8x4_neon| PROC
push {r4, lr}
ldr r12, _bifilter8x4_coeff_
adr r12, bifilter8x4_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -129,8 +129,6 @@ skip_secondpass_filter
;-----------------
_bifilter8x4_coeff_
DCD bifilter8x4_coeff
bifilter8x4_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

@@ -25,7 +25,7 @@
|vp8_bilinear_predict8x8_neon| PROC
push {r4, lr}
ldr r12, _bifilter8_coeff_
adr r12, bifilter8_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -177,8 +177,6 @@ skip_secondpass_filter
;-----------------
_bifilter8_coeff_
DCD bifilter8_coeff
bifilter8_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

@@ -20,19 +20,16 @@
|vp8_short_inv_walsh4x4_neon| PROC
; read in all four lines of values: d0->d3
vldm.64 r0, {q0, q1}
vld1.i16 {q0-q1}, [r0@128]
; first for loop
vadd.s16 d4, d0, d3 ;a = [0] + [12]
vadd.s16 d5, d1, d2 ;b = [4] + [8]
vsub.s16 d6, d1, d2 ;c = [4] - [8]
vsub.s16 d7, d0, d3 ;d = [0] - [12]
vadd.s16 d6, d1, d2 ;b = [4] + [8]
vsub.s16 d5, d0, d3 ;d = [0] - [12]
vsub.s16 d7, d1, d2 ;c = [4] - [8]
vadd.s16 d0, d4, d5 ;a + b
vadd.s16 d1, d6, d7 ;c + d
vsub.s16 d2, d4, d5 ;a - b
vsub.s16 d3, d7, d6 ;d - c
vadd.s16 q0, q2, q3 ; a+b d+c
vsub.s16 q1, q2, q3 ; a-b d-c
vtrn.32 d0, d2 ;d0: 0 1 8 9
;d2: 2 3 10 11
@@ -47,29 +44,22 @@
; second for loop
vadd.s16 d4, d0, d3 ;a = [0] + [3]
vadd.s16 d5, d1, d2 ;b = [1] + [2]
vsub.s16 d6, d1, d2 ;c = [1] - [2]
vsub.s16 d7, d0, d3 ;d = [0] - [3]
vadd.s16 d6, d1, d2 ;b = [1] + [2]
vsub.s16 d5, d0, d3 ;d = [0] - [3]
vsub.s16 d7, d1, d2 ;c = [1] - [2]
vadd.s16 d0, d4, d5 ;e = a + b
vadd.s16 d1, d6, d7 ;f = c + d
vsub.s16 d2, d4, d5 ;g = a - b
vsub.s16 d3, d7, d6 ;h = d - c
vmov.i16 q8, #3
vmov.i16 q2, #3
vadd.i16 q0, q0, q2 ;e/f += 3
vadd.i16 q1, q1, q2 ;g/h += 3
vadd.s16 q0, q2, q3 ; a+b d+c
vsub.s16 q1, q2, q3 ; a-b d-c
vadd.i16 q0, q0, q8 ;e/f += 3
vadd.i16 q1, q1, q8 ;g/h += 3
vshr.s16 q0, q0, #3 ;e/f >> 3
vshr.s16 q1, q1, #3 ;g/h >> 3
vtrn.32 d0, d2
vtrn.32 d1, d3
vtrn.16 d0, d1
vtrn.16 d2, d3
vstmia.16 r1!, {q0}
vstmia.16 r1!, {q1}
vst4.i16 {d0,d1,d2,d3}, [r1@128]
bx lr
ENDP ; |vp8_short_inv_walsh4x4_neon|
@@ -77,19 +67,13 @@
;short vp8_short_inv_walsh4x4_1_neon(short *input, short *output)
|vp8_short_inv_walsh4x4_1_neon| PROC
; load a full line into a neon register
vld1.16 {q0}, [r0]
; extract first element and replicate
vdup.16 q1, d0[0]
; add 3 to all values
vmov.i16 q2, #3
vadd.i16 q3, q1, q2
; right shift
vshr.s16 q3, q3, #3
; write it back
vstmia.16 r1!, {q3}
vstmia.16 r1!, {q3}
ldrsh r2, [r0] ; load input[0]
add r3, r2, #3 ; add 3
add r2, r1, #16 ; base for last 8 output
asr r0, r3, #3 ; right shift 3
vdup.16 q0, r0 ; load and duplicate
vst1.16 {q0}, [r1@128] ; write back 8
vst1.16 {q0}, [r2@128] ; write back last 8
bx lr
ENDP ; |vp8_short_inv_walsh4x4_1_neon|
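As a cross-check on the rewritten DC-only routine above, here is a minimal scalar sketch of what its comments describe: load `input[0]`, add 3, shift right by 3, and write the result to all 16 outputs. The function name `inv_walsh4x4_1_model` is hypothetical, and the sketch assumes the compiler implements `>>` on negative values as an arithmetic shift, matching the `asr` instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the DC-only inverse Walsh path:
 * every one of the 16 outputs receives (input[0] + 3) >> 3. */
void inv_walsh4x4_1_model(const short *input, short *output)
{
    assert(input != NULL && output != NULL);
    /* ASSUMPTION: >> is an arithmetic shift for negative values,
     * mirroring `asr r0, r3, #3` in the NEON version. */
    int a1 = (input[0] + 3) >> 3;
    for (size_t i = 0; i < 16; ++i)
        output[i] = (short)a1;
}
```

Only the first input coefficient is read, which is why the new assembly can replace the vector load with a single `ldrsh`.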

@@ -14,109 +14,97 @@
EXPORT |vp8_loop_filter_vertical_edge_y_neon|
EXPORT |vp8_loop_filter_vertical_edge_uv_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; flimit, limit, and thresh should be positive numbers.
; All 16 elements in these variables are equal.
; void vp8_loop_filter_horizontal_edge_y_neon(unsigned char *src, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; int count)
; r0 unsigned char *src
; r1 int pitch
; r2 const signed char *flimit
; r3 const signed char *limit
; sp const signed char *thresh,
; sp+4 int count (unused)
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_loop_filter_horizontal_edge_y_neon| PROC
stmdb sp!, {lr}
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
sub r2, r0, r1, lsl #2 ; move src pointer down by 4 lines
ldr r12, [sp, #4] ; load thresh pointer
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
vdup.u8 q1, r3 ; duplicate limit
sub r2, r0, r1, lsl #2 ; move src pointer down by 4 lines
ldr r3, [sp, #4] ; load thresh
add r12, r2, r1
add r1, r1, r1
vld1.u8 {q3}, [r2], r1 ; p3
vld1.u8 {q4}, [r2], r1 ; p2
vld1.u8 {q5}, [r2], r1 ; p1
vld1.u8 {q6}, [r2], r1 ; p0
vld1.u8 {q7}, [r2], r1 ; q0
vld1.u8 {q8}, [r2], r1 ; q1
vld1.u8 {q9}, [r2], r1 ; q2
vld1.u8 {q10}, [r2] ; q3
vld1.s8 {d4[], d5[]}, [r12] ; thresh
sub r0, r0, r1, lsl #1
vdup.u8 q2, r3 ; duplicate thresh
vld1.u8 {q3}, [r2@128], r1 ; p3
vld1.u8 {q4}, [r12@128], r1 ; p2
vld1.u8 {q5}, [r2@128], r1 ; p1
vld1.u8 {q6}, [r12@128], r1 ; p0
vld1.u8 {q7}, [r2@128], r1 ; q0
vld1.u8 {q8}, [r12@128], r1 ; q1
vld1.u8 {q9}, [r2@128] ; q2
vld1.u8 {q10}, [r12@128] ; q3
sub r2, r2, r1, lsl #1
sub r12, r12, r1, lsl #1
bl vp8_loop_filter_neon
vst1.u8 {q5}, [r0], r1 ; store op1
vst1.u8 {q6}, [r0], r1 ; store op0
vst1.u8 {q7}, [r0], r1 ; store oq0
vst1.u8 {q8}, [r0], r1 ; store oq1
vst1.u8 {q5}, [r2@128], r1 ; store op1
vst1.u8 {q6}, [r12@128], r1 ; store op0
vst1.u8 {q7}, [r2@128], r1 ; store oq0
vst1.u8 {q8}, [r12@128], r1 ; store oq1
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_loop_filter_horizontal_edge_y_neon|
; void vp8_loop_filter_horizontal_edge_uv_neon(unsigned char *u, int pitch
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
; sp+4 unsigned char *v
|vp8_loop_filter_horizontal_edge_uv_neon| PROC
stmdb sp!, {lr}
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
vdup.u8 q1, r3 ; duplicate limit
ldr r12, [sp, #4] ; load thresh
ldr r2, [sp, #8] ; load v ptr
vdup.u8 q2, r12 ; duplicate thresh
sub r3, r0, r1, lsl #2 ; move u pointer down by 4 lines
vld1.u8 {d6}, [r3], r1 ; p3
vld1.u8 {d8}, [r3], r1 ; p2
vld1.u8 {d10}, [r3], r1 ; p1
vld1.u8 {d12}, [r3], r1 ; p0
vld1.u8 {d14}, [r3], r1 ; q0
vld1.u8 {d16}, [r3], r1 ; q1
vld1.u8 {d18}, [r3], r1 ; q2
vld1.u8 {d20}, [r3] ; q3
ldr r3, [sp, #4] ; load thresh pointer
sub r12, r2, r1, lsl #2 ; move v pointer down by 4 lines
vld1.u8 {d7}, [r12], r1 ; p3
vld1.u8 {d9}, [r12], r1 ; p2
vld1.u8 {d11}, [r12], r1 ; p1
vld1.u8 {d13}, [r12], r1 ; p0
vld1.u8 {d15}, [r12], r1 ; q0
vld1.u8 {d17}, [r12], r1 ; q1
vld1.u8 {d19}, [r12], r1 ; q2
vld1.u8 {d21}, [r12] ; q3
vld1.s8 {d4[], d5[]}, [r3] ; thresh
vld1.u8 {d6}, [r3@64], r1 ; p3
vld1.u8 {d7}, [r12@64], r1 ; p3
vld1.u8 {d8}, [r3@64], r1 ; p2
vld1.u8 {d9}, [r12@64], r1 ; p2
vld1.u8 {d10}, [r3@64], r1 ; p1
vld1.u8 {d11}, [r12@64], r1 ; p1
vld1.u8 {d12}, [r3@64], r1 ; p0
vld1.u8 {d13}, [r12@64], r1 ; p0
vld1.u8 {d14}, [r3@64], r1 ; q0
vld1.u8 {d15}, [r12@64], r1 ; q0
vld1.u8 {d16}, [r3@64], r1 ; q1
vld1.u8 {d17}, [r12@64], r1 ; q1
vld1.u8 {d18}, [r3@64], r1 ; q2
vld1.u8 {d19}, [r12@64], r1 ; q2
vld1.u8 {d20}, [r3@64] ; q3
vld1.u8 {d21}, [r12@64] ; q3
bl vp8_loop_filter_neon
sub r0, r0, r1, lsl #1
sub r2, r2, r1, lsl #1
vst1.u8 {d10}, [r0], r1 ; store u op1
vst1.u8 {d11}, [r2], r1 ; store v op1
vst1.u8 {d12}, [r0], r1 ; store u op0
vst1.u8 {d13}, [r2], r1 ; store v op0
vst1.u8 {d14}, [r0], r1 ; store u oq0
vst1.u8 {d15}, [r2], r1 ; store v oq0
vst1.u8 {d16}, [r0] ; store u oq1
vst1.u8 {d17}, [r2] ; store v oq1
vst1.u8 {d10}, [r0@64], r1 ; store u op1
vst1.u8 {d11}, [r2@64], r1 ; store v op1
vst1.u8 {d12}, [r0@64], r1 ; store u op0
vst1.u8 {d13}, [r2@64], r1 ; store v op0
vst1.u8 {d14}, [r0@64], r1 ; store u oq0
vst1.u8 {d15}, [r2@64], r1 ; store v oq0
vst1.u8 {d16}, [r0@64] ; store u oq1
vst1.u8 {d17}, [r2@64] ; store v oq1
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_loop_filter_horizontal_edge_uv_neon|
; void vp8_loop_filter_vertical_edge_y_neon(unsigned char *src, int pitch,
@@ -124,39 +112,38 @@
; const signed char *limit,
; const signed char *thresh,
; int count)
; r0 unsigned char *src,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 int count (unused)
; r0 unsigned char *src
; r1 int pitch
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_loop_filter_vertical_edge_y_neon| PROC
stmdb sp!, {lr}
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
sub r2, r0, #4 ; src ptr down by 4 columns
sub r0, r0, #2 ; dst ptr
ldr r12, [sp, #4] ; load thresh pointer
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
vdup.u8 q1, r3 ; duplicate limit
sub r2, r0, #4 ; src ptr down by 4 columns
add r1, r1, r1
ldr r3, [sp, #4] ; load thresh
add r12, r2, r1, asr #1
vld1.u8 {d6}, [r2], r1 ; load first 8-line src data
vld1.u8 {d8}, [r2], r1
vld1.u8 {d6}, [r2], r1
vld1.u8 {d8}, [r12], r1
vld1.u8 {d10}, [r2], r1
vld1.u8 {d12}, [r2], r1
vld1.u8 {d12}, [r12], r1
vld1.u8 {d14}, [r2], r1
vld1.u8 {d16}, [r2], r1
vld1.u8 {d16}, [r12], r1
vld1.u8 {d18}, [r2], r1
vld1.u8 {d20}, [r2], r1
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.u8 {d20}, [r12], r1
vld1.u8 {d7}, [r2], r1 ; load second 8-line src data
vld1.u8 {d9}, [r2], r1
vld1.u8 {d9}, [r12], r1
vld1.u8 {d11}, [r2], r1
vld1.u8 {d13}, [r2], r1
vld1.u8 {d13}, [r12], r1
vld1.u8 {d15}, [r2], r1
vld1.u8 {d17}, [r2], r1
vld1.u8 {d19}, [r2], r1
vld1.u8 {d21}, [r2]
vld1.u8 {d17}, [r12], r1
vld1.u8 {d19}, [r2]
vld1.u8 {d21}, [r12]
;transpose to 8x16 matrix
vtrn.32 q3, q7
@@ -164,6 +151,8 @@
vtrn.32 q5, q9
vtrn.32 q6, q10
vdup.u8 q2, r3 ; duplicate thresh
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
@@ -178,28 +167,34 @@
vswp d12, d11
vswp d16, d13
sub r0, r0, #2 ; dst ptr
vswp d14, d12
vswp d16, d15
add r12, r0, r1, asr #1
;store op1, op0, oq0, oq1
vst4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r1
vst4.8 {d10[1], d11[1], d12[1], d13[1]}, [r0], r1
vst4.8 {d10[1], d11[1], d12[1], d13[1]}, [r12], r1
vst4.8 {d10[2], d11[2], d12[2], d13[2]}, [r0], r1
vst4.8 {d10[3], d11[3], d12[3], d13[3]}, [r0], r1
vst4.8 {d10[3], d11[3], d12[3], d13[3]}, [r12], r1
vst4.8 {d10[4], d11[4], d12[4], d13[4]}, [r0], r1
vst4.8 {d10[5], d11[5], d12[5], d13[5]}, [r0], r1
vst4.8 {d10[5], d11[5], d12[5], d13[5]}, [r12], r1
vst4.8 {d10[6], d11[6], d12[6], d13[6]}, [r0], r1
vst4.8 {d10[7], d11[7], d12[7], d13[7]}, [r0], r1
vst4.8 {d14[0], d15[0], d16[0], d17[0]}, [r0], r1
vst4.8 {d14[1], d15[1], d16[1], d17[1]}, [r0], r1
vst4.8 {d14[2], d15[2], d16[2], d17[2]}, [r0], r1
vst4.8 {d14[3], d15[3], d16[3], d17[3]}, [r0], r1
vst4.8 {d14[4], d15[4], d16[4], d17[4]}, [r0], r1
vst4.8 {d14[5], d15[5], d16[5], d17[5]}, [r0], r1
vst4.8 {d14[6], d15[6], d16[6], d17[6]}, [r0], r1
vst4.8 {d14[7], d15[7], d16[7], d17[7]}, [r0]
vst4.8 {d10[7], d11[7], d12[7], d13[7]}, [r12], r1
ldmia sp!, {pc}
vst4.8 {d14[0], d15[0], d16[0], d17[0]}, [r0], r1
vst4.8 {d14[1], d15[1], d16[1], d17[1]}, [r12], r1
vst4.8 {d14[2], d15[2], d16[2], d17[2]}, [r0], r1
vst4.8 {d14[3], d15[3], d16[3], d17[3]}, [r12], r1
vst4.8 {d14[4], d15[4], d16[4], d17[4]}, [r0], r1
vst4.8 {d14[5], d15[5], d16[5], d17[5]}, [r12], r1
vst4.8 {d14[6], d15[6], d16[6], d17[6]}, [r0]
vst4.8 {d14[7], d15[7], d16[7], d17[7]}, [r12]
pop {pc}
ENDP ; |vp8_loop_filter_vertical_edge_y_neon|
; void vp8_loop_filter_vertical_edge_uv_neon(unsigned char *u, int pitch
@@ -209,38 +204,36 @@
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
; sp+4 unsigned char *v
|vp8_loop_filter_vertical_edge_uv_neon| PROC
stmdb sp!, {lr}
sub r12, r0, #4 ; move u pointer down by 4 columns
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
sub r12, r0, #4 ; move u pointer down by 4 columns
ldr r2, [sp, #8] ; load v ptr
vld1.u8 {d6}, [r12], r1 ;load u data
vld1.u8 {d8}, [r12], r1
vld1.u8 {d10}, [r12], r1
vld1.u8 {d12}, [r12], r1
vld1.u8 {d14}, [r12], r1
vld1.u8 {d16}, [r12], r1
vld1.u8 {d18}, [r12], r1
vld1.u8 {d20}, [r12]
vdup.u8 q1, r3 ; duplicate limit
sub r3, r2, #4 ; move v pointer down by 4 columns
vld1.u8 {d6}, [r12], r1 ;load u data
vld1.u8 {d7}, [r3], r1 ;load v data
vld1.u8 {d8}, [r12], r1
vld1.u8 {d9}, [r3], r1
vld1.u8 {d10}, [r12], r1
vld1.u8 {d11}, [r3], r1
vld1.u8 {d12}, [r12], r1
vld1.u8 {d13}, [r3], r1
vld1.u8 {d14}, [r12], r1
vld1.u8 {d15}, [r3], r1
vld1.u8 {d16}, [r12], r1
vld1.u8 {d17}, [r3], r1
vld1.u8 {d18}, [r12], r1
vld1.u8 {d19}, [r3], r1
vld1.u8 {d20}, [r12]
vld1.u8 {d21}, [r3]
ldr r12, [sp, #4] ; load thresh pointer
ldr r12, [sp, #4] ; load thresh
;transpose to 8x16 matrix
vtrn.32 q3, q7
@@ -248,6 +241,8 @@
vtrn.32 q5, q9
vtrn.32 q6, q10
vdup.u8 q2, r12 ; duplicate thresh
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
@@ -258,18 +253,16 @@
vtrn.8 q7, q8
vtrn.8 q9, q10
vld1.s8 {d4[], d5[]}, [r12] ; thresh
bl vp8_loop_filter_neon
sub r0, r0, #2
sub r2, r2, #2
vswp d12, d11
vswp d16, d13
vswp d14, d12
vswp d16, d15
sub r0, r0, #2
sub r2, r2, #2
;store op1, op0, oq0, oq1
vst4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r1
vst4.8 {d14[0], d15[0], d16[0], d17[0]}, [r2], r1
@@ -288,7 +281,7 @@
vst4.8 {d10[7], d11[7], d12[7], d13[7]}, [r0]
vst4.8 {d14[7], d15[7], d16[7], d17[7]}, [r2]
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_loop_filter_vertical_edge_uv_neon|
; void vp8_loop_filter_neon();
@@ -308,7 +301,6 @@
; q9 q2
; q10 q3
|vp8_loop_filter_neon| PROC
ldr r12, _lf_coeff_
; vp8_filter_mask
vabd.u8 q11, q3, q4 ; abs(p3 - p2)
@@ -317,42 +309,44 @@
vabd.u8 q14, q8, q7 ; abs(q1 - q0)
vabd.u8 q3, q9, q8 ; abs(q2 - q1)
vabd.u8 q4, q10, q9 ; abs(q3 - q2)
vabd.u8 q9, q6, q7 ; abs(p0 - q0)
vmax.u8 q11, q11, q12
vmax.u8 q12, q13, q14
vmax.u8 q3, q3, q4
vmax.u8 q15, q11, q12
vabd.u8 q9, q6, q7 ; abs(p0 - q0)
; vp8_hevmask
vcgt.u8 q13, q13, q2 ; (abs(p1 - p0) > thresh)*-1
vcgt.u8 q14, q14, q2 ; (abs(q1 - q0) > thresh)*-1
vmax.u8 q15, q15, q3
vadd.u8 q0, q0, q0 ; flimit * 2
vadd.u8 q0, q0, q1 ; flimit * 2 + limit
vcge.u8 q15, q1, q15
vmov.u8 q10, #0x80 ; 0x80
vabd.u8 q2, q5, q8 ; a = abs(p1 - q1)
vqadd.u8 q9, q9, q9 ; b = abs(p0 - q0) * 2
vshr.u8 q2, q2, #1 ; a = a / 2
vqadd.u8 q9, q9, q2 ; a = b + a
vcge.u8 q9, q0, q9 ; (a > flimit * 2 + limit) * -1
vld1.u8 {q0}, [r12]!
vcge.u8 q15, q1, q15
; vp8_filter() function
; convert to signed
veor q7, q7, q0 ; qs0
veor q6, q6, q0 ; ps0
veor q5, q5, q0 ; ps1
veor q8, q8, q0 ; qs1
veor q7, q7, q10 ; qs0
vshr.u8 q2, q2, #1 ; a = a / 2
veor q6, q6, q10 ; ps0
vld1.u8 {q10}, [r12]!
veor q5, q5, q10 ; ps1
vqadd.u8 q9, q9, q2 ; a = b + a
veor q8, q8, q10 ; qs1
vmov.u8 q10, #3 ; #3
vsubl.s8 q2, d14, d12 ; ( qs0 - ps0)
vsubl.s8 q11, d15, d13
vcge.u8 q9, q0, q9 ; (a > flimit * 2 + limit) * -1
vmovl.u8 q4, d20
vqsub.s8 q1, q5, q8 ; vp8_filter = clamp(ps1-qs1)
@@ -367,7 +361,7 @@
vaddw.s8 q2, q2, d2
vaddw.s8 q11, q11, d3
vld1.u8 {q9}, [r12]!
vmov.u8 q9, #4 ; #4
; vp8_filter = clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d2, q2
@@ -379,19 +373,20 @@
vshr.s8 q2, q2, #3 ; Filter2 >>= 3
vshr.s8 q1, q1, #3 ; Filter1 >>= 3
vqadd.s8 q11, q6, q2 ; u = clamp(ps0 + Filter2)
vqsub.s8 q10, q7, q1 ; u = clamp(qs0 - Filter1)
; outer tap adjustments: ++vp8_filter >> 1
vrshr.s8 q1, q1, #1
vbic q1, q1, q14 ; vp8_filter &= ~hev
vmov.u8 q0, #0x80 ; 0x80
vqadd.s8 q13, q5, q1 ; u = clamp(ps1 + vp8_filter)
vqsub.s8 q12, q8, q1 ; u = clamp(qs1 - vp8_filter)
veor q5, q13, q0 ; *op1 = u^0x80
veor q6, q11, q0 ; *op0 = u^0x80
veor q7, q10, q0 ; *oq0 = u^0x80
veor q5, q13, q0 ; *op1 = u^0x80
veor q8, q12, q0 ; *oq1 = u^0x80
bx lr
@@ -399,12 +394,4 @@
;-----------------
_lf_coeff_
DCD lf_coeff
lf_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
DCD 0x01010101, 0x01010101, 0x01010101, 0x01010101
END

@@ -9,107 +9,109 @@
;
EXPORT |vp8_loop_filter_simple_horizontal_edge_neon|
;EXPORT |vp8_loop_filter_simple_horizontal_edge_neon|
EXPORT |vp8_loop_filter_bhs_neon|
EXPORT |vp8_loop_filter_mbhs_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;Note: flimit, limit, and thresh should be positive numbers. All 16 elements in flimit
;are equal. So, in the code, only one load is needed
;for flimit. Same way applies to limit and thresh.
; r0 unsigned char *s,
; r1 int p, //pitch
; r2 const signed char *flimit,
; r3 const signed char *limit,
; stack(r4) const signed char *thresh,
; //stack(r5) int count --unused
; r0 unsigned char *s, PRESERVE
; r1 int p, PRESERVE
; q1 limit, PRESERVE
|vp8_loop_filter_simple_horizontal_edge_neon| PROC
sub r0, r0, r1, lsl #1 ; move src pointer down by 2 lines
ldr r12, _lfhy_coeff_
vld1.u8 {q5}, [r0], r1 ; p1
vld1.s8 {d2[], d3[]}, [r2] ; flimit
vld1.s8 {d26[], d27[]}, [r3] ; limit -> q13
vld1.u8 {q6}, [r0], r1 ; p0
vld1.u8 {q0}, [r12]! ; 0x80
vld1.u8 {q7}, [r0], r1 ; q0
vld1.u8 {q10}, [r12]! ; 0x03
vld1.u8 {q8}, [r0] ; q1
sub r3, r0, r1, lsl #1 ; move src pointer down by 2 lines
vld1.u8 {q7}, [r0@128], r1 ; q0
vld1.u8 {q5}, [r3@128], r1 ; p0
vld1.u8 {q8}, [r0@128] ; q1
vld1.u8 {q6}, [r3@128] ; p1
;vp8_filter_mask() function
vabd.u8 q15, q6, q7 ; abs(p0 - q0)
vabd.u8 q14, q5, q8 ; abs(p1 - q1)
vqadd.u8 q15, q15, q15 ; abs(p0 - q0) * 2
vshr.u8 q14, q14, #1 ; abs(p1 - q1) / 2
vmov.u8 q0, #0x80 ; 0x80
vmov.s16 q13, #3
vqadd.u8 q15, q15, q14 ; abs(p0 - q0) * 2 + abs(p1 - q1) / 2
;vp8_filter() function
veor q7, q7, q0 ; qs0: q0 offset to convert to a signed value
veor q6, q6, q0 ; ps0: p0 offset to convert to a signed value
veor q5, q5, q0 ; ps1: p1 offset to convert to a signed value
veor q8, q8, q0 ; qs1: q1 offset to convert to a signed value
vadd.u8 q1, q1, q1 ; flimit * 2
vadd.u8 q1, q1, q13 ; flimit * 2 + limit
vcge.u8 q15, q1, q15 ; (abs(p0 - q0)*2 + abs(p1-q1)/2 > flimit*2 + limit)*-1
vcge.u8 q15, q1, q15 ; (abs(p0 - q0)*2 + abs(p1-q1)/2 > limit)*-1
;;;;;;;;;;
;vqsub.s8 q2, q7, q6 ; ( qs0 - ps0)
vsubl.s8 q2, d14, d12 ; ( qs0 - ps0)
vsubl.s8 q3, d15, d13
vqsub.s8 q4, q5, q8 ; q4: vp8_filter = vp8_signed_char_clamp(ps1-qs1)
;vmul.i8 q2, q2, q10 ; 3 * ( qs0 - ps0)
vadd.s16 q11, q2, q2 ; 3 * ( qs0 - ps0)
vadd.s16 q12, q3, q3
vmul.s16 q2, q2, q13 ; 3 * ( qs0 - ps0)
vmul.s16 q3, q3, q13
vld1.u8 {q9}, [r12]! ; 0x04
vadd.s16 q2, q2, q11
vadd.s16 q3, q3, q12
vmov.u8 q10, #0x03 ; 0x03
vmov.u8 q9, #0x04 ; 0x04
vaddw.s8 q2, q2, d8 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q3, q3, d9
;vqadd.s8 q4, q4, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d8, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d9, q3
;;;;;;;;;;;;;
vand q4, q4, q15 ; vp8_filter &= mask
vand q14, q4, q15 ; vp8_filter &= mask
vqadd.s8 q2, q4, q10 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q4, q4, q9 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vqadd.s8 q2, q14, q10 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q3, q14, q9 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vshr.s8 q2, q2, #3 ; Filter2 >>= 3
vshr.s8 q4, q4, #3 ; Filter1 >>= 3
vshr.s8 q4, q3, #3 ; Filter1 >>= 3
sub r0, r0, r1, lsl #1
sub r0, r0, r1
;calculate output
vqadd.s8 q11, q6, q2 ; u = vp8_signed_char_clamp(ps0 + Filter2)
vqsub.s8 q10, q7, q4 ; u = vp8_signed_char_clamp(qs0 - Filter1)
add r3, r0, r1
veor q6, q11, q0 ; *op0 = u^0x80
veor q7, q10, q0 ; *oq0 = u^0x80
vst1.u8 {q6}, [r0] ; store op0
vst1.u8 {q7}, [r3] ; store oq0
vst1.u8 {q6}, [r3@128] ; store op0
vst1.u8 {q7}, [r0@128] ; store oq0
bx lr
ENDP ; |vp8_loop_filter_simple_horizontal_edge_neon|
;-----------------
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
_lfhy_coeff_
DCD lfhy_coeff
lfhy_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
|vp8_loop_filter_bhs_neon| PROC
push {r4, lr}
ldrb r3, [r2] ; load blim from mem
vdup.s8 q1, r3 ; duplicate blim
add r0, r0, r1, lsl #2 ; src = y_ptr + 4 * y_stride
bl vp8_loop_filter_simple_horizontal_edge_neon
; vp8_loop_filter_simple_horizontal_edge_neon preserves r0, r1 and q1
add r0, r0, r1, lsl #2 ; src = y_ptr + 8* y_stride
bl vp8_loop_filter_simple_horizontal_edge_neon
add r0, r0, r1, lsl #2 ; src = y_ptr + 12 * y_stride
pop {r4, lr}
b vp8_loop_filter_simple_horizontal_edge_neon
ENDP ;|vp8_loop_filter_bhs_neon|
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
|vp8_loop_filter_mbhs_neon| PROC
ldrb r3, [r2] ; load mblim from mem
vdup.s8 q1, r3 ; duplicate mblim
b vp8_loop_filter_simple_horizontal_edge_neon
ENDP ;|vp8_loop_filter_mbhs_neon|
END
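The comments in the simple horizontal-edge routine above describe the filter step by step. The following is a scalar sketch of that arithmetic for a single pixel position, under stated assumptions: the names `simple_filter_model` and `clamp_s8` are hypothetical, the mask is computed in plain `int` arithmetic and so ignores the saturation that `vqadd.u8` applies, and `>>` on negative values is assumed to be an arithmetic shift.

```c
#include <assert.h>
#include <stdlib.h>

/* Saturate to the signed 8-bit range, like the vqadd.s8/vqsub.s8 steps. */
static int clamp_s8(int v)
{
    return v < -128 ? -128 : (v > 127 ? 127 : v);
}

/* Hypothetical scalar model of the simple filter for one position.
 * Pixels are laid out p1 p0 | q0 q1 across the edge; only p0 and q0 change. */
void simple_filter_model(unsigned char blimit, unsigned char p1,
                         unsigned char *p0, unsigned char *q0,
                         unsigned char q1)
{
    assert(p0 != NULL && q0 != NULL);

    /* mask: abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= blimit
     * ASSUMPTION: plain int arithmetic, no u8 saturation. */
    int mask = abs(*p0 - *q0) * 2 + abs(p1 - q1) / 2 <= blimit;

    /* Subtracting 128 is the signed view of `x ^ 0x80`. */
    int ps1 = p1 - 128, ps0 = *p0 - 128;
    int qs0 = *q0 - 128, qs1 = q1 - 128;

    int f = clamp_s8(ps1 - qs1);             /* vp8_filter = clamp(ps1 - qs1) */
    f = clamp_s8(f + 3 * (qs0 - ps0));       /* + 3 * (qs0 - ps0)            */
    if (!mask)
        f = 0;                               /* vp8_filter &= mask           */

    /* ASSUMPTION: arithmetic right shift for negative values. */
    int filter1 = clamp_s8(f + 4) >> 3;      /* Filter1 = clamp(f + 4) >> 3  */
    int filter2 = clamp_s8(f + 3) >> 3;      /* Filter2 = clamp(f + 3) >> 3  */

    *q0 = (unsigned char)(clamp_s8(qs0 - filter1) + 128); /* oq0 = u ^ 0x80 */
    *p0 = (unsigned char)(clamp_s8(ps0 + filter2) + 128); /* op0 = u ^ 0x80 */
}
```

For example, with `p1 = p0 = 100`, `q0 = q1 = 108` and `blimit = 30`, the model pulls the two inner pixels toward each other; with `blimit = 10` the mask rejects the edge and both pixels are left untouched.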

@@ -9,60 +9,54 @@
;
EXPORT |vp8_loop_filter_simple_vertical_edge_neon|
;EXPORT |vp8_loop_filter_simple_vertical_edge_neon|
EXPORT |vp8_loop_filter_bvs_neon|
EXPORT |vp8_loop_filter_mbvs_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;Note: flimit, limit, and thresh should be positive numbers. All 16 elements in flimit
;are equal. So, in the code, only one load is needed
;for flimit. Same way applies to limit and thresh.
; r0 unsigned char *s,
; r1 int p, //pitch
; r2 const signed char *flimit,
; r3 const signed char *limit,
; stack(r4) const signed char *thresh,
; //stack(r5) int count --unused
; r0 unsigned char *s, PRESERVE
; r1 int p, PRESERVE
; q1 limit, PRESERVE
|vp8_loop_filter_simple_vertical_edge_neon| PROC
sub r0, r0, #2 ; move src pointer down by 2 columns
add r12, r1, r1
add r3, r0, r1
vld4.8 {d6[0], d7[0], d8[0], d9[0]}, [r0], r1
vld1.s8 {d2[], d3[]}, [r2] ; flimit
vld1.s8 {d26[], d27[]}, [r3] ; limit -> q13
vld4.8 {d6[1], d7[1], d8[1], d9[1]}, [r0], r1
ldr r12, _vlfy_coeff_
vld4.8 {d6[2], d7[2], d8[2], d9[2]}, [r0], r1
vld4.8 {d6[3], d7[3], d8[3], d9[3]}, [r0], r1
vld4.8 {d6[4], d7[4], d8[4], d9[4]}, [r0], r1
vld4.8 {d6[5], d7[5], d8[5], d9[5]}, [r0], r1
vld4.8 {d6[6], d7[6], d8[6], d9[6]}, [r0], r1
vld4.8 {d6[7], d7[7], d8[7], d9[7]}, [r0], r1
vld4.8 {d6[0], d7[0], d8[0], d9[0]}, [r0], r12
vld4.8 {d6[1], d7[1], d8[1], d9[1]}, [r3], r12
vld4.8 {d6[2], d7[2], d8[2], d9[2]}, [r0], r12
vld4.8 {d6[3], d7[3], d8[3], d9[3]}, [r3], r12
vld4.8 {d6[4], d7[4], d8[4], d9[4]}, [r0], r12
vld4.8 {d6[5], d7[5], d8[5], d9[5]}, [r3], r12
vld4.8 {d6[6], d7[6], d8[6], d9[6]}, [r0], r12
vld4.8 {d6[7], d7[7], d8[7], d9[7]}, [r3], r12
vld4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r1
vld1.u8 {q0}, [r12]! ; 0x80
vld4.8 {d10[1], d11[1], d12[1], d13[1]}, [r0], r1
vld1.u8 {q11}, [r12]! ; 0x03
vld4.8 {d10[2], d11[2], d12[2], d13[2]}, [r0], r1
vld1.u8 {q12}, [r12]! ; 0x04
vld4.8 {d10[3], d11[3], d12[3], d13[3]}, [r0], r1
vld4.8 {d10[4], d11[4], d12[4], d13[4]}, [r0], r1
vld4.8 {d10[5], d11[5], d12[5], d13[5]}, [r0], r1
vld4.8 {d10[6], d11[6], d12[6], d13[6]}, [r0], r1
vld4.8 {d10[7], d11[7], d12[7], d13[7]}, [r0], r1
vld4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r12
vld4.8 {d10[1], d11[1], d12[1], d13[1]}, [r3], r12
vld4.8 {d10[2], d11[2], d12[2], d13[2]}, [r0], r12
vld4.8 {d10[3], d11[3], d12[3], d13[3]}, [r3], r12
vld4.8 {d10[4], d11[4], d12[4], d13[4]}, [r0], r12
vld4.8 {d10[5], d11[5], d12[5], d13[5]}, [r3], r12
vld4.8 {d10[6], d11[6], d12[6], d13[6]}, [r0], r12
vld4.8 {d10[7], d11[7], d12[7], d13[7]}, [r3]
vswp d7, d10
vswp d12, d9
;vswp q4, q5 ; p1:q3, p0:q5, q0:q4, q1:q6
;vp8_filter_mask() function
;vp8_hevmask() function
sub r0, r0, r1, lsl #4
vabd.u8 q15, q5, q4 ; abs(p0 - q0)
vabd.u8 q14, q3, q6 ; abs(p1 - q1)
vqadd.u8 q15, q15, q15 ; abs(p0 - q0) * 2
vshr.u8 q14, q14, #1 ; abs(p1 - q1) / 2
vmov.u8 q0, #0x80 ; 0x80
vmov.s16 q11, #3
vqadd.u8 q15, q15, q14 ; abs(p0 - q0) * 2 + abs(p1 - q1) / 2
veor q4, q4, q0 ; qs0: q0 offset to convert to a signed value
@@ -70,87 +64,91 @@
veor q3, q3, q0 ; ps1: p1 offset to convert to a signed value
veor q6, q6, q0 ; qs1: q1 offset to convert to a signed value
vadd.u8 q1, q1, q1 ; flimit * 2
vadd.u8 q1, q1, q13 ; flimit * 2 + limit
vcge.u8 q15, q1, q15 ; abs(p0 - q0)*2 + abs(p1-q1)/2 > flimit*2 + limit)*-1
;vp8_filter() function
;;;;;;;;;;
;vqsub.s8 q2, q5, q4 ; ( qs0 - ps0)
vsubl.s8 q2, d8, d10 ; ( qs0 - ps0)
vsubl.s8 q13, d9, d11
vqsub.s8 q1, q3, q6 ; vp8_filter = vp8_signed_char_clamp(ps1-qs1)
vqsub.s8 q14, q3, q6 ; vp8_filter = vp8_signed_char_clamp(ps1-qs1)
;vmul.i8 q2, q2, q11 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vadd.s16 q10, q2, q2 ; 3 * ( qs0 - ps0)
vadd.s16 q14, q13, q13
vadd.s16 q2, q2, q10
vadd.s16 q13, q13, q14
vmul.s16 q2, q2, q11 ; 3 * ( qs0 - ps0)
vmul.s16 q13, q13, q11
;vqadd.s8 q1, q1, q2
vaddw.s8 q2, q2, d2 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q13, q13, d3
vmov.u8 q11, #0x03 ; 0x03
vmov.u8 q12, #0x04 ; 0x04
vqmovn.s16 d2, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d3, q13
vaddw.s8 q2, q2, d28 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q13, q13, d29
vqmovn.s16 d28, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d29, q13
add r0, r0, #1
add r2, r0, r1
;;;;;;;;;;;
add r3, r0, r1
vand q1, q1, q15 ; vp8_filter &= mask
vand q14, q14, q15 ; vp8_filter &= mask
vqadd.s8 q2, q1, q11 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q1, q1, q12 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vqadd.s8 q2, q14, q11 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q3, q14, q12 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vshr.s8 q2, q2, #3 ; Filter2 >>= 3
vshr.s8 q1, q1, #3 ; Filter1 >>= 3
vshr.s8 q14, q3, #3 ; Filter1 >>= 3
;calculate output
vqsub.s8 q10, q4, q1 ; u = vp8_signed_char_clamp(qs0 - Filter1)
vqadd.s8 q11, q5, q2 ; u = vp8_signed_char_clamp(ps0 + Filter2)
vqsub.s8 q10, q4, q14 ; u = vp8_signed_char_clamp(qs0 - Filter1)
veor q7, q10, q0 ; *oq0 = u^0x80
veor q6, q11, q0 ; *op0 = u^0x80
add r3, r2, r1
veor q7, q10, q0 ; *oq0 = u^0x80
add r12, r1, r1
vswp d13, d14
add r12, r3, r1
;store op1, op0, oq0, oq1
vst2.8 {d12[0], d13[0]}, [r0]
vst2.8 {d12[1], d13[1]}, [r2]
vst2.8 {d12[2], d13[2]}, [r3]
vst2.8 {d12[3], d13[3]}, [r12], r1
add r0, r12, r1
vst2.8 {d12[4], d13[4]}, [r12]
vst2.8 {d12[5], d13[5]}, [r0], r1
add r2, r0, r1
vst2.8 {d12[6], d13[6]}, [r0]
vst2.8 {d12[7], d13[7]}, [r2], r1
add r3, r2, r1
vst2.8 {d14[0], d15[0]}, [r2]
vst2.8 {d14[1], d15[1]}, [r3], r1
add r12, r3, r1
vst2.8 {d14[2], d15[2]}, [r3]
vst2.8 {d14[3], d15[3]}, [r12], r1
add r0, r12, r1
vst2.8 {d14[4], d15[4]}, [r12]
vst2.8 {d14[5], d15[5]}, [r0], r1
add r2, r0, r1
vst2.8 {d14[6], d15[6]}, [r0]
vst2.8 {d14[7], d15[7]}, [r2]
vst2.8 {d12[0], d13[0]}, [r0], r12
vst2.8 {d12[1], d13[1]}, [r3], r12
vst2.8 {d12[2], d13[2]}, [r0], r12
vst2.8 {d12[3], d13[3]}, [r3], r12
vst2.8 {d12[4], d13[4]}, [r0], r12
vst2.8 {d12[5], d13[5]}, [r3], r12
vst2.8 {d12[6], d13[6]}, [r0], r12
vst2.8 {d12[7], d13[7]}, [r3], r12
vst2.8 {d14[0], d15[0]}, [r0], r12
vst2.8 {d14[1], d15[1]}, [r3], r12
vst2.8 {d14[2], d15[2]}, [r0], r12
vst2.8 {d14[3], d15[3]}, [r3], r12
vst2.8 {d14[4], d15[4]}, [r0], r12
vst2.8 {d14[5], d15[5]}, [r3], r12
vst2.8 {d14[6], d15[6]}, [r0], r12
vst2.8 {d14[7], d15[7]}, [r3]
bx lr
ENDP ; |vp8_loop_filter_simple_vertical_edge_neon|
;-----------------
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
_vlfy_coeff_
DCD vlfy_coeff
vlfy_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
|vp8_loop_filter_bvs_neon| PROC
push {r4, lr}
ldrb r3, [r2] ; load blim from mem
mov r4, r0
add r0, r0, #4
vdup.s8 q1, r3 ; duplicate blim
bl vp8_loop_filter_simple_vertical_edge_neon
; vp8_loop_filter_simple_vertical_edge_neon preserves r1 and q1
add r0, r4, #8
bl vp8_loop_filter_simple_vertical_edge_neon
add r0, r4, #12
pop {r4, lr}
b vp8_loop_filter_simple_vertical_edge_neon
ENDP ;|vp8_loop_filter_bvs_neon|
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
|vp8_loop_filter_mbvs_neon| PROC
ldrb r3, [r2] ; load mblim from mem
vdup.s8 q1, r3 ; duplicate mblim
b vp8_loop_filter_simple_vertical_edge_neon
ENDP ;|vp8_loop_filter_mbvs_neon|
END


@@ -14,155 +14,143 @@
EXPORT |vp8_mbloop_filter_vertical_edge_y_neon|
EXPORT |vp8_mbloop_filter_vertical_edge_uv_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; flimit, limit, and thresh should be positive numbers.
; All 16 elements in these variables are equal.
; void vp8_mbloop_filter_horizontal_edge_y_neon(unsigned char *src, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; int count)
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh)
; r0 unsigned char *src,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 int count (unused)
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_mbloop_filter_horizontal_edge_y_neon| PROC
stmdb sp!, {lr}
sub r0, r0, r1, lsl #2 ; move src pointer down by 4 lines
ldr r12, [sp, #4] ; load thresh pointer
push {lr}
add r1, r1, r1 ; double stride
ldr r12, [sp, #4] ; load thresh
sub r0, r0, r1, lsl #1 ; move src pointer down by 4 lines
vdup.u8 q2, r12 ; thresh
add r12, r0, r1, lsr #1 ; move src pointer up by 1 line
vld1.u8 {q3}, [r0], r1 ; p3
vld1.s8 {d2[], d3[]}, [r3] ; limit
vld1.u8 {q4}, [r0], r1 ; p2
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.u8 {q5}, [r0], r1 ; p1
vld1.u8 {q6}, [r0], r1 ; p0
vld1.u8 {q7}, [r0], r1 ; q0
vld1.u8 {q8}, [r0], r1 ; q1
vld1.u8 {q9}, [r0], r1 ; q2
vld1.u8 {q10}, [r0], r1 ; q3
vld1.u8 {q3}, [r0@128], r1 ; p3
vld1.u8 {q4}, [r12@128], r1 ; p2
vld1.u8 {q5}, [r0@128], r1 ; p1
vld1.u8 {q6}, [r12@128], r1 ; p0
vld1.u8 {q7}, [r0@128], r1 ; q0
vld1.u8 {q8}, [r12@128], r1 ; q1
vld1.u8 {q9}, [r0@128], r1 ; q2
vld1.u8 {q10}, [r12@128], r1 ; q3
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #3
add r0, r0, r1
add r2, r0, r1
add r3, r2, r1
sub r12, r12, r1, lsl #2
add r0, r12, r1, lsr #1
vst1.u8 {q4}, [r0] ; store op2
vst1.u8 {q5}, [r2] ; store op1
vst1.u8 {q6}, [r3], r1 ; store op0
add r12, r3, r1
vst1.u8 {q7}, [r3] ; store oq0
vst1.u8 {q8}, [r12], r1 ; store oq1
vst1.u8 {q9}, [r12] ; store oq2
vst1.u8 {q4}, [r12@128],r1 ; store op2
vst1.u8 {q5}, [r0@128],r1 ; store op1
vst1.u8 {q6}, [r12@128], r1 ; store op0
vst1.u8 {q7}, [r0@128],r1 ; store oq0
vst1.u8 {q8}, [r12@128] ; store oq1
vst1.u8 {q9}, [r0@128] ; store oq2
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_mbloop_filter_horizontal_edge_y_neon|
; void vp8_mbloop_filter_horizontal_edge_uv_neon(unsigned char *u, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
; sp+4 unsigned char *v
|vp8_mbloop_filter_horizontal_edge_uv_neon| PROC
stmdb sp!, {lr}
sub r0, r0, r1, lsl #2 ; move u pointer down by 4 lines
vld1.s8 {d2[], d3[]}, [r3] ; limit
ldr r3, [sp, #8] ; load v ptr
ldr r12, [sp, #4] ; load thresh pointer
sub r3, r3, r1, lsl #2 ; move v pointer down by 4 lines
push {lr}
ldr r12, [sp, #4] ; load thresh
sub r0, r0, r1, lsl #2 ; move u pointer down by 4 lines
vdup.u8 q2, r12 ; thresh
ldr r12, [sp, #8] ; load v ptr
sub r12, r12, r1, lsl #2 ; move v pointer down by 4 lines
vld1.u8 {d6}, [r0], r1 ; p3
vld1.u8 {d7}, [r3], r1 ; p3
vld1.u8 {d8}, [r0], r1 ; p2
vld1.u8 {d9}, [r3], r1 ; p2
vld1.u8 {d10}, [r0], r1 ; p1
vld1.u8 {d11}, [r3], r1 ; p1
vld1.u8 {d12}, [r0], r1 ; p0
vld1.u8 {d13}, [r3], r1 ; p0
vld1.u8 {d14}, [r0], r1 ; q0
vld1.u8 {d15}, [r3], r1 ; q0
vld1.u8 {d16}, [r0], r1 ; q1
vld1.u8 {d17}, [r3], r1 ; q1
vld1.u8 {d18}, [r0], r1 ; q2
vld1.u8 {d19}, [r3], r1 ; q2
vld1.u8 {d20}, [r0], r1 ; q3
vld1.u8 {d21}, [r3], r1 ; q3
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.u8 {d6}, [r0@64], r1 ; p3
vld1.u8 {d7}, [r12@64], r1 ; p3
vld1.u8 {d8}, [r0@64], r1 ; p2
vld1.u8 {d9}, [r12@64], r1 ; p2
vld1.u8 {d10}, [r0@64], r1 ; p1
vld1.u8 {d11}, [r12@64], r1 ; p1
vld1.u8 {d12}, [r0@64], r1 ; p0
vld1.u8 {d13}, [r12@64], r1 ; p0
vld1.u8 {d14}, [r0@64], r1 ; q0
vld1.u8 {d15}, [r12@64], r1 ; q0
vld1.u8 {d16}, [r0@64], r1 ; q1
vld1.u8 {d17}, [r12@64], r1 ; q1
vld1.u8 {d18}, [r0@64], r1 ; q2
vld1.u8 {d19}, [r12@64], r1 ; q2
vld1.u8 {d20}, [r0@64], r1 ; q3
vld1.u8 {d21}, [r12@64], r1 ; q3
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #3
sub r3, r3, r1, lsl #3
sub r12, r12, r1, lsl #3
add r0, r0, r1
add r3, r3, r1
add r12, r12, r1
vst1.u8 {d8}, [r0], r1 ; store u op2
vst1.u8 {d9}, [r3], r1 ; store v op2
vst1.u8 {d10}, [r0], r1 ; store u op1
vst1.u8 {d11}, [r3], r1 ; store v op1
vst1.u8 {d12}, [r0], r1 ; store u op0
vst1.u8 {d13}, [r3], r1 ; store v op0
vst1.u8 {d14}, [r0], r1 ; store u oq0
vst1.u8 {d15}, [r3], r1 ; store v oq0
vst1.u8 {d16}, [r0], r1 ; store u oq1
vst1.u8 {d17}, [r3], r1 ; store v oq1
vst1.u8 {d18}, [r0], r1 ; store u oq2
vst1.u8 {d19}, [r3], r1 ; store v oq2
vst1.u8 {d8}, [r0@64], r1 ; store u op2
vst1.u8 {d9}, [r12@64], r1 ; store v op2
vst1.u8 {d10}, [r0@64], r1 ; store u op1
vst1.u8 {d11}, [r12@64], r1 ; store v op1
vst1.u8 {d12}, [r0@64], r1 ; store u op0
vst1.u8 {d13}, [r12@64], r1 ; store v op0
vst1.u8 {d14}, [r0@64], r1 ; store u oq0
vst1.u8 {d15}, [r12@64], r1 ; store v oq0
vst1.u8 {d16}, [r0@64], r1 ; store u oq1
vst1.u8 {d17}, [r12@64], r1 ; store v oq1
vst1.u8 {d18}, [r0@64], r1 ; store u oq2
vst1.u8 {d19}, [r12@64], r1 ; store v oq2
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_mbloop_filter_horizontal_edge_uv_neon|
; void vp8_mbloop_filter_vertical_edge_y_neon(unsigned char *src, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; int count)
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh)
; r0 unsigned char *src,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 int count (unused)
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_mbloop_filter_vertical_edge_y_neon| PROC
stmdb sp!, {lr}
push {lr}
ldr r12, [sp, #4] ; load thresh
sub r0, r0, #4 ; move src pointer down by 4 columns
vdup.s8 q2, r12 ; thresh
add r12, r0, r1, lsl #3 ; move src pointer down by 8 lines
vld1.u8 {d6}, [r0], r1 ; load first 8-line src data
ldr r12, [sp, #4] ; load thresh pointer
vld1.u8 {d7}, [r12], r1 ; load second 8-line src data
vld1.u8 {d8}, [r0], r1
sub sp, sp, #32
vld1.u8 {d9}, [r12], r1
vld1.u8 {d10}, [r0], r1
vld1.u8 {d11}, [r12], r1
vld1.u8 {d12}, [r0], r1
vld1.u8 {d13}, [r12], r1
vld1.u8 {d14}, [r0], r1
vld1.u8 {d15}, [r12], r1
vld1.u8 {d16}, [r0], r1
vld1.u8 {d17}, [r12], r1
vld1.u8 {d18}, [r0], r1
vld1.u8 {d19}, [r12], r1
vld1.u8 {d20}, [r0], r1
vld1.u8 {d7}, [r0], r1 ; load second 8-line src data
vld1.u8 {d9}, [r0], r1
vld1.u8 {d11}, [r0], r1
vld1.u8 {d13}, [r0], r1
vld1.u8 {d15}, [r0], r1
vld1.u8 {d17}, [r0], r1
vld1.u8 {d19}, [r0], r1
vld1.u8 {d21}, [r0], r1
vld1.u8 {d21}, [r12], r1
;transpose to 8x16 matrix
vtrn.32 q3, q7
@@ -180,133 +168,11 @@
vtrn.8 q7, q8
vtrn.8 q9, q10
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.s8 {d2[], d3[]}, [r3] ; limit
mov r12, sp
vst1.u8 {q3}, [r12]!
vst1.u8 {q10}, [r12]!
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #4
add r2, r0, r1
add r3, r2, r1
vld1.u8 {q3}, [sp]!
vld1.u8 {q10}, [sp]!
;transpose to 16x8 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
add r12, r3, r1
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
;store op2, op1, op0, oq0, oq1, oq2
vst1.8 {d6}, [r0]
vst1.8 {d8}, [r2]
vst1.8 {d10}, [r3]
vst1.8 {d12}, [r12], r1
add r0, r12, r1
vst1.8 {d14}, [r12]
vst1.8 {d16}, [r0], r1
add r2, r0, r1
vst1.8 {d18}, [r0]
vst1.8 {d20}, [r2], r1
add r3, r2, r1
vst1.8 {d7}, [r2]
vst1.8 {d9}, [r3], r1
add r12, r3, r1
vst1.8 {d11}, [r3]
vst1.8 {d13}, [r12], r1
add r0, r12, r1
vst1.8 {d15}, [r12]
vst1.8 {d17}, [r0], r1
add r2, r0, r1
vst1.8 {d19}, [r0]
vst1.8 {d21}, [r2]
ldmia sp!, {pc}
ENDP ; |vp8_mbloop_filter_vertical_edge_y_neon|
; void vp8_mbloop_filter_vertical_edge_uv_neon(unsigned char *u, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 unsigned char *v
|vp8_mbloop_filter_vertical_edge_uv_neon| PROC
stmdb sp!, {lr}
sub r0, r0, #4 ; move src pointer down by 4 columns
vld1.s8 {d2[], d3[]}, [r3] ; limit
ldr r3, [sp, #8] ; load v ptr
ldr r12, [sp, #4] ; load thresh pointer
sub r3, r3, #4 ; move v pointer down by 4 columns
vld1.u8 {d6}, [r0], r1 ;load u data
vld1.u8 {d7}, [r3], r1 ;load v data
vld1.u8 {d8}, [r0], r1
vld1.u8 {d9}, [r3], r1
vld1.u8 {d10}, [r0], r1
vld1.u8 {d11}, [r3], r1
vld1.u8 {d12}, [r0], r1
vld1.u8 {d13}, [r3], r1
vld1.u8 {d14}, [r0], r1
vld1.u8 {d15}, [r3], r1
vld1.u8 {d16}, [r0], r1
vld1.u8 {d17}, [r3], r1
vld1.u8 {d18}, [r0], r1
vld1.u8 {d19}, [r3], r1
vld1.u8 {d20}, [r0], r1
vld1.u8 {d21}, [r3], r1
;transpose to 8x16 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
sub sp, sp, #32
vld1.s8 {d4[], d5[]}, [r12] ; thresh
mov r12, sp
vst1.u8 {q3}, [r12]!
vst1.u8 {q10}, [r12]!
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #3
sub r3, r3, r1, lsl #3
vld1.u8 {q3}, [sp]!
vld1.u8 {q10}, [sp]!
bl vp8_mbloop_filter_neon
sub r12, r12, r1, lsl #3
;transpose to 16x8 matrix
vtrn.32 q3, q7
@@ -326,23 +192,118 @@
;store op2, op1, op0, oq0, oq1, oq2
vst1.8 {d6}, [r0], r1
vst1.8 {d7}, [r3], r1
vst1.8 {d7}, [r12], r1
vst1.8 {d8}, [r0], r1
vst1.8 {d9}, [r3], r1
vst1.8 {d9}, [r12], r1
vst1.8 {d10}, [r0], r1
vst1.8 {d11}, [r3], r1
vst1.8 {d11}, [r12], r1
vst1.8 {d12}, [r0], r1
vst1.8 {d13}, [r3], r1
vst1.8 {d13}, [r12], r1
vst1.8 {d14}, [r0], r1
vst1.8 {d15}, [r3], r1
vst1.8 {d15}, [r12], r1
vst1.8 {d16}, [r0], r1
vst1.8 {d17}, [r3], r1
vst1.8 {d17}, [r12], r1
vst1.8 {d18}, [r0], r1
vst1.8 {d19}, [r3], r1
vst1.8 {d20}, [r0], r1
vst1.8 {d21}, [r3], r1
vst1.8 {d19}, [r12], r1
vst1.8 {d20}, [r0]
vst1.8 {d21}, [r12]
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_mbloop_filter_vertical_edge_y_neon|
; void vp8_mbloop_filter_vertical_edge_uv_neon(unsigned char *u, int pitch,
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 unsigned char *v
|vp8_mbloop_filter_vertical_edge_uv_neon| PROC
push {lr}
ldr r12, [sp, #4] ; load thresh
sub r0, r0, #4 ; move u pointer down by 4 columns
vdup.u8 q2, r12 ; thresh
ldr r12, [sp, #8] ; load v ptr
sub r12, r12, #4 ; move v pointer down by 4 columns
vld1.u8 {d6}, [r0], r1 ;load u data
vld1.u8 {d7}, [r12], r1 ;load v data
vld1.u8 {d8}, [r0], r1
vld1.u8 {d9}, [r12], r1
vld1.u8 {d10}, [r0], r1
vld1.u8 {d11}, [r12], r1
vld1.u8 {d12}, [r0], r1
vld1.u8 {d13}, [r12], r1
vld1.u8 {d14}, [r0], r1
vld1.u8 {d15}, [r12], r1
vld1.u8 {d16}, [r0], r1
vld1.u8 {d17}, [r12], r1
vld1.u8 {d18}, [r0], r1
vld1.u8 {d19}, [r12], r1
vld1.u8 {d20}, [r0], r1
vld1.u8 {d21}, [r12], r1
;transpose to 8x16 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
sub r0, r0, r1, lsl #3
bl vp8_mbloop_filter_neon
sub r12, r12, r1, lsl #3
;transpose to 16x8 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
;store op2, op1, op0, oq0, oq1, oq2
vst1.8 {d6}, [r0], r1
vst1.8 {d7}, [r12], r1
vst1.8 {d8}, [r0], r1
vst1.8 {d9}, [r12], r1
vst1.8 {d10}, [r0], r1
vst1.8 {d11}, [r12], r1
vst1.8 {d12}, [r0], r1
vst1.8 {d13}, [r12], r1
vst1.8 {d14}, [r0], r1
vst1.8 {d15}, [r12], r1
vst1.8 {d16}, [r0], r1
vst1.8 {d17}, [r12], r1
vst1.8 {d18}, [r0], r1
vst1.8 {d19}, [r12], r1
vst1.8 {d20}, [r0]
vst1.8 {d21}, [r12]
pop {pc}
ENDP ; |vp8_mbloop_filter_vertical_edge_uv_neon|
; void vp8_mbloop_filter_neon()
@@ -350,41 +311,33 @@
; functions do the necessary load, transpose (if necessary), preserve (if
; necessary) and store.
; TODO:
; The vertical filter writes p3/q3 back out because two 4 element writes are
; much simpler than ordering and writing two 3 element sets (or three 2 elements
; sets, or whichever other combinations are possible).
; If we can preserve q3 and q10, the vertical filter will be able to avoid
; storing those values on the stack and reading them back after the filter.
; r0,r1 PRESERVE
; r2 flimit
; r3 PRESERVE
; q1 limit
; r2 mblimit
; r3 limit
; q2 thresh
; q3 p3
; q3 p3 PRESERVE
; q4 p2
; q5 p1
; q6 p0
; q7 q0
; q8 q1
; q9 q2
; q10 q3
; q10 q3 PRESERVE
|vp8_mbloop_filter_neon| PROC
ldr r12, _mblf_coeff_
; vp8_filter_mask
vabd.u8 q11, q3, q4 ; abs(p3 - p2)
vabd.u8 q12, q4, q5 ; abs(p2 - p1)
vabd.u8 q13, q5, q6 ; abs(p1 - p0)
vabd.u8 q14, q8, q7 ; abs(q1 - q0)
vabd.u8 q3, q9, q8 ; abs(q2 - q1)
vabd.u8 q1, q9, q8 ; abs(q2 - q1)
vabd.u8 q0, q10, q9 ; abs(q3 - q2)
vmax.u8 q11, q11, q12
vmax.u8 q12, q13, q14
vmax.u8 q3, q3, q0
vmax.u8 q1, q1, q0
vmax.u8 q15, q11, q12
vabd.u8 q12, q6, q7 ; abs(p0 - q0)
@@ -392,51 +345,53 @@
; vp8_hevmask
vcgt.u8 q13, q13, q2 ; (abs(p1 - p0) > thresh) * -1
vcgt.u8 q14, q14, q2 ; (abs(q1 - q0) > thresh) * -1
vmax.u8 q15, q15, q3
vmax.u8 q15, q15, q1
vld1.s8 {d4[], d5[]}, [r2] ; flimit
vdup.u8 q1, r3 ; limit
vdup.u8 q2, r2 ; mblimit
vld1.u8 {q0}, [r12]!
vmov.u8 q0, #0x80 ; 0x80
vadd.u8 q2, q2, q2 ; flimit * 2
vadd.u8 q2, q2, q1 ; flimit * 2 + limit
vcge.u8 q15, q1, q15
vabd.u8 q1, q5, q8 ; a = abs(p1 - q1)
vqadd.u8 q12, q12, q12 ; b = abs(p0 - q0) * 2
vshr.u8 q1, q1, #1 ; a = a / 2
vqadd.u8 q12, q12, q1 ; a = b + a
vcge.u8 q12, q2, q12 ; (a > flimit * 2 + limit) * -1
vmov.u16 q11, #3 ; #3
; vp8_filter
; convert to signed
veor q7, q7, q0 ; qs0
vshr.u8 q1, q1, #1 ; a = a / 2
veor q6, q6, q0 ; ps0
veor q5, q5, q0 ; ps1
vqadd.u8 q12, q12, q1 ; a = b + a
veor q8, q8, q0 ; qs1
veor q4, q4, q0 ; ps2
veor q9, q9, q0 ; qs2
vorr q14, q13, q14 ; vp8_hevmask
vcge.u8 q12, q2, q12 ; (a > flimit * 2 + limit) * -1
vsubl.s8 q2, d14, d12 ; qs0 - ps0
vsubl.s8 q13, d15, d13
vqsub.s8 q1, q5, q8 ; vp8_filter = clamp(ps1-qs1)
vadd.s16 q10, q2, q2 ; 3 * (qs0 - ps0)
vadd.s16 q11, q13, q13
vmul.i16 q2, q2, q11 ; 3 * ( qs0 - ps0)
vand q15, q15, q12 ; vp8_filter_mask
vadd.s16 q2, q2, q10
vadd.s16 q13, q13, q11
vmul.i16 q13, q13, q11
vld1.u8 {q12}, [r12]! ; #3
vmov.u8 q12, #3 ; #3
vaddw.s8 q2, q2, d2 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q13, q13, d3
vld1.u8 {q11}, [r12]! ; #4
vmov.u8 q11, #4 ; #4
; vp8_filter = clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d2, q2
@@ -444,27 +399,23 @@
vand q1, q1, q15 ; vp8_filter &= mask
vld1.u8 {q15}, [r12]! ; #63
;
vand q13, q1, q14 ; Filter2 &= hev
vmov.u16 q15, #63 ; #63
vld1.u8 {d7}, [r12]! ; #9
vand q13, q1, q14 ; Filter2 &= hev
vqadd.s8 q2, q13, q11 ; Filter1 = clamp(Filter2+4)
vqadd.s8 q13, q13, q12 ; Filter2 = clamp(Filter2+3)
vld1.u8 {d6}, [r12]! ; #18
vmov q0, q15
vshr.s8 q2, q2, #3 ; Filter1 >>= 3
vshr.s8 q13, q13, #3 ; Filter2 >>= 3
vmov q10, q15
vmov q11, q15
vmov q12, q15
vqsub.s8 q7, q7, q2 ; qs0 = clamp(qs0 - Filter1)
vld1.u8 {d5}, [r12]! ; #27
vqadd.s8 q6, q6, q13 ; ps0 = clamp(ps0 + Filter2)
vbic q1, q1, q14 ; vp8_filter &= ~hev
@@ -472,49 +423,47 @@
; roughly 1/7th difference across boundary
; roughly 2/7th difference across boundary
; roughly 3/7th difference across boundary
vmov q11, q15
vmov.u8 d5, #9 ; #9
vmov.u8 d4, #18 ; #18
vmov q13, q15
vmov q14, q15
vmlal.s8 q10, d2, d7 ; Filter2 * 9
vmlal.s8 q11, d3, d7
vmlal.s8 q12, d2, d6 ; Filter2 * 18
vmlal.s8 q13, d3, d6
vmlal.s8 q14, d2, d5 ; Filter2 * 27
vmlal.s8 q0, d2, d5 ; 63 + Filter2 * 9
vmlal.s8 q11, d3, d5
vmov.u8 d5, #27 ; #27
vmlal.s8 q12, d2, d4 ; 63 + Filter2 * 18
vmlal.s8 q13, d3, d4
vmlal.s8 q14, d2, d5 ; 63 + Filter2 * 27
vmlal.s8 q15, d3, d5
vqshrn.s16 d20, q10, #7 ; u = clamp((63 + Filter2 * 9)>>7)
vqshrn.s16 d21, q11, #7
vqshrn.s16 d0, q0, #7 ; u = clamp((63 + Filter2 * 9)>>7)
vqshrn.s16 d1, q11, #7
vqshrn.s16 d24, q12, #7 ; u = clamp((63 + Filter2 * 18)>>7)
vqshrn.s16 d25, q13, #7
vqshrn.s16 d28, q14, #7 ; u = clamp((63 + Filter2 * 27)>>7)
vqshrn.s16 d29, q15, #7
vqsub.s8 q11, q9, q10 ; s = clamp(qs2 - u)
vqadd.s8 q10, q4, q10 ; s = clamp(ps2 + u)
vmov.u8 q1, #0x80 ; 0x80
vqsub.s8 q11, q9, q0 ; s = clamp(qs2 - u)
vqadd.s8 q0, q4, q0 ; s = clamp(ps2 + u)
vqsub.s8 q13, q8, q12 ; s = clamp(qs1 - u)
vqadd.s8 q12, q5, q12 ; s = clamp(ps1 + u)
vqsub.s8 q15, q7, q14 ; s = clamp(qs0 - u)
vqadd.s8 q14, q6, q14 ; s = clamp(ps0 + u)
veor q9, q11, q0 ; *oq2 = s^0x80
veor q4, q10, q0 ; *op2 = s^0x80
veor q8, q13, q0 ; *oq1 = s^0x80
veor q5, q12, q0 ; *op1 = s^0x80
veor q7, q15, q0 ; *oq0 = s^0x80
veor q6, q14, q0 ; *op0 = s^0x80
veor q9, q11, q1 ; *oq2 = s^0x80
veor q4, q0, q1 ; *op2 = s^0x80
veor q8, q13, q1 ; *oq1 = s^0x80
veor q5, q12, q1 ; *op1 = s^0x80
veor q7, q15, q1 ; *oq0 = s^0x80
veor q6, q14, q1 ; *op0 = s^0x80
bx lr
ENDP ; |vp8_mbloop_filter_neon|
;-----------------
_mblf_coeff_
DCD mblf_coeff
mblf_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
DCD 0x003f003f, 0x003f003f, 0x003f003f, 0x003f003f
DCD 0x09090909, 0x09090909, 0x12121212, 0x12121212
DCD 0x1b1b1b1b, 0x1b1b1b1b
END
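The comments in vp8_mbloop_filter_neon describe the per-pixel arithmetic (clamp, mask, hev, and the 27/18/9 taps with a rounding constant of 63). As a reading aid, here is a minimal scalar sketch of that arithmetic in C. The names `clamp_s8` and `mbloop_filter_scalar` are hypothetical and not part of this change. Subtracting 128 stands in for the XOR with 0x80, and `mask` and `hev` are assumed to be 0 or -1, like the NEON compare results.

```c
#include <assert.h>

/* Clamp an int to the signed 8-bit range, like the saturating NEON ops. */
static int clamp_s8(int t)
{
    if (t < -128) return -128;
    if (t > 127) return 127;
    return t;
}

/* Hypothetical scalar model of one pixel position across the edge.
 * px[0..5] holds p2, p1, p0, q0, q1, q2 and is updated in place.
 * mask and hev must be 0 or -1 (all bits set).
 * Note: >> on a negative int is assumed to be an arithmetic shift. */
static void mbloop_filter_scalar(unsigned char px[6], int mask, int hev)
{
    /* x - 128 is the same value as (signed char)(x ^ 0x80) */
    int ps2 = px[0] - 128, ps1 = px[1] - 128, ps0 = px[2] - 128;
    int qs0 = px[3] - 128, qs1 = px[4] - 128, qs2 = px[5] - 128;
    int f, filter1, filter2, u;

    assert(mask == 0 || mask == -1);
    assert(hev == 0 || hev == -1);

    f = clamp_s8(ps1 - qs1);
    f = clamp_s8(f + 3 * (qs0 - ps0));
    f &= mask;

    /* high edge variance: adjust only p0/q0 with the +4 / +3 rounding pair */
    filter2 = f & hev;
    filter1 = clamp_s8(filter2 + 4) >> 3;
    filter2 = clamp_s8(filter2 + 3) >> 3;
    qs0 = clamp_s8(qs0 - filter1);
    ps0 = clamp_s8(ps0 + filter2);

    /* otherwise apply the wider filter */
    f &= ~hev;

    u = clamp_s8((63 + f * 27) >> 7);   /* roughly 3/7th of the difference */
    qs0 = clamp_s8(qs0 - u);
    ps0 = clamp_s8(ps0 + u);

    u = clamp_s8((63 + f * 18) >> 7);   /* roughly 2/7th */
    qs1 = clamp_s8(qs1 - u);
    ps1 = clamp_s8(ps1 + u);

    u = clamp_s8((63 + f * 9) >> 7);    /* roughly 1/7th */
    qs2 = clamp_s8(qs2 - u);
    ps2 = clamp_s8(ps2 + u);

    px[0] = (unsigned char)(ps2 + 128);
    px[1] = (unsigned char)(ps1 + 128);
    px[2] = (unsigned char)(ps0 + 128);
    px[3] = (unsigned char)(qs0 + 128);
    px[4] = (unsigned char)(qs1 + 128);
    px[5] = (unsigned char)(qs2 + 128);
}
```

With the mask set and hev clear, a hard step between p0 and q0 is spread over all six pixels; with the mask clear the pixels are left untouched.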


@@ -31,7 +31,7 @@
;result of the multiplication that is needed in IDCT.
|vp8_short_idct4x4llm_neon| PROC
ldr r12, _idct_coeff_
adr r12, idct_coeff
vld1.16 {q1, q2}, [r0]
vld1.16 {d0}, [r12]
@@ -114,8 +114,6 @@
;-----------------
_idct_coeff_
DCD idct_coeff
idct_coeff
DCD 0x4e7b4e7b, 0x8a8c8a8c


@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter16_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -33,7 +44,7 @@
|vp8_sixtap_predict16x16_neon| PROC
push {r4-r5, lr}
ldr r12, _filter16_coeff_
adr r12, filter16_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -476,17 +487,4 @@ secondpass_only_inner_loop_neon
ENDP
;-----------------
_filter16_coeff_
DCD filter16_coeff
filter16_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END
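Every row of the filter16_coeff table sums to 128, so a rounding shift by 7 gives the filter unity gain. The sketch below illustrates that property with a one-dimensional six-tap filter in C. The names `sixtap_coeff`, `sixtap_row_sum` and `sixtap_pixel` are hypothetical, the two zero pad words per row are dropped, and the tap alignment (src[-2] to src[3]) and the +64 rounding term are assumptions, not taken from this diff.

```c
#include <assert.h>

/* Coefficients copied from the filter16_coeff table above, without padding. */
static const int sixtap_coeff[8][6] = {
    { 0,   0, 128,   0,   0, 0 },
    { 0,  -6, 123,  12,  -1, 0 },
    { 2, -11, 108,  36,  -8, 1 },
    { 0,  -9,  93,  50,  -6, 0 },
    { 3, -16,  77,  77, -16, 3 },
    { 0,  -6,  50,  93,  -9, 0 },
    { 1,  -8,  36, 108, -11, 2 },
    { 0,  -1,  12, 123,  -6, 0 },
};

/* Sum of the six taps for one sub-pixel offset (0..7). */
static int sixtap_row_sum(int offset)
{
    int k, sum = 0;

    assert(offset >= 0 && offset < 8);
    for (k = 0; k < 6; k++)
        sum += sixtap_coeff[offset][k];
    return sum;
}

/* Filter one pixel. src must have 2 readable pixels before it and 3 after.
 * Assumed conventions: round with +64, shift by 7, clamp to 0..255. */
static unsigned char sixtap_pixel(const unsigned char *src, int offset)
{
    const int *f = sixtap_coeff[offset];
    int k, sum = 64;

    for (k = 0; k < 6; k++)
        sum += f[k] * src[k - 2];

    if (sum < 0)
        return 0;           /* a negative sum clamps to 0, no shift needed */
    sum >>= 7;
    return (unsigned char)(sum > 255 ? 255 : sum);
}
```

Offset 0 is the identity filter, and offset 4 is the symmetric half-pixel filter, which lands midway between the two levels of a step edge.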


@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter4_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -25,7 +36,7 @@
|vp8_sixtap_predict_neon| PROC
push {r4, lr}
ldr r12, _filter4_coeff_
adr r12, filter4_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -408,16 +419,4 @@ secondpass_filter4x4_only
;-----------------
_filter4_coeff_
DCD filter4_coeff
filter4_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END


@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -25,7 +36,7 @@
|vp8_sixtap_predict8x4_neon| PROC
push {r4-r5, lr}
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -459,16 +470,4 @@ secondpass_filter8x4_only
;-----------------
_filter8_coeff_
DCD filter8_coeff
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END


@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -25,7 +36,7 @@
|vp8_sixtap_predict8x8_neon| PROC
push {r4-r5, lr}
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -510,16 +521,4 @@ filt_blk2d_spo8x8_loop_neon
;-----------------
_filter8_coeff_
DCD filter8_coeff
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END


@@ -9,27 +9,14 @@
*/
#include "vpx_ports/config.h"
#include <stddef.h>
#include "vpx_config.h"
#include "vpx/vpx_codec.h"
#include "vpx_ports/asm_offsets.h"
#include "vpx_scale/yv12config.h"
#define ct_assert(name,cond) \
static void assert_##name(void) UNUSED;\
static void assert_##name(void) {switch(0){case 0:case !!(cond):;}}
BEGIN
#define DEFINE(sym, val) int sym = val;
/*
#define BLANK() asm volatile("\n->" : : )
*/
/*
* int main(void)
* {
*/
//vpx_scale
/* vpx_scale */
DEFINE(yv12_buffer_config_y_width, offsetof(YV12_BUFFER_CONFIG, y_width));
DEFINE(yv12_buffer_config_y_height, offsetof(YV12_BUFFER_CONFIG, y_height));
DEFINE(yv12_buffer_config_y_stride, offsetof(YV12_BUFFER_CONFIG, y_stride));
@@ -40,10 +27,14 @@ DEFINE(yv12_buffer_config_y_buffer, offsetof(YV12_BUFFER_CONFIG, y_b
DEFINE(yv12_buffer_config_u_buffer, offsetof(YV12_BUFFER_CONFIG, u_buffer));
DEFINE(yv12_buffer_config_v_buffer, offsetof(YV12_BUFFER_CONFIG, v_buffer));
DEFINE(yv12_buffer_config_border, offsetof(YV12_BUFFER_CONFIG, border));
DEFINE(VP8BORDERINPIXELS_VAL, VP8BORDERINPIXELS);
//add asserts for any offset that is not supported by assembly code
//add asserts for any size that is not supported by assembly code
/*
* return 0;
* }
*/
END
/* add asserts for any offset that is not supported by assembly code */
/* add asserts for any size that is not supported by assembly code */
#if HAVE_ARMV7
/* vp8_yv12_extend_frame_borders_neon makes several assumptions based on this */
ct_assert(VP8BORDERINPIXELS_VAL, VP8BORDERINPIXELS == 32)
#endif
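The ct_assert macro above relies on the rule that a switch may not contain duplicate case labels: when the condition is false, `case !!(cond)` becomes a second `case 0` and compilation fails. The following standalone sketch reproduces the idea in C. The `UNUSED` definition and `DEMO_BORDER_IN_PIXELS` are hypothetical stand-ins, since the real definitions come from headers not shown in this diff.

```c
#include <assert.h>

/* Assumed stand-in for the UNUSED macro used by the real header. */
#if defined(__GNUC__)
#define UNUSED __attribute__((unused))
#else
#define UNUSED
#endif

/* Same construction as in the diff: a false condition yields two
 * "case 0:" labels, which the compiler rejects. */
#define ct_assert(name, cond) \
    static void assert_##name(void) UNUSED; \
    static void assert_##name(void) { switch (0) { case 0: case !!(cond): ; } }

/* Hypothetical stand-in for VP8BORDERINPIXELS. */
#define DEMO_BORDER_IN_PIXELS 32

/* Compiles only while the border is 32; change the constant to see it fail. */
ct_assert(demo_border_is_32, DEMO_BORDER_IN_PIXELS == 32)

/* For contrast, the same check done at run time. */
static int demo_border_checked(void)
{
    assert(DEMO_BORDER_IN_PIXELS == 32);
    return DEMO_BORDER_IN_PIXELS;
}
```

The compile-time form costs nothing at run time and stops the build before assembly code with a baked-in assumption can be linked against a mismatched constant.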


@@ -14,17 +14,12 @@
void vpx_log(const char *format, ...);
#include "../../vpx_ports/config.h"
#include "../../vpx_scale/yv12config.h"
#include "vpx_ports/config.h"
#include "vpx_scale/yv12config.h"
#include "mv.h"
#include "treecoder.h"
#include "subpixel.h"
#include "../../vpx_ports/mem.h"
#include "../../vpx_config.h"
#if CONFIG_OPENCL
#include "opencl/vp8_opencl.h"
#endif
#include "vpx_ports/mem.h"
#define TRUE 1
#define FALSE 0
@@ -78,19 +73,19 @@ typedef enum
typedef enum
{
DC_PRED = 0, /* average of above and left pixels */
V_PRED = 1, /* vertical prediction */
H_PRED = 2, /* horizontal prediction */
TM_PRED = 3, /* Truemotion prediction */
B_PRED = 4, /* block based prediction, each block has its own prediction mode */
DC_PRED, /* average of above and left pixels */
V_PRED, /* vertical prediction */
H_PRED, /* horizontal prediction */
TM_PRED, /* Truemotion prediction */
B_PRED, /* block based prediction, each block has its own prediction mode */
NEARESTMV = 5,
NEARMV = 6,
ZEROMV = 7,
NEWMV = 8,
SPLITMV = 9,
NEARESTMV,
NEARMV,
ZEROMV,
NEWMV,
SPLITMV,
MB_MODE_COUNT = 10
MB_MODE_COUNT
} MB_PREDICTION_MODE;
/* Macroblock level features */
@@ -142,16 +137,11 @@ typedef enum
modes for the Y blocks to the left and above us; for interframes, there
is a single probability table. */
typedef struct
union b_mode_info
{
B_PREDICTION_MODE mode;
union
{
int as_int;
MV as_mv;
} mv;
} B_MODE_INFO;
B_PREDICTION_MODE as_mode;
int_mv mv;
};
typedef enum
{
@@ -166,79 +156,43 @@ typedef struct
{
MB_PREDICTION_MODE mode, uv_mode;
MV_REFERENCE_FRAME ref_frame;
union
{
int as_int;
MV as_mv;
} mv;
int_mv mv;
unsigned char partitioning;
unsigned char mb_skip_coeff; /* does this mb have coefficients at all, 1=no coefficients, 0=need decode tokens */
unsigned char dc_diff;
unsigned char need_to_clamp_mvs;
unsigned char segment_id; /* Which set of segmentation parameters should be used for this MB */
unsigned char force_no_skip; /* encoder only */
} MB_MODE_INFO;
typedef struct
{
MB_MODE_INFO mbmi;
B_MODE_INFO bmi[16];
union b_mode_info bmi[16];
} MODE_INFO;
typedef struct
{
short *qcoeff_base;
int qcoeff_offset;
short *dqcoeff_base;
int dqcoeff_offset;
unsigned char *predictor_base;
int predictor_offset;
short *diff_base;
int diff_offset;
short *qcoeff;
short *dqcoeff;
unsigned char *predictor;
short *diff;
short *dequant;
#if CONFIG_OPENCL
cl_command_queue cl_commands; //pointer to macroblock CL command queue
cl_mem cl_diff_mem;
cl_mem cl_predictor_mem;
cl_mem cl_qcoeff_mem;
cl_mem cl_dqcoeff_mem;
cl_mem cl_eobs_mem;
cl_mem cl_dequant_mem; //Block-specific, not shared
cl_bool sixtap_filter; //Subpixel Prediction type (true=sixtap, false=bilinear)
#endif
/* 16 Y blocks, 4 U blocks, 4 V blocks each with 16 entries */
unsigned char **base_pre; //previous frame, same Macroblock, base pointer
unsigned char **base_pre;
int pre;
int pre_stride;
unsigned char **base_dst; //destination base pointer
unsigned char **base_dst;
int dst;
int dst_stride;
int eob; //only used in encoder? Decoder uses MBD.eobs
char *eobs_base; //beginning of MB.eobs
B_MODE_INFO bmi;
int eob;
union b_mode_info bmi;
} BLOCKD;
typedef struct
typedef struct MacroBlockD
{
DECLARE_ALIGNED(16, short, diff[400]); /* from idct diff */
DECLARE_ALIGNED(16, unsigned char, predictor[384]);
@@ -246,22 +200,11 @@ typedef struct
DECLARE_ALIGNED(16, short, dqcoeff[400]);
DECLARE_ALIGNED(16, char, eobs[25]);
#if CONFIG_OPENCL
cl_command_queue cl_commands; //Each macroblock gets its own command queue.
cl_mem cl_diff_mem;
cl_mem cl_predictor_mem;
cl_mem cl_qcoeff_mem;
cl_mem cl_dqcoeff_mem;
cl_mem cl_eobs_mem;
cl_bool sixtap_filter;
#endif
/* 16 Y blocks, 4 U, 4 V, 1 DC 2nd order block, each with 16 entries. */
BLOCKD block[25];
YV12_BUFFER_CONFIG pre; /* Filtered copy of previous frame reconstruction */
YV12_BUFFER_CONFIG dst; /* Destination buffer for current frame */
YV12_BUFFER_CONFIG dst;
MODE_INFO *mode_info_context;
int mode_info_stride;
@@ -309,9 +252,11 @@ typedef struct
int mb_to_top_edge;
int mb_to_bottom_edge;
int ref_frame_cost[MAX_REF_FRAMES];
unsigned int frames_since_golden;
unsigned int frames_till_alt_ref_frame;
vp8_subpix_fn_t subpixel_predict;
vp8_subpix_fn_t subpixel_predict8x4;
vp8_subpix_fn_t subpixel_predict8x8;
@@ -321,6 +266,14 @@ typedef struct
int corrupted;
#if ARCH_X86 || ARCH_X86_64
/* This is an intermediate buffer currently used in sub-pixel motion search
* to keep a copy of the reference area. This buffer can be used for other
* purposes.
*/
DECLARE_ALIGNED(32, unsigned char, y_buf[22*32]);
#endif
#if CONFIG_RUNTIME_CPU_DETECT
struct VP8_COMMON_RTCD *rtcd;
#endif
@@ -330,4 +283,20 @@ typedef struct
extern void vp8_build_block_doffsets(MACROBLOCKD *x);
extern void vp8_setup_block_dptrs(MACROBLOCKD *x);
static void update_blockd_bmi(MACROBLOCKD *xd)
{
int i;
int is_4x4;
is_4x4 = (xd->mode_info_context->mbmi.mode == SPLITMV) ||
(xd->mode_info_context->mbmi.mode == B_PRED);
if (is_4x4)
{
for (i = 0; i < 16; i++)
{
xd->block[i].bmi = xd->mode_info_context->bmi[i];
}
}
}
#endif /* __INC_BLOCKD_H */
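The change above turns B_MODE_INFO into `union b_mode_info`, so each 4x4 block stores either an intra prediction mode or a motion vector in the same storage, and update_blockd_bmi copies the 16 per-block entries only for SPLITMV and B_PRED macroblocks. A reduced C sketch of that pattern follows. All `demo_`/`DEMO_` names are hypothetical, and the `MV` and `int_mv` layouts are assumptions standing in for the definitions in mv.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimal stand-ins for the types declared in mv.h and blockd.h. */
typedef struct { short row; short col; } MV;
typedef union { uint32_t as_int; MV as_mv; } int_mv;
typedef enum { DEMO_B_DC_PRED, DEMO_B_TM_PRED } B_PREDICTION_MODE;

/* Same shape as the union introduced in the diff. */
union b_mode_info
{
    B_PREDICTION_MODE as_mode;
    int_mv mv;
};

typedef enum { DEMO_ZEROMV, DEMO_B_PRED, DEMO_SPLITMV } demo_mb_mode;

typedef struct
{
    demo_mb_mode mode;
    union b_mode_info bmi[16];
} demo_mode_info;

/* Copy the per-block info only when the macroblock is coded as 4x4 blocks.
 * Returns 1 if the copy happened, 0 otherwise. */
static int demo_update_bmi(const demo_mode_info *mi, union b_mode_info dst[16])
{
    int i;
    int is_4x4;

    assert(mi != 0);
    is_4x4 = (mi->mode == DEMO_SPLITMV) || (mi->mode == DEMO_B_PRED);
    if (!is_4x4)
        return 0;

    for (i = 0; i < 16; i++)
        dst[i] = mi->bmi[i];
    return 1;
}
```

Skipping the copy for whole-macroblock modes avoids touching 16 entries whose contents are not meaningful for that mode.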


@@ -12,7 +12,7 @@
/* Update probabilities for the nodes in the token entropy tree.
Generated file included by entropy.c */
const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens-1] =
const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [ENTROPY_NODES] =
{
{
{


@@ -97,7 +97,7 @@ void vp8_print_modes_and_motion_vectors(MODE_INFO *mi, int rows, int cols, int f
bindex = (b_row & 3) * 4 + (b_col & 3);
if (mi[mb_index].mbmi.mode == B_PRED)
fprintf(mvs, "%2d ", mi[mb_index].bmi[bindex].mode);
fprintf(mvs, "%2d ", mi[mb_index].bmi[bindex].as_mode);
else
fprintf(mvs, "xx ");


@@ -0,0 +1,225 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "defaultcoefcounts.h"
/* Generated file, included by entropy.c */
const unsigned int vp8_default_coef_counts[BLOCK_TYPES]
[COEF_BANDS]
[PREV_COEF_CONTEXTS]
[MAX_ENTROPY_TOKENS] =
{
{
/* Block Type ( 0 ) */
{
/* Coeff Band ( 0 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
{
/* Coeff Band ( 1 ) */
{30190, 26544, 225, 24, 4, 0, 0, 0, 0, 0, 0, 4171593,},
{26846, 25157, 1241, 130, 26, 6, 1, 0, 0, 0, 0, 149987,},
{10484, 9538, 1006, 160, 36, 18, 0, 0, 0, 0, 0, 15104,},
},
{
/* Coeff Band ( 2 ) */
{25842, 40456, 1126, 83, 11, 2, 0, 0, 0, 0, 0, 0,},
{9338, 8010, 512, 73, 7, 3, 2, 0, 0, 0, 0, 43294,},
{1047, 751, 149, 31, 13, 6, 1, 0, 0, 0, 0, 879,},
},
{
/* Coeff Band ( 3 ) */
{26136, 9826, 252, 13, 0, 0, 0, 0, 0, 0, 0, 0,},
{8134, 5574, 191, 14, 2, 0, 0, 0, 0, 0, 0, 35302,},
{ 605, 677, 116, 9, 1, 0, 0, 0, 0, 0, 0, 611,},
},
{
/* Coeff Band ( 4 ) */
{10263, 15463, 283, 17, 0, 0, 0, 0, 0, 0, 0, 0,},
{2773, 2191, 128, 9, 2, 2, 0, 0, 0, 0, 0, 10073,},
{ 134, 125, 32, 4, 0, 2, 0, 0, 0, 0, 0, 50,},
},
{
/* Coeff Band ( 5 ) */
{10483, 2663, 23, 1, 0, 0, 0, 0, 0, 0, 0, 0,},
{2137, 1251, 27, 1, 1, 0, 0, 0, 0, 0, 0, 14362,},
{ 116, 156, 14, 2, 1, 0, 0, 0, 0, 0, 0, 190,},
},
{
/* Coeff Band ( 6 ) */
{40977, 27614, 412, 28, 0, 0, 0, 0, 0, 0, 0, 0,},
{6113, 5213, 261, 22, 3, 0, 0, 0, 0, 0, 0, 26164,},
{ 382, 312, 50, 14, 2, 0, 0, 0, 0, 0, 0, 345,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 319,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8,},
},
},
{
/* Block Type ( 1 ) */
{
/* Coeff Band ( 0 ) */
{3268, 19382, 1043, 250, 93, 82, 49, 26, 17, 8, 25, 82289,},
{8758, 32110, 5436, 1832, 827, 668, 420, 153, 24, 0, 3, 52914,},
{9337, 23725, 8487, 3954, 2107, 1836, 1069, 399, 59, 0, 0, 18620,},
},
{
/* Coeff Band ( 1 ) */
{12419, 8420, 452, 62, 9, 1, 0, 0, 0, 0, 0, 0,},
{11715, 8705, 693, 92, 15, 7, 2, 0, 0, 0, 0, 53988,},
{7603, 8585, 2306, 778, 270, 145, 39, 5, 0, 0, 0, 9136,},
},
{
/* Coeff Band ( 2 ) */
{15938, 14335, 1207, 184, 55, 13, 4, 1, 0, 0, 0, 0,},
{7415, 6829, 1138, 244, 71, 26, 7, 0, 0, 0, 0, 9980,},
{1580, 1824, 655, 241, 89, 46, 10, 2, 0, 0, 0, 429,},
},
{
/* Coeff Band ( 3 ) */
{19453, 5260, 201, 19, 0, 0, 0, 0, 0, 0, 0, 0,},
{9173, 3758, 213, 22, 1, 1, 0, 0, 0, 0, 0, 9820,},
{1689, 1277, 276, 51, 17, 4, 0, 0, 0, 0, 0, 679,},
},
{
/* Coeff Band ( 4 ) */
{12076, 10667, 620, 85, 19, 9, 5, 0, 0, 0, 0, 0,},
{4665, 3625, 423, 55, 19, 9, 0, 0, 0, 0, 0, 5127,},
{ 415, 440, 143, 34, 20, 7, 2, 0, 0, 0, 0, 101,},
},
{
/* Coeff Band ( 5 ) */
{12183, 4846, 115, 11, 1, 0, 0, 0, 0, 0, 0, 0,},
{4226, 3149, 177, 21, 2, 0, 0, 0, 0, 0, 0, 7157,},
{ 375, 621, 189, 51, 11, 4, 1, 0, 0, 0, 0, 198,},
},
{
/* Coeff Band ( 6 ) */
{61658, 37743, 1203, 94, 10, 3, 0, 0, 0, 0, 0, 0,},
{15514, 11563, 903, 111, 14, 5, 0, 0, 0, 0, 0, 25195,},
{ 929, 1077, 291, 78, 14, 7, 1, 0, 0, 0, 0, 507,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 990, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 412, 13, 0, 0, 0, 0, 0, 0, 0, 0, 1641,},
{ 0, 18, 7, 1, 0, 0, 0, 0, 0, 0, 0, 30,},
},
},
{
/* Block Type ( 2 ) */
{
/* Coeff Band ( 0 ) */
{ 953, 24519, 628, 120, 28, 12, 4, 0, 0, 0, 0, 2248798,},
{1525, 25654, 2647, 617, 239, 143, 42, 5, 0, 0, 0, 66837,},
{1180, 11011, 3001, 1237, 532, 448, 239, 54, 5, 0, 0, 7122,},
},
{
/* Coeff Band ( 1 ) */
{1356, 2220, 67, 10, 4, 1, 0, 0, 0, 0, 0, 0,},
{1450, 2544, 102, 18, 4, 3, 0, 0, 0, 0, 0, 57063,},
{1182, 2110, 470, 130, 41, 21, 0, 0, 0, 0, 0, 6047,},
},
{
/* Coeff Band ( 2 ) */
{ 370, 3378, 200, 30, 5, 4, 1, 0, 0, 0, 0, 0,},
{ 293, 1006, 131, 29, 11, 0, 0, 0, 0, 0, 0, 5404,},
{ 114, 387, 98, 23, 4, 8, 1, 0, 0, 0, 0, 236,},
},
{
/* Coeff Band ( 3 ) */
{ 579, 194, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 395, 213, 5, 1, 0, 0, 0, 0, 0, 0, 0, 4157,},
{ 119, 122, 4, 0, 0, 0, 0, 0, 0, 0, 0, 300,},
},
{
/* Coeff Band ( 4 ) */
{ 38, 557, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 21, 114, 12, 1, 0, 0, 0, 0, 0, 0, 0, 427,},
{ 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7,},
},
{
/* Coeff Band ( 5 ) */
{ 52, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 18, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 652,},
{ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30,},
},
{
/* Coeff Band ( 6 ) */
{ 640, 569, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 25, 77, 2, 0, 0, 0, 0, 0, 0, 0, 0, 517,},
{ 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
},
{
/* Block Type ( 3 ) */
{
/* Coeff Band ( 0 ) */
{2506, 20161, 2707, 767, 261, 178, 107, 30, 14, 3, 0, 100694,},
{8806, 36478, 8817, 3268, 1280, 850, 401, 114, 42, 0, 0, 58572,},
{11003, 27214, 11798, 5716, 2482, 2072, 1048, 175, 32, 0, 0, 19284,},
},
{
/* Coeff Band ( 1 ) */
{9738, 11313, 959, 205, 70, 18, 11, 1, 0, 0, 0, 0,},
{12628, 15085, 1507, 273, 52, 19, 9, 0, 0, 0, 0, 54280,},
{10701, 15846, 5561, 1926, 813, 570, 249, 36, 0, 0, 0, 6460,},
},
{
/* Coeff Band ( 2 ) */
{6781, 22539, 2784, 634, 182, 123, 20, 4, 0, 0, 0, 0,},
{6263, 11544, 2649, 790, 259, 168, 27, 5, 0, 0, 0, 20539,},
{3109, 4075, 2031, 896, 457, 386, 158, 29, 0, 0, 0, 1138,},
},
{
/* Coeff Band ( 3 ) */
{11515, 4079, 465, 73, 5, 14, 2, 0, 0, 0, 0, 0,},
{9361, 5834, 650, 96, 24, 8, 4, 0, 0, 0, 0, 22181,},
{4343, 3974, 1360, 415, 132, 96, 14, 1, 0, 0, 0, 1267,},
},
{
/* Coeff Band ( 4 ) */
{4787, 9297, 823, 168, 44, 12, 4, 0, 0, 0, 0, 0,},
{3619, 4472, 719, 198, 60, 31, 3, 0, 0, 0, 0, 8401,},
{1157, 1175, 483, 182, 88, 31, 8, 0, 0, 0, 0, 268,},
},
{
/* Coeff Band ( 5 ) */
{8299, 1226, 32, 5, 1, 0, 0, 0, 0, 0, 0, 0,},
{3502, 1568, 57, 4, 1, 1, 0, 0, 0, 0, 0, 9811,},
{1055, 1070, 166, 29, 6, 1, 0, 0, 0, 0, 0, 527,},
},
{
/* Coeff Band ( 6 ) */
{27414, 27927, 1989, 347, 69, 26, 0, 0, 0, 0, 0, 0,},
{5876, 10074, 1574, 341, 91, 24, 4, 0, 0, 0, 0, 21954,},
{1571, 2171, 778, 324, 124, 65, 16, 0, 0, 0, 0, 979,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 459,},
{ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13,},
},
},
};

@@ -8,214 +8,14 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef __DEFAULTCOEFCOUNTS_H
#define __DEFAULTCOEFCOUNTS_H
/* Generated file, included by entropy.c */
#include "entropy.h"
static const unsigned int default_coef_counts [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens] =
{
extern const unsigned int vp8_default_coef_counts[BLOCK_TYPES]
[COEF_BANDS]
[PREV_COEF_CONTEXTS]
[MAX_ENTROPY_TOKENS];
{
/* Block Type ( 0 ) */
{
/* Coeff Band ( 0 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
{
/* Coeff Band ( 1 ) */
{30190, 26544, 225, 24, 4, 0, 0, 0, 0, 0, 0, 4171593,},
{26846, 25157, 1241, 130, 26, 6, 1, 0, 0, 0, 0, 149987,},
{10484, 9538, 1006, 160, 36, 18, 0, 0, 0, 0, 0, 15104,},
},
{
/* Coeff Band ( 2 ) */
{25842, 40456, 1126, 83, 11, 2, 0, 0, 0, 0, 0, 0,},
{9338, 8010, 512, 73, 7, 3, 2, 0, 0, 0, 0, 43294,},
{1047, 751, 149, 31, 13, 6, 1, 0, 0, 0, 0, 879,},
},
{
/* Coeff Band ( 3 ) */
{26136, 9826, 252, 13, 0, 0, 0, 0, 0, 0, 0, 0,},
{8134, 5574, 191, 14, 2, 0, 0, 0, 0, 0, 0, 35302,},
{ 605, 677, 116, 9, 1, 0, 0, 0, 0, 0, 0, 611,},
},
{
/* Coeff Band ( 4 ) */
{10263, 15463, 283, 17, 0, 0, 0, 0, 0, 0, 0, 0,},
{2773, 2191, 128, 9, 2, 2, 0, 0, 0, 0, 0, 10073,},
{ 134, 125, 32, 4, 0, 2, 0, 0, 0, 0, 0, 50,},
},
{
/* Coeff Band ( 5 ) */
{10483, 2663, 23, 1, 0, 0, 0, 0, 0, 0, 0, 0,},
{2137, 1251, 27, 1, 1, 0, 0, 0, 0, 0, 0, 14362,},
{ 116, 156, 14, 2, 1, 0, 0, 0, 0, 0, 0, 190,},
},
{
/* Coeff Band ( 6 ) */
{40977, 27614, 412, 28, 0, 0, 0, 0, 0, 0, 0, 0,},
{6113, 5213, 261, 22, 3, 0, 0, 0, 0, 0, 0, 26164,},
{ 382, 312, 50, 14, 2, 0, 0, 0, 0, 0, 0, 345,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 319,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8,},
},
},
{
/* Block Type ( 1 ) */
{
/* Coeff Band ( 0 ) */
{3268, 19382, 1043, 250, 93, 82, 49, 26, 17, 8, 25, 82289,},
{8758, 32110, 5436, 1832, 827, 668, 420, 153, 24, 0, 3, 52914,},
{9337, 23725, 8487, 3954, 2107, 1836, 1069, 399, 59, 0, 0, 18620,},
},
{
/* Coeff Band ( 1 ) */
{12419, 8420, 452, 62, 9, 1, 0, 0, 0, 0, 0, 0,},
{11715, 8705, 693, 92, 15, 7, 2, 0, 0, 0, 0, 53988,},
{7603, 8585, 2306, 778, 270, 145, 39, 5, 0, 0, 0, 9136,},
},
{
/* Coeff Band ( 2 ) */
{15938, 14335, 1207, 184, 55, 13, 4, 1, 0, 0, 0, 0,},
{7415, 6829, 1138, 244, 71, 26, 7, 0, 0, 0, 0, 9980,},
{1580, 1824, 655, 241, 89, 46, 10, 2, 0, 0, 0, 429,},
},
{
/* Coeff Band ( 3 ) */
{19453, 5260, 201, 19, 0, 0, 0, 0, 0, 0, 0, 0,},
{9173, 3758, 213, 22, 1, 1, 0, 0, 0, 0, 0, 9820,},
{1689, 1277, 276, 51, 17, 4, 0, 0, 0, 0, 0, 679,},
},
{
/* Coeff Band ( 4 ) */
{12076, 10667, 620, 85, 19, 9, 5, 0, 0, 0, 0, 0,},
{4665, 3625, 423, 55, 19, 9, 0, 0, 0, 0, 0, 5127,},
{ 415, 440, 143, 34, 20, 7, 2, 0, 0, 0, 0, 101,},
},
{
/* Coeff Band ( 5 ) */
{12183, 4846, 115, 11, 1, 0, 0, 0, 0, 0, 0, 0,},
{4226, 3149, 177, 21, 2, 0, 0, 0, 0, 0, 0, 7157,},
{ 375, 621, 189, 51, 11, 4, 1, 0, 0, 0, 0, 198,},
},
{
/* Coeff Band ( 6 ) */
{61658, 37743, 1203, 94, 10, 3, 0, 0, 0, 0, 0, 0,},
{15514, 11563, 903, 111, 14, 5, 0, 0, 0, 0, 0, 25195,},
{ 929, 1077, 291, 78, 14, 7, 1, 0, 0, 0, 0, 507,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 990, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 412, 13, 0, 0, 0, 0, 0, 0, 0, 0, 1641,},
{ 0, 18, 7, 1, 0, 0, 0, 0, 0, 0, 0, 30,},
},
},
{
/* Block Type ( 2 ) */
{
/* Coeff Band ( 0 ) */
{ 953, 24519, 628, 120, 28, 12, 4, 0, 0, 0, 0, 2248798,},
{1525, 25654, 2647, 617, 239, 143, 42, 5, 0, 0, 0, 66837,},
{1180, 11011, 3001, 1237, 532, 448, 239, 54, 5, 0, 0, 7122,},
},
{
/* Coeff Band ( 1 ) */
{1356, 2220, 67, 10, 4, 1, 0, 0, 0, 0, 0, 0,},
{1450, 2544, 102, 18, 4, 3, 0, 0, 0, 0, 0, 57063,},
{1182, 2110, 470, 130, 41, 21, 0, 0, 0, 0, 0, 6047,},
},
{
/* Coeff Band ( 2 ) */
{ 370, 3378, 200, 30, 5, 4, 1, 0, 0, 0, 0, 0,},
{ 293, 1006, 131, 29, 11, 0, 0, 0, 0, 0, 0, 5404,},
{ 114, 387, 98, 23, 4, 8, 1, 0, 0, 0, 0, 236,},
},
{
/* Coeff Band ( 3 ) */
{ 579, 194, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 395, 213, 5, 1, 0, 0, 0, 0, 0, 0, 0, 4157,},
{ 119, 122, 4, 0, 0, 0, 0, 0, 0, 0, 0, 300,},
},
{
/* Coeff Band ( 4 ) */
{ 38, 557, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 21, 114, 12, 1, 0, 0, 0, 0, 0, 0, 0, 427,},
{ 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7,},
},
{
/* Coeff Band ( 5 ) */
{ 52, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 18, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 652,},
{ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30,},
},
{
/* Coeff Band ( 6 ) */
{ 640, 569, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 25, 77, 2, 0, 0, 0, 0, 0, 0, 0, 0, 517,},
{ 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
},
{
/* Block Type ( 3 ) */
{
/* Coeff Band ( 0 ) */
{2506, 20161, 2707, 767, 261, 178, 107, 30, 14, 3, 0, 100694,},
{8806, 36478, 8817, 3268, 1280, 850, 401, 114, 42, 0, 0, 58572,},
{11003, 27214, 11798, 5716, 2482, 2072, 1048, 175, 32, 0, 0, 19284,},
},
{
/* Coeff Band ( 1 ) */
{9738, 11313, 959, 205, 70, 18, 11, 1, 0, 0, 0, 0,},
{12628, 15085, 1507, 273, 52, 19, 9, 0, 0, 0, 0, 54280,},
{10701, 15846, 5561, 1926, 813, 570, 249, 36, 0, 0, 0, 6460,},
},
{
/* Coeff Band ( 2 ) */
{6781, 22539, 2784, 634, 182, 123, 20, 4, 0, 0, 0, 0,},
{6263, 11544, 2649, 790, 259, 168, 27, 5, 0, 0, 0, 20539,},
{3109, 4075, 2031, 896, 457, 386, 158, 29, 0, 0, 0, 1138,},
},
{
/* Coeff Band ( 3 ) */
{11515, 4079, 465, 73, 5, 14, 2, 0, 0, 0, 0, 0,},
{9361, 5834, 650, 96, 24, 8, 4, 0, 0, 0, 0, 22181,},
{4343, 3974, 1360, 415, 132, 96, 14, 1, 0, 0, 0, 1267,},
},
{
/* Coeff Band ( 4 ) */
{4787, 9297, 823, 168, 44, 12, 4, 0, 0, 0, 0, 0,},
{3619, 4472, 719, 198, 60, 31, 3, 0, 0, 0, 0, 8401,},
{1157, 1175, 483, 182, 88, 31, 8, 0, 0, 0, 0, 268,},
},
{
/* Coeff Band ( 5 ) */
{8299, 1226, 32, 5, 1, 0, 0, 0, 0, 0, 0, 0,},
{3502, 1568, 57, 4, 1, 1, 0, 0, 0, 0, 0, 9811,},
{1055, 1070, 166, 29, 6, 1, 0, 0, 0, 0, 0, 527,},
},
{
/* Coeff Band ( 6 ) */
{27414, 27927, 1989, 347, 69, 26, 0, 0, 0, 0, 0, 0,},
{5876, 10074, 1574, 341, 91, 24, 4, 0, 0, 0, 0, 21954,},
{1571, 2171, 778, 324, 124, 65, 16, 0, 0, 0, 0, 979,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 459,},
{ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13,},
},
},
};
#endif //__DEFAULTCOEFCOUNTS_H

@@ -26,8 +26,32 @@ typedef vp8_prob Prob;
#include "coefupdateprobs.h"
DECLARE_ALIGNED(16, cuchar, vp8_coef_bands[16]) = { 0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7};
DECLARE_ALIGNED(16, cuchar, vp8_prev_token_class[MAX_ENTROPY_TOKENS]) = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0};
DECLARE_ALIGNED(16, const unsigned char, vp8_norm[256]) =
{
0, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
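Reading the rows of the vp8_norm table above, each entry looks like the left shift needed to move the highest set bit of an 8-bit value up to bit 7, with index 0 mapped to 0. A minimal sketch that regenerates such a table under that reading follows; `norm_shift` and `build_norm_table` are hypothetical names that do not appear in the diff.

```c
#include <assert.h>

/* Hypothetical helper (not part of the diff): the left shift that moves the
 * highest set bit of an 8-bit value up to bit 7. Zero maps to 0, matching
 * the first entry of the table above. */
static unsigned char norm_shift(unsigned int v)
{
    unsigned char shift = 0;

    assert(v < 256);
    if (v == 0)
        return 0;

    while (!(v & 0x80))
    {
        v <<= 1;
        shift++;
    }
    return shift;
}

/* Fill a 256-entry table; under the reading above it should match vp8_norm. */
static void build_norm_table(unsigned char table[256])
{
    unsigned int i;

    for (i = 0; i < 256; i++)
        table[i] = norm_shift(i);
}
```

Comparing the generated table against the literal one is a cheap way to catch a typo in a hand-edited row.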
DECLARE_ALIGNED(16, cuchar, vp8_coef_bands[16]) =
{ 0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7};
DECLARE_ALIGNED(16, cuchar, vp8_prev_token_class[MAX_ENTROPY_TOKENS]) =
{ 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0};
DECLARE_ALIGNED(16, const int, vp8_default_zig_zag1d[16]) =
{
0, 1, 4, 8,
@@ -65,7 +89,7 @@ const vp8_tree_index vp8_coef_tree[ 22] = /* corresponding _CONTEXT_NODEs */
-DCT_VAL_CATEGORY5, -DCT_VAL_CATEGORY6 /* 10 = CAT_FIVE */
};
struct vp8_token_struct vp8_coef_encodings[vp8_coef_tokens];
struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];
/* Trees for extra bits. Probabilities are constant and
do not depend on previously encoded bits */
@@ -145,10 +169,12 @@ void vp8_default_coef_probs(VP8_COMMON *pc)
do
{
unsigned int branch_ct [vp8_coef_tokens-1] [2];
unsigned int branch_ct [ENTROPY_NODES] [2];
vp8_tree_probs_from_distribution(
vp8_coef_tokens, vp8_coef_encodings, vp8_coef_tree,
pc->fc.coef_probs [h][i][k], branch_ct, default_coef_counts [h][i][k],
MAX_ENTROPY_TOKENS, vp8_coef_encodings, vp8_coef_tree,
pc->fc.coef_probs[h][i][k],
branch_ct,
vp8_default_coef_counts[h][i][k],
256, 1);
}
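The call above passes per-node branch counts (`branch_ct`) together with the trailing arguments `256, 1`. The body of vp8_tree_probs_from_distribution is not shown in this diff, so the following is only a sketch of how one pair of branch counts could be turned into an 8-bit probability, assuming 256 is a scale factor and 1 requests round-to-nearest. The clamp to [1, 255], the 128 fallback, and the name `prob_from_branch_counts` are assumptions made for illustration.

```c
#include <assert.h>

/* Illustrative only: map a pair of branch counts to an 8-bit probability of
 * taking the "0" branch, scaled by 256 and rounded to nearest. The clamp to
 * [1, 255] and the 128 fallback for an empty pair are assumptions of this
 * sketch, not taken from the diff. */
static unsigned char prob_from_branch_counts(const unsigned int ct[2])
{
    unsigned long long total;
    unsigned long long p;

    assert(ct != 0);
    total = (unsigned long long)ct[0] + ct[1];

    if (total == 0)
        return 128; /* no data: assume an even split */

    /* Widen before multiplying so large counts cannot overflow. */
    p = ((unsigned long long)ct[0] * 256 + (total >> 1)) / total;

    if (p < 1)
        p = 1;
    if (p > 255)
        p = 255;

    return (unsigned char)p;
}
```

A tree with MAX_ENTROPY_TOKENS (12) leaves has 11 internal nodes, which is why the new `branch_ct` declaration is sized by ENTROPY_NODES.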

@@ -30,13 +30,12 @@
#define DCT_VAL_CATEGORY6 10 /* 67+ Extra Bits 11+1 */
#define DCT_EOB_TOKEN 11 /* EOB Extra Bits 0+0 */
#define vp8_coef_tokens 12
#define MAX_ENTROPY_TOKENS vp8_coef_tokens
#define MAX_ENTROPY_TOKENS 12
#define ENTROPY_NODES 11
extern const vp8_tree_index vp8_coef_tree[];
extern struct vp8_token_struct vp8_coef_encodings[vp8_coef_tokens];
extern struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];
typedef struct
{
@@ -85,9 +84,9 @@ extern DECLARE_ALIGNED(16, const unsigned char, vp8_coef_bands[16]);
/*# define DC_TOKEN_CONTEXTS 3*/ /* 00, 0!0, !0!0 */
# define PREV_COEF_CONTEXTS 3
extern DECLARE_ALIGNED(16, const unsigned char, vp8_prev_token_class[vp8_coef_tokens]);
extern DECLARE_ALIGNED(16, const unsigned char, vp8_prev_token_class[MAX_ENTROPY_TOKENS]);
extern const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens-1];
extern const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [ENTROPY_NODES];
struct VP8Common;

@@ -33,11 +33,11 @@ typedef enum
SUBMVREF_LEFT_ABOVE_ZED
} sumvfref_t;
int vp8_mv_cont(const MV *l, const MV *a)
int vp8_mv_cont(const int_mv *l, const int_mv *a)
{
int lez = (l->row == 0 && l->col == 0);
int aez = (a->row == 0 && a->col == 0);
int lea = (l->row == a->row && l->col == a->col);
int lez = (l->as_int == 0);
int aez = (a->as_int == 0);
int lea = (l->as_int == a->as_int);
if (lea && lez)
return SUBMVREF_LEFT_ABOVE_ZED;
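The hunk above swaps two per-field comparisons for a single comparison of `as_int`. A minimal sketch of why the two forms agree, assuming MV is a pair of shorts (its definition is not shown in this diff) and that both union members occupy the same number of bytes:

```c
#include <assert.h>

/* Stand-ins for the codec's types so the sketch compiles on its own.
 * The two-short layout of MV is an assumption. */
typedef struct
{
    short row;
    short col;
} MV;

typedef union
{
    unsigned int as_int;
    MV as_mv;
} int_mv;

/* Field-by-field test, in the style of the removed lines. */
static int mv_equal_fields(const int_mv *a, const int_mv *b)
{
    return a->as_mv.row == b->as_mv.row && a->as_mv.col == b->as_mv.col;
}

/* Single-word test, in the style of the added lines. It is only equivalent
 * when the union has no bytes outside row and col. */
static int mv_equal_word(const int_mv *a, const int_mv *b)
{
    assert(sizeof(MV) == sizeof(unsigned int));
    return a->as_int == b->as_int;
}
```

The word-sized test saves a compare and a branch per call, at the cost of relying on the size match that the assertion checks.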

@@ -25,7 +25,7 @@ extern const int vp8_mbsplit_count [VP8_NUMMBSPLITS]; /* # of subsets */
extern const vp8_prob vp8_mbsplit_probs [VP8_NUMMBSPLITS-1];
extern int vp8_mv_cont(const MV *l, const MV *a);
extern int vp8_mv_cont(const int_mv *l, const int_mv *a);
#define SUBMVREF_COUNT 5
extern const vp8_prob vp8_sub_mv_ref_prob2 [SUBMVREF_COUNT][VP8_SUBMVREFS-1];

@@ -85,10 +85,10 @@ void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
src->y_height, src->y_width,
et, el, eb, er);
et = (et + 1) >> 1;
el = (el + 1) >> 1;
eb = (eb + 1) >> 1;
er = (er + 1) >> 1;
et = dst->border >> 1;
el = dst->border >> 1;
eb = (dst->border >> 1) + dst->uv_height - src->uv_height;
er = (dst->border >> 1) + dst->uv_width - src->uv_width;
copy_and_extend_plane(src->u_buffer, src->uv_stride,
dst->u_buffer, dst->uv_stride,

@@ -10,29 +10,6 @@
#include <stdlib.h>
#include <stdio.h>
#define REGISTER_FILTER 1
#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
#if REGISTER_FILTER
#define FILTER0 filter0
#define FILTER1 filter1
#define FILTER2 filter2
#define FILTER3 filter3
#define FILTER4 filter4
#define FILTER5 filter5
#else
#define FILTER0 vp8_filter[0]
#define FILTER1 vp8_filter[1]
#define FILTER2 vp8_filter[2]
#define FILTER3 vp8_filter[3]
#define FILTER4 vp8_filter[4]
#define FILTER5 vp8_filter[5]
#endif
#define SRC_INCREMENT src_increment
#include "filter.h"
#include "vpx_ports/mem.h"
@@ -50,6 +27,7 @@ DECLARE_ALIGNED(16, const short, vp8_bilinear_filters[8][2]) =
DECLARE_ALIGNED(16, const short, vp8_sub_pel_filters[8][6]) =
{
{ 0, 0, 128, 0, 0, 0 }, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0 },
{ 2, -11, 108, 36, -8, 1 }, /* New 1/4 pel 6 tap filter */
@@ -71,45 +49,35 @@ static void filter_block2d_first_pass
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
int Temp;
#if REGISTER_FILTER
short filter0 = vp8_filter[0];
short filter1 = vp8_filter[1];
short filter2 = vp8_filter[2];
short filter3 = vp8_filter[3];
short filter4 = vp8_filter[4];
short filter5 = vp8_filter[5];
#endif
int ps2 = 2*(int)pixel_step;
int ps3 = 3*(int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
Temp = ((int)src_ptr[-1*ps2] * FILTER0);
Temp += ((int)src_ptr[-1*(int)pixel_step] * FILTER1) +
((int)src_ptr[0] * FILTER2) +
((int)src_ptr[pixel_step] * FILTER3) +
((int)src_ptr[ps2] * FILTER4) +
((int)src_ptr[ps3] * FILTER5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
Temp = ((int)src_ptr[-2 * (int)pixel_step] * vp8_filter[0]) +
((int)src_ptr[-1 * (int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[0] * vp8_filter[2]) +
((int)src_ptr[pixel_step] * vp8_filter[3]) +
((int)src_ptr[2*pixel_step] * vp8_filter[4]) +
((int)src_ptr[3*pixel_step] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
CLAMP(Temp, 0, 255);
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[j] = Temp;
src_ptr++;
}
/* Next row... */
src_ptr += SRC_INCREMENT;
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_width;
}
}
@@ -126,45 +94,36 @@ static void filter_block2d_second_pass
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
#if REGISTER_FILTER
short filter0 = vp8_filter[0];
short filter1 = vp8_filter[1];
short filter2 = vp8_filter[2];
short filter3 = vp8_filter[3];
short filter4 = vp8_filter[4];
short filter5 = vp8_filter[5];
#endif
int ps2 = ((int)pixel_step) << 1;
int ps3 = ps2 + (int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
unsigned int i, j;
int Temp;
for (i = 0; i < output_height; i++)
{
for (j = 0; j < output_width; j++)
{
/* Apply filter */
Temp = ((int)src_ptr[-1*ps2] * FILTER0) +
((int)src_ptr[-1*(int)pixel_step] * FILTER1) +
((int)src_ptr[0] * FILTER2) +
((int)src_ptr[pixel_step] * FILTER3) +
((int)src_ptr[ps2] * FILTER4) +
((int)src_ptr[ps3] * FILTER5) +
Temp = ((int)src_ptr[-2 * (int)pixel_step] * vp8_filter[0]) +
((int)src_ptr[-1 * (int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[0] * vp8_filter[2]) +
((int)src_ptr[pixel_step] * vp8_filter[3]) +
((int)src_ptr[2*pixel_step] * vp8_filter[4]) +
((int)src_ptr[3*pixel_step] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
CLAMP(Temp, 0, 255);
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[j] = (unsigned char)Temp;
src_ptr++;
}
/* Start next row */
src_ptr += src_increment;
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_pitch;
}
}
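Both filter passes above now compute one pixel as a six-tap weighted sum, add a rounding term, shift back down, and clamp with a plain if/else instead of the CLAMP macro. A compact sketch of that per-pixel step follows. The values 128 and 7 for the weight and shift are assumptions (the diff uses VP8_FILTER_WEIGHT and VP8_FILTER_SHIFT without showing their definitions), chosen because each row of taps shown above sums to 128. `sixtap_pixel` is a hypothetical name.

```c
#include <assert.h>

/* Assumed values: the diff names these constants but does not define them.
 * 128 matches the sum of each row of taps in vp8_sub_pel_filters. */
#define FILTER_WEIGHT 128
#define FILTER_SHIFT 7

/* Filter one output pixel from six neighbours spaced `step` apart.
 * `src` must have two valid samples before it and three after it. */
static unsigned char sixtap_pixel(const unsigned char *src, int step,
                                  const short *taps)
{
    int sum = FILTER_WEIGHT >> 1; /* rounding term */
    int k;

    assert(src != 0 && taps != 0);

    for (k = 0; k < 6; k++)
        sum += (int)src[(k - 2) * step] * taps[k];

    sum >>= FILTER_SHIFT; /* normalize back towards the 0-255 range */

    if (sum < 0)
        sum = 0;
    else if (sum > 255)
        sum = 255;

    return (unsigned char)sum;
}
```

With `step` set to 1 this corresponds to the horizontal pass; passing the row stride as `step` gives the vertical one.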
@@ -208,7 +167,6 @@ void vp8_sixtap_predict_c
filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
}
void vp8_sixtap_predict8x8_c
(
unsigned char *src_ptr,

@@ -25,9 +25,9 @@ void vp8_find_near_mvs
(
MACROBLOCKD *xd,
const MODE_INFO *here,
MV *nearest,
MV *nearby,
MV *best_mv,
int_mv *nearest,
int_mv *nearby,
int_mv *best_mv,
int cnt[4],
int refframe,
int *ref_frame_sign_bias
@@ -131,13 +131,14 @@ void vp8_find_near_mvs
near_mvs[CNT_INTRA] = near_mvs[CNT_NEAREST];
/* Set up return values */
*best_mv = near_mvs[0].as_mv;
*nearest = near_mvs[CNT_NEAREST].as_mv;
*nearby = near_mvs[CNT_NEAR].as_mv;
best_mv->as_int = near_mvs[0].as_int;
nearest->as_int = near_mvs[CNT_NEAREST].as_int;
nearby->as_int = near_mvs[CNT_NEAR].as_int;
vp8_clamp_mv(nearest, xd);
vp8_clamp_mv(nearby, xd);
vp8_clamp_mv(best_mv, xd); /*TODO: move this up before the copy*/
//TODO: move clamp outside findnearmv
vp8_clamp_mv2(nearest, xd);
vp8_clamp_mv2(nearby, xd);
vp8_clamp_mv2(best_mv, xd);
}
vp8_prob *vp8_mv_ref_probs(
@@ -152,26 +153,3 @@ vp8_prob *vp8_mv_ref_probs(
return p;
}
const B_MODE_INFO *vp8_left_bmi(const MODE_INFO *cur_mb, int b)
{
if (!(b & 3))
{
/* On L edge, get from MB to left of us */
--cur_mb;
b += 4;
}
return cur_mb->bmi + b - 1;
}
const B_MODE_INFO *vp8_above_bmi(const MODE_INFO *cur_mb, int b, int mi_stride)
{
if (!(b >> 2))
{
/* On top edge, get from MB above us */
cur_mb -= mi_stride;
b += 16;
}
return cur_mb->bmi + b - 4;
}

@@ -17,11 +17,6 @@
#include "modecont.h"
#include "treecoder.h"
typedef union
{
unsigned int as_int;
MV as_mv;
} int_mv; /* facilitates rapid equality tests */
static void mv_bias(int refmb_ref_frame_sign_bias, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
{
@@ -39,24 +34,48 @@ static void mv_bias(int refmb_ref_frame_sign_bias, int refframe, int_mv *mvp, co
#define LEFT_TOP_MARGIN (16 << 3)
#define RIGHT_BOTTOM_MARGIN (16 << 3)
static void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
static void vp8_clamp_mv2(int_mv *mv, const MACROBLOCKD *xd)
{
if (mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
else if (mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
if (mv->as_mv.col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
mv->as_mv.col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
else if (mv->as_mv.col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
mv->as_mv.col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
if (mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
else if (mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
if (mv->as_mv.row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
mv->as_mv.row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
else if (mv->as_mv.row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
mv->as_mv.row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
}
static void vp8_clamp_mv(int_mv *mv, int mb_to_left_edge, int mb_to_right_edge,
int mb_to_top_edge, int mb_to_bottom_edge)
{
mv->as_mv.col = (mv->as_mv.col < mb_to_left_edge) ?
mb_to_left_edge : mv->as_mv.col;
mv->as_mv.col = (mv->as_mv.col > mb_to_right_edge) ?
mb_to_right_edge : mv->as_mv.col;
mv->as_mv.row = (mv->as_mv.row < mb_to_top_edge) ?
mb_to_top_edge : mv->as_mv.row;
mv->as_mv.row = (mv->as_mv.row > mb_to_bottom_edge) ?
mb_to_bottom_edge : mv->as_mv.row;
}
static unsigned int vp8_check_mv_bounds(int_mv *mv, int mb_to_left_edge,
int mb_to_right_edge, int mb_to_top_edge,
int mb_to_bottom_edge)
{
unsigned int need_to_clamp;
need_to_clamp = (mv->as_mv.col < mb_to_left_edge) ? 1 : 0;
need_to_clamp |= (mv->as_mv.col > mb_to_right_edge) ? 1 : 0;
need_to_clamp |= (mv->as_mv.row < mb_to_top_edge) ? 1 : 0;
need_to_clamp |= (mv->as_mv.row > mb_to_bottom_edge) ? 1 : 0;
return need_to_clamp;
}
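The two helpers above separate detecting an out-of-range vector from correcting it. One way a caller might combine them is sketched here: run the cheap check first and clamp only when it fires. All names in the sketch are hypothetical and the two-short MV layout is an assumption.

```c
#include <assert.h>

/* Stand-ins so the sketch compiles alone; the MV layout is assumed. */
typedef struct
{
    short row;
    short col;
} MV;

typedef union
{
    unsigned int as_int;
    MV as_mv;
} int_mv;

/* Nonzero when any component falls outside [left, right] x [top, bottom]. */
static unsigned int mv_needs_clamp(const int_mv *mv, int left, int right,
                                   int top, int bottom)
{
    return (mv->as_mv.col < left) | (mv->as_mv.col > right) |
           (mv->as_mv.row < top) | (mv->as_mv.row > bottom);
}

/* Pull each component back to the nearest edge. */
static void mv_clamp(int_mv *mv, int left, int right, int top, int bottom)
{
    assert(left <= right && top <= bottom);

    if (mv->as_mv.col < left)
        mv->as_mv.col = (short)left;
    else if (mv->as_mv.col > right)
        mv->as_mv.col = (short)right;

    if (mv->as_mv.row < top)
        mv->as_mv.row = (short)top;
    else if (mv->as_mv.row > bottom)
        mv->as_mv.row = (short)bottom;
}

/* Hypothetical caller: pay for the clamp only when the check fires, and
 * report whether it did. */
static unsigned int mv_clamp_if_needed(int_mv *mv, int left, int right,
                                       int top, int bottom)
{
    unsigned int needed = mv_needs_clamp(mv, left, right, top, bottom);

    if (needed)
        mv_clamp(mv, left, right, top, bottom);
    return needed;
}
```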
void vp8_find_near_mvs
(
MACROBLOCKD *xd,
const MODE_INFO *here,
MV *nearest, MV *nearby, MV *best,
int_mv *nearest, int_mv *nearby, int_mv *best,
int near_mv_ref_cts[4],
int refframe,
int *ref_frame_sign_bias
@@ -66,10 +85,89 @@ vp8_prob *vp8_mv_ref_probs(
vp8_prob p[VP8_MVREFS-1], const int near_mv_ref_ct[4]
);
const B_MODE_INFO *vp8_left_bmi(const MODE_INFO *cur_mb, int b);
const B_MODE_INFO *vp8_above_bmi(const MODE_INFO *cur_mb, int b, int mi_stride);
extern const unsigned char vp8_mbsplit_offset[4][16];
static int left_block_mv(const MODE_INFO *cur_mb, int b)
{
if (!(b & 3))
{
/* On L edge, get from MB to left of us */
--cur_mb;
if(cur_mb->mbmi.mode != SPLITMV)
return cur_mb->mbmi.mv.as_int;
b += 4;
}
return (cur_mb->bmi + b - 1)->mv.as_int;
}
static int above_block_mv(const MODE_INFO *cur_mb, int b, int mi_stride)
{
if (!(b >> 2))
{
/* On top edge, get from MB above us */
cur_mb -= mi_stride;
if(cur_mb->mbmi.mode != SPLITMV)
return cur_mb->mbmi.mv.as_int;
b += 16;
}
return (cur_mb->bmi + b - 4)->mv.as_int;
}
static B_PREDICTION_MODE left_block_mode(const MODE_INFO *cur_mb, int b)
{
if (!(b & 3))
{
/* On L edge, get from MB to left of us */
--cur_mb;
switch (cur_mb->mbmi.mode)
{
case B_PRED:
return (cur_mb->bmi + b + 3)->as_mode;
case DC_PRED:
return B_DC_PRED;
case V_PRED:
return B_VE_PRED;
case H_PRED:
return B_HE_PRED;
case TM_PRED:
return B_TM_PRED;
default:
return B_DC_PRED;
}
}
return (cur_mb->bmi + b - 1)->as_mode;
}
static B_PREDICTION_MODE above_block_mode(const MODE_INFO *cur_mb, int b, int mi_stride)
{
if (!(b >> 2))
{
/* On top edge, get from MB above us */
cur_mb -= mi_stride;
switch (cur_mb->mbmi.mode)
{
case B_PRED:
return (cur_mb->bmi + b + 12)->as_mode;
case DC_PRED:
return B_DC_PRED;
case V_PRED:
return B_VE_PRED;
case H_PRED:
return B_HE_PRED;
case TM_PRED:
return B_TM_PRED;
default:
return B_DC_PRED;
}
}
return (cur_mb->bmi + b - 4)->as_mode;
}
#endif
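The four inline helpers above share one piece of index arithmetic: blocks are numbered 0 to 15 in raster order inside a 4x4 grid, `!(b & 3)` detects the left column and `!(b >> 2)` detects the top row. A standalone sketch of just that arithmetic, using a hypothetical `block_ref` result type in place of the MODE_INFO pointers:

```c
#include <assert.h>

/* Hypothetical result type for this sketch. */
typedef struct
{
    int in_neighbour_mb; /* 1 if the neighbour lives in the adjacent macroblock */
    int index;           /* block index inside whichever macroblock holds it */
} block_ref;

/* Left neighbour: column 0 wraps to column 3 of the macroblock on the left,
 * which is b + 3 (b + 4 - 1 in the motion-vector helper). */
static block_ref left_neighbour(int b)
{
    block_ref r;

    assert(b >= 0 && b < 16);
    r.in_neighbour_mb = !(b & 3);
    r.index = r.in_neighbour_mb ? b + 3 : b - 1;
    return r;
}

/* Above neighbour: row 0 wraps to row 3 of the macroblock above,
 * which is b + 12 (b + 16 - 4 in the motion-vector helper). */
static block_ref above_neighbour(int b)
{
    block_ref r;

    assert(b >= 0 && b < 16);
    r.in_neighbour_mb = !(b >> 2);
    r.index = r.in_neighbour_mb ? b + 12 : b - 4;
    return r;
}
```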

@@ -17,9 +17,53 @@
#include "vp8/common/idct.h"
#include "vp8/common/onyxc_int.h"
#if CONFIG_MULTITHREAD
#if HAVE_UNISTD_H
#include <unistd.h>
#elif defined(_WIN32)
#include <windows.h>
typedef void (WINAPI *PGNSI)(LPSYSTEM_INFO);
#endif
#endif
extern void vp8_arch_x86_common_init(VP8_COMMON *ctx);
extern void vp8_arch_arm_common_init(VP8_COMMON *ctx);
extern void vp8_arch_opencl_common_init(VP8_COMMON *ctx);
#if CONFIG_MULTITHREAD
static int get_cpu_count()
{
int core_count = 16;
#if HAVE_UNISTD_H
#if defined(_SC_NPROCESSORS_ONLN)
core_count = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROC_ONLN)
core_count = sysconf(_SC_NPROC_ONLN);
#endif
#elif defined(_WIN32)
{
PGNSI pGNSI;
SYSTEM_INFO sysinfo;
/* Call GetNativeSystemInfo if supported or
* GetSystemInfo otherwise. */
pGNSI = (PGNSI) GetProcAddress(
GetModuleHandle(TEXT("kernel32.dll")), "GetNativeSystemInfo");
if (pGNSI != NULL)
pGNSI(&sysinfo);
else
GetSystemInfo(&sysinfo);
core_count = sysinfo.dwNumberOfProcessors;
}
#else
/* other platforms */
#endif
return core_count > 0 ? core_count : 1;
}
#endif
void vp8_machine_specific_config(VP8_COMMON *ctx)
{
@@ -44,6 +88,12 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
vp8_build_intra_predictors_mby;
rtcd->recon.build_intra_predictors_mby_s =
vp8_build_intra_predictors_mby_s;
rtcd->recon.build_intra_predictors_mbuv =
vp8_build_intra_predictors_mbuv;
rtcd->recon.build_intra_predictors_mbuv_s =
vp8_build_intra_predictors_mbuv_s;
rtcd->recon.intra4x4_predict =
vp8_intra4x4_predict;
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_c;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_c;
@@ -58,12 +108,12 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
rtcd->loopfilter.normal_b_v = vp8_loop_filter_bv_c;
rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_c;
rtcd->loopfilter.normal_b_h = vp8_loop_filter_bh_c;
rtcd->loopfilter.simple_mb_v = vp8_loop_filter_mbvs_c;
rtcd->loopfilter.simple_mb_v = vp8_loop_filter_simple_vertical_edge_c;
rtcd->loopfilter.simple_b_v = vp8_loop_filter_bvs_c;
rtcd->loopfilter.simple_mb_h = vp8_loop_filter_mbhs_c;
rtcd->loopfilter.simple_mb_h = vp8_loop_filter_simple_horizontal_edge_c;
rtcd->loopfilter.simple_b_h = vp8_loop_filter_bhs_c;
#if CONFIG_POSTPROC || (CONFIG_VP8_ENCODER && CONFIG_PSNR)
#if CONFIG_POSTPROC || (CONFIG_VP8_ENCODER && CONFIG_INTERNAL_STATS)
rtcd->postproc.down = vp8_mbpost_proc_down_c;
rtcd->postproc.across = vp8_mbpost_proc_across_ip_c;
rtcd->postproc.downacross = vp8_post_proc_down_and_across_c;
@@ -83,8 +133,7 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
vp8_arch_arm_common_init(ctx);
#endif
#if CONFIG_OPENCL && (ENABLE_CL_IDCT_DEQUANT || ENABLE_CL_SUBPIXEL || ENABLE_CL_LOOPFILTER)
vp8_arch_opencl_common_init(ctx);
#endif
#if CONFIG_MULTITHREAD
ctx->processor_core_count = get_cpu_count();
#endif /* CONFIG_MULTITHREAD */
}
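The POSIX branch of get_cpu_count above can be exercised in isolation. This is a reduced sketch for POSIX systems only. It differs from the diff in one respect: when the sysconf name is unavailable it falls back to 1 rather than the default of 16 used above. `online_cpu_count` is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

/* POSIX-only sketch. _SC_NPROCESSORS_ONLN is a common extension rather than
 * a name every system is required to provide, hence the guard. */
static int online_cpu_count(void)
{
    long n = -1;

#if defined(_SC_NPROCESSORS_ONLN)
    n = sysconf(_SC_NPROCESSORS_ONLN);
#endif

    assert(n <= INT_MAX);

    /* sysconf returns -1 on error; never report fewer than one core. */
    return n > 0 ? (int)n : 1;
}
```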

@@ -31,10 +31,6 @@
#include "arm/idct_arm.h"
#endif
#if CONFIG_OPENCL
#include "opencl/idct_cl.h"
#endif
#ifndef vp8_idct_idct1
#define vp8_idct_idct1 vp8_short_idct4x4llm_1_c
#endif

@@ -11,6 +11,8 @@
#include "invtrans.h"
static void recon_dcblock(MACROBLOCKD *x)
{
BLOCKD *b = &x->block[24];
@@ -18,7 +20,7 @@ static void recon_dcblock(MACROBLOCKD *x)
for (i = 0; i < 16; i++)
{
*(x->block[i].dqcoeff_base+x->block[i].dqcoeff_offset) = b->diff_base[b->diff_offset+i];
x->block[i].dqcoeff[0] = b->diff[i];
}
}
@@ -26,18 +28,18 @@ static void recon_dcblock(MACROBLOCKD *x)
void vp8_inverse_transform_b(const vp8_idct_rtcd_vtable_t *rtcd, BLOCKD *b, int pitch)
{
if (b->eob > 1)
IDCT_INVOKE(rtcd, idct16)(b->dqcoeff_base + b->dqcoeff_offset, &b->diff_base[b->diff_offset], pitch);
IDCT_INVOKE(rtcd, idct16)(b->dqcoeff, b->diff, pitch);
else
IDCT_INVOKE(rtcd, idct1)(b->dqcoeff_base + b->dqcoeff_offset, &b->diff_base[b->diff_offset], pitch);
IDCT_INVOKE(rtcd, idct1)(b->dqcoeff, b->diff, pitch);
}
/* Only used in the encoder */
void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
int i;
/* do 2nd order transform on the dc block */
IDCT_INVOKE(rtcd, iwalsh16)(x->block[24].dqcoeff_base + x->block[23].dqcoeff_offset, &x->block[24].diff_base[x->block[24].diff_offset]);
IDCT_INVOKE(rtcd, iwalsh16)(x->block[24].dqcoeff, x->block[24].diff);
recon_dcblock(x);
@@ -47,8 +49,6 @@ void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *
}
}
/* Only used in encoder */
void vp8_inverse_transform_mbuv(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
int i;
@@ -57,6 +57,7 @@ void vp8_inverse_transform_mbuv(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD
{
vp8_inverse_transform_b(rtcd, &x->block[i], 16);
}
}
@@ -68,10 +69,8 @@ void vp8_inverse_transform_mb(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x
x->mode_info_context->mbmi.mode != SPLITMV)
{
/* do 2nd order transform on the dc block */
BLOCKD b = x->block[24];
IDCT_INVOKE(rtcd, iwalsh16)(b.dqcoeff_base+b.dqcoeff_offset, &b.diff_base[b.diff_offset]);
IDCT_INVOKE(rtcd, iwalsh16)(&x->block[24].dqcoeff[0], x->block[24].diff);
recon_dcblock(x);
}

@@ -13,8 +13,8 @@
#define __INC_INVTRANS_H
#include "vpx_ports/config.h"
#include "vp8/common/idct.h"
#include "vp8/common/blockd.h"
#include "idct.h"
#include "blockd.h"
extern void vp8_inverse_transform_b(const vp8_idct_rtcd_vtable_t *rtcd, BLOCKD *b, int pitch);
extern void vp8_inverse_transform_mb(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x);
extern void vp8_inverse_transform_mby(const vp8_idct_rtcd_vtable_t *rtcd, MACROBLOCKD *x);

@@ -9,164 +9,149 @@
*/
#include "vpx_ports/config.h"
#include "vpx_config.h"
#include "loopfilter.h"
#include "onyxc_int.h"
#if CONFIG_OPENCL
#include "opencl/loopfilter_cl.h"
#endif
#include "vpx_mem/vpx_mem.h"
typedef unsigned char uc;
prototype_loopfilter(vp8_loop_filter_horizontal_edge_c);
prototype_loopfilter(vp8_loop_filter_vertical_edge_c);
prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_c);
prototype_loopfilter(vp8_mbloop_filter_vertical_edge_c);
prototype_loopfilter(vp8_loop_filter_simple_horizontal_edge_c);
prototype_loopfilter(vp8_loop_filter_simple_vertical_edge_c);
prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_c);
prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_c);
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
}
void vp8_loop_filter_mbhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1);
}
void vp8_loop_filter_mbvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2);
vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bhs_c(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, blimit);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_c(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1);
vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bvs_c(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 4, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 8, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 12, y_stride, blimit);
}
void vp8_init_loop_filter(VP8_COMMON *cm)
static void lf_init_lut(loop_filter_info_n *lfi)
{
loop_filter_info *lfi = cm->lf_info;
LOOPFILTERTYPE lft = cm->filter_type;
int sharpness_lvl = cm->sharpness_level;
int frame_type = cm->frame_type;
int i, j;
int filt_lvl;
int block_inside_limit = 0;
int HEVThresh;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
for (filt_lvl = 0; filt_lvl <= MAX_LOOP_FILTER; filt_lvl++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
if (filt_lvl >= 40)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 2;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 3;
}
else if (filt_lvl >= 20)
{
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 1;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 2;
}
else if (filt_lvl >= 15)
{
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 1;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 1;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 0;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 0;
}
}
lfi->mode_lf_lut[DC_PRED] = 1;
lfi->mode_lf_lut[V_PRED] = 1;
lfi->mode_lf_lut[H_PRED] = 1;
lfi->mode_lf_lut[TM_PRED] = 1;
lfi->mode_lf_lut[B_PRED] = 0;
lfi->mode_lf_lut[ZEROMV] = 1;
lfi->mode_lf_lut[NEARESTMV] = 2;
lfi->mode_lf_lut[NEARMV] = 2;
lfi->mode_lf_lut[NEWMV] = 2;
lfi->mode_lf_lut[SPLITMV] = 3;
}
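The threshold table filled in by lf_init_lut() can also be read as a pure function of frame type and filter level. The sketch below restates the branches from the hunk above. The names sketch_hev_thr_index, SKETCH_KEY_FRAME and SKETCH_INTER_FRAME are hypothetical, and the enum values 0 and 1 are an assumption, not taken from the tree.

```c
/* Hypothetical stand-ins for the real frame-type enum (values assumed). */
enum { SKETCH_KEY_FRAME = 0, SKETCH_INTER_FRAME = 1 };

/* Returns the high-edge-variance threshold index (0..3) that the
 * lookup table would hold for this frame type and filter level. */
static int sketch_hev_thr_index(int frame_type, int filt_lvl)
{
    if (filt_lvl >= 40)
        return (frame_type == SKETCH_KEY_FRAME) ? 2 : 3;
    if (filt_lvl >= 20)
        return (frame_type == SKETCH_KEY_FRAME) ? 1 : 2;
    if (filt_lvl >= 15)
        return 1;   /* same value for key and inter frames */
    return 0;       /* low levels: no high-variance threshold */
}
```

Because the result only takes four distinct values, vp8_loop_filter_init() needs just four constant hev_thr vectors, and the per-frame rebuild done by the removed vp8_frame_init_loop_filter() is no longer required.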
void vp8_loop_filter_update_sharpness(loop_filter_info_n *lfi,
int sharpness_lvl)
{
int i;
/* For each possible value for the loop filter fill out limits */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
int block_inside_limit = 0;
/* Set loop filter parameters that control sharpness. */
block_inside_limit = filt_lvl >> (sharpness_lvl > 0);
@@ -181,177 +166,143 @@ void vp8_init_loop_filter(VP8_COMMON *cm)
if (block_inside_limit < 1)
block_inside_limit = 1;
for (j = 0; j < 16; j++)
{
lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl + 2;
lfi[i].flim[j] = filt_lvl;
lfi[i].thr[j] = HEVThresh;
}
}
/* Set up the function pointers depending on the type of loop filtering selected */
if (lft == NORMAL_LOOPFILTER)
{
cm->lf_mbv = LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v);
cm->lf_bv = LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v);
cm->lf_mbh = LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h);
cm->lf_bh = LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h);
}
else
{
cm->lf_mbv = LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v);
cm->lf_bv = LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v);
cm->lf_mbh = LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h);
cm->lf_bh = LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h);
vpx_memset(lfi->lim[i], block_inside_limit, SIMD_WIDTH);
vpx_memset(lfi->blim[i], (2 * filt_lvl + block_inside_limit),
SIMD_WIDTH);
vpx_memset(lfi->mblim[i], (2 * (filt_lvl + 2) + block_inside_limit),
SIMD_WIDTH);
}
}
/* Put vp8_init_loop_filter() in vp8dx_create_decompressor(). Only call vp8_frame_init_loop_filter() while decoding
* each frame. Check last_frame_type to skip the function most of times.
*/
void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type)
void vp8_loop_filter_init(VP8_COMMON *cm)
{
int HEVThresh;
int i, j;
loop_filter_info_n *lfi = &cm->lf_info;
int i;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
/* init limits for given sharpness*/
vp8_loop_filter_update_sharpness(lfi, cm->sharpness_level);
cm->last_sharpness_level = cm->sharpness_level;
/* init LUT for lvl and hev thr picking */
lf_init_lut(lfi);
/* init hev threshold const vectors */
for(i = 0; i < 4 ; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
for (j = 0; j < 16; j++)
{
/*lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl+2;*/
/*lfi[i].flim[j] = filt_lvl;*/
lfi[i].thr[j] = HEVThresh;
}
vpx_memset(lfi->hev_thr[i], i, SIMD_WIDTH);
}
}
int vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int filter_level)
void vp8_loop_filter_frame_init(VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl)
{
MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;
int seg, /* segment number */
ref, /* index in ref_lf_deltas */
mode; /* index in mode_lf_deltas */
if (mbd->mode_ref_lf_delta_enabled)
loop_filter_info_n *lfi = &cm->lf_info;
/* update limits if sharpness has changed */
if(cm->last_sharpness_level != cm->sharpness_level)
{
vp8_loop_filter_update_sharpness(lfi, cm->sharpness_level);
cm->last_sharpness_level = cm->sharpness_level;
}
for(seg = 0; seg < MAX_MB_SEGMENTS; seg++)
{
int lvl_seg = default_filt_lvl;
int lvl_ref, lvl_mode;
/* Note the baseline filter values for each segment */
if (mbd->segmentation_enabled)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
{
lvl_seg = mbd->segment_feature_data[MB_LVL_ALT_LF][seg];
}
else /* Delta Value */
{
lvl_seg += mbd->segment_feature_data[MB_LVL_ALT_LF][seg];
lvl_seg = (lvl_seg > 0) ? ((lvl_seg > 63) ? 63: lvl_seg) : 0;
}
}
if (!mbd->mode_ref_lf_delta_enabled)
{
/* we could get rid of this if we assume that deltas are set to
* zero when not in use; encoder always uses deltas
*/
vpx_memset(lfi->lvl[seg][0], lvl_seg, 4 * 4 );
continue;
}
lvl_ref = lvl_seg;
/* INTRA_FRAME */
ref = INTRA_FRAME;
/* Apply delta for reference frame */
filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
lvl_ref += mbd->ref_lf_deltas[ref];
/* Apply delta for mode */
if (mbmi->ref_frame == INTRA_FRAME)
/* Apply delta for Intra modes */
mode = 0; /* B_PRED */
/* Only the split mode BPRED has a further special case */
lvl_mode = lvl_ref + mbd->mode_lf_deltas[mode];
lvl_mode = (lvl_mode > 0) ? (lvl_mode > 63 ? 63 : lvl_mode) : 0; /* clamp */
lfi->lvl[seg][ref][mode] = lvl_mode;
mode = 1; /* all the rest of Intra modes */
lvl_mode = (lvl_ref > 0) ? (lvl_ref > 63 ? 63 : lvl_ref) : 0; /* clamp */
lfi->lvl[seg][ref][mode] = lvl_mode;
/* LAST, GOLDEN, ALT */
for(ref = 1; ref < MAX_REF_FRAMES; ref++)
{
/* Only the split mode BPRED has a further special case */
if (mbmi->mode == B_PRED)
filter_level += mbd->mode_lf_deltas[0];
int lvl_ref = lvl_seg;
/* Apply delta for reference frame */
lvl_ref += mbd->ref_lf_deltas[ref];
/* Apply delta for Inter modes */
for (mode = 1; mode < 4; mode++)
{
lvl_mode = lvl_ref + mbd->mode_lf_deltas[mode];
lvl_mode = (lvl_mode > 0) ? (lvl_mode > 63 ? 63 : lvl_mode) : 0; /* clamp */
lfi->lvl[seg][ref][mode] = lvl_mode;
}
}
else
{
/* Zero motion mode */
if (mbmi->mode == ZEROMV)
filter_level += mbd->mode_lf_deltas[1];
/* Split MB motion mode */
else if (mbmi->mode == SPLITMV)
filter_level += mbd->mode_lf_deltas[3];
/* All other inter motion modes (Nearest, Near, New) */
else
filter_level += mbd->mode_lf_deltas[2];
}
/* Range check */
if (filter_level > MAX_LOOP_FILTER)
filter_level = MAX_LOOP_FILTER;
else if (filter_level < 0)
filter_level = 0;
}
return filter_level;
}
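The per-segment table fill in vp8_loop_filter_frame_init() replaces the per-macroblock arithmetic of vp8_adjust_mb_lf_value(). A minimal sketch of the fill for a single segment follows. It assumes that the intra reference has index 0 and that there are 4 reference frames (consistent with the lvl[4][4][4] declaration, but not confirmed by this hunk). All sketch_ names are hypothetical.

```c
#include <string.h>

/* Clamp a filter level to the valid range 0..63. */
static int sketch_clamp_lvl(int v)
{
    return v < 0 ? 0 : (v > 63 ? 63 : v);
}

/* Fill lvl[ref][mode] for one segment whose base level is lvl_seg
 * (assumed to be already clamped to 0..63). */
static void sketch_fill_levels(unsigned char lvl[4][4], int lvl_seg,
                               int deltas_enabled,
                               const signed char ref_deltas[4],
                               const signed char mode_deltas[4])
{
    int ref, mode, lvl_ref;

    if (!deltas_enabled)
    {
        /* No deltas in use: every (ref, mode) pair gets the base level. */
        memset(lvl, lvl_seg, 4 * 4);
        return;
    }

    /* Intra reference (index 0 assumed): only B_PRED gets a mode delta. */
    lvl_ref = lvl_seg + ref_deltas[0];
    lvl[0][0] = (unsigned char)sketch_clamp_lvl(lvl_ref + mode_deltas[0]);
    lvl[0][1] = (unsigned char)sketch_clamp_lvl(lvl_ref);

    /* Inter references: mode indices 1..3 each apply their own delta. */
    for (ref = 1; ref < 4; ref++)
    {
        lvl_ref = lvl_seg + ref_deltas[ref];
        for (mode = 1; mode < 4; mode++)
            lvl[ref][mode] =
                (unsigned char)sketch_clamp_lvl(lvl_ref + mode_deltas[mode]);
    }
}
```

As in the hunk, entries that no macroblock can select (for example an inter reference with mode index 0) are left untouched when deltas are enabled.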
void vp8_loop_filter_frame
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
MACROBLOCKD *mbd
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
loop_filter_info *lfi = cm->lf_info;
loop_filter_info_n *lfi_n = &cm->lf_info;
loop_filter_info lfi;
FRAME_TYPE frame_type = cm->frame_type;
int mb_row;
int mb_col;
int baseline_filter_level[MAX_MB_SEGMENTS];
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
int i;
unsigned char *y_ptr, *u_ptr, *v_ptr;
#if CONFIG_OPENCL && ENABLE_CL_LOOPFILTER
if ( cl_initialized == CL_SUCCESS ){
vp8_loop_filter_frame_cl(cm,mbd,default_filt_lvl);
return;
}
#endif
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
/* Point at base of Mb MODE_INFO list */
const MODE_INFO *mode_info_context = cm->mi;
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter(lfi, frame_type);
vp8_loop_filter_frame_init(cm, mbd, cm->filter_level);
/* Set up the buffer pointers */
y_ptr = post->y_buffer;
@@ -363,102 +314,108 @@ void vp8_loop_filter_frame
{
for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
mode_info_context->mbmi.mode != SPLITMV &&
mode_info_context->mbmi.mb_skip_coeff);
filter_level = baseline_filter_level[Segment];
const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
const int seg = mode_info_context->mbmi.segment_id;
const int ref_frame = mode_info_context->mbmi.ref_frame;
/* Distance of Mb to the various image edges.
* These specified to 8th pel as they are always compared to values that are in 1/8th pel units
* Apply any context driven MB level adjustment
*/
filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
if (filter_level)
{
if (mb_col > 0)
cm->lf_mbv(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
if (cm->filter_type == NORMAL_LOOPFILTER)
{
const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
lfi.mblim = lfi_n->mblim[filter_level];
lfi.blim = lfi_n->blim[filter_level];
lfi.lim = lfi_n->lim[filter_level];
lfi.hev_thr = lfi_n->hev_thr[hev_index];
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bv(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
/* don't apply across umv border */
if (mb_row > 0)
cm->lf_mbh(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bh(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
/* don't apply across umv border */
if (mb_row > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
}
else
{
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
/* don't apply across umv border */
if (mb_row > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
}
}
y_ptr += 16;
u_ptr += 8;
v_ptr += 8;
mbd->mode_info_context++; /* step to next MB */
mode_info_context++; /* step to next MB */
}
y_ptr += post->y_stride * 16 - post->y_width;
u_ptr += post->uv_stride * 8 - post->uv_width;
v_ptr += post->uv_stride * 8 - post->uv_width;
mbd->mode_info_context++; /* Skip border mb */
mode_info_context++; /* Skip border mb */
}
}
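In this hunk the inner block edges are gated by skip_lf instead of the old dc_diff > 0 test. The predicate can be sketched on its own as below. The enum is a hypothetical stand-in that lists only a few modes, and its values are not those of the real prediction-mode enum.

```c
/* Hypothetical, reduced mode list for illustration only. */
enum sketch_mode
{
    SKETCH_DC_PRED,
    SKETCH_B_PRED,
    SKETCH_ZEROMV,
    SKETCH_SPLITMV
};

/* Returns 1 when the inner (block) edges of a macroblock can be left
 * unfiltered: the macroblock is not split into sub-blocks (neither
 * B_PRED nor SPLITMV) and it carries no coefficients. */
static int sketch_skip_inner_edges(enum sketch_mode mode, int mb_skip_coeff)
{
    return mode != SKETCH_B_PRED &&
           mode != SKETCH_SPLITMV &&
           mb_skip_coeff;
}
```

The macroblock edges themselves are still filtered whenever filter_level is nonzero and the edge is not on the frame border; only the normal_b_* and simple_b_* calls depend on this flag.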
/* Encoder only... */
void vp8_loop_filter_frame_yonly
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl,
int sharpness_lvl
int default_filt_lvl
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
int i;
unsigned char *y_ptr;
int mb_row;
int mb_col;
loop_filter_info *lfi = cm->lf_info;
int baseline_filter_level[MAX_MB_SEGMENTS];
loop_filter_info_n *lfi_n = &cm->lf_info;
loop_filter_info lfi;
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
FRAME_TYPE frame_type = cm->frame_type;
(void) sharpness_lvl;
/* Point at base of Mb MODE_INFO list */
const MODE_INFO *mode_info_context = cm->mi;
/*MODE_INFO * this_mb_mode_info = cm->mi;*/ /* Point at base of Mb MODE_INFO list */
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
#if 0
if(default_filt_lvl == 0) /* no filter applied */
return;
#endif
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter(lfi, frame_type);
vp8_loop_filter_frame_init( cm, mbd, default_filt_lvl);
/* Set up the buffer pointers */
y_ptr = post->y_buffer;
@@ -468,72 +425,106 @@ void vp8_loop_filter_frame_yonly
{
for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
filter_level = baseline_filter_level[Segment];
int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
mode_info_context->mbmi.mode != SPLITMV &&
mode_info_context->mbmi.mb_skip_coeff);
/* Apply any context driven MB level adjustment */
filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
const int seg = mode_info_context->mbmi.segment_id;
const int ref_frame = mode_info_context->mbmi.ref_frame;
filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
if (filter_level)
{
if (mb_col > 0)
cm->lf_mbv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (cm->filter_type == NORMAL_LOOPFILTER)
{
const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
lfi.mblim = lfi_n->mblim[filter_level];
lfi.blim = lfi_n->blim[filter_level];
lfi.lim = lfi_n->lim[filter_level];
lfi.hev_thr = lfi_n->hev_thr[hev_index];
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
/* don't apply across umv border */
if (mb_row > 0)
cm->lf_mbh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
/* don't apply across umv border */
if (mb_row > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
}
else
{
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
/* don't apply across umv border */
if (mb_row > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
}
}
y_ptr += 16;
mbd->mode_info_context ++; /* step to next MB */
mode_info_context ++; /* step to next MB */
}
y_ptr += post->y_stride * 16 - post->y_width;
mbd->mode_info_context ++; /* Skip border mb */
mode_info_context ++; /* Skip border mb */
}
}
/* Encoder only... */
void vp8_loop_filter_partial_frame
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl,
int sharpness_lvl,
int Fraction
int default_filt_lvl
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
int i;
unsigned char *y_ptr;
int mb_row;
int mb_col;
/*int mb_rows = post->y_height >> 4;*/
int mb_cols = post->y_width >> 4;
int linestocopy;
int linestocopy, i;
loop_filter_info_n *lfi_n = &cm->lf_info;
loop_filter_info lfi;
loop_filter_info *lfi = cm->lf_info;
int baseline_filter_level[MAX_MB_SEGMENTS];
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
FRAME_TYPE frame_type = cm->frame_type;
(void) sharpness_lvl;
const MODE_INFO *mode_info_context;
/*MODE_INFO * this_mb_mode_info = cm->mi + (post->y_height>>5) * (mb_cols + 1);*/ /* Point at base of Mb MODE_INFO list */
mbd->mode_info_context = cm->mi + (post->y_height >> 5) * (mb_cols + 1); /* Point at base of Mb MODE_INFO list */
int lvl_seg[MAX_MB_SEGMENTS];
linestocopy = (post->y_height >> (4 + Fraction));
mode_info_context = cm->mi + (post->y_height >> 5) * (mb_cols + 1);
/* 3 is a magic number. 4 is probably magic too */
linestocopy = (post->y_height >> (4 + 3));
if (linestocopy < 1)
linestocopy = 1;
@@ -541,32 +532,27 @@ void vp8_loop_filter_partial_frame
linestocopy <<= 4;
/* Note the baseline filter values for each segment */
/* See vp8_loop_filter_frame_init. Rather than call that for each change
* to default_filt_lvl, copy the relevant calculation here.
*/
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
{ /* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
{
lvl_seg[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
}
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
lvl_seg[i] = default_filt_lvl
+ mbd->segment_feature_data[MB_LVL_ALT_LF][i];
lvl_seg[i] = (lvl_seg[i] > 0) ?
((lvl_seg[i] > 63) ? 63: lvl_seg[i]) : 0;
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter(lfi, frame_type);
/* Set up the buffer pointers */
y_ptr = post->y_buffer + (post->y_height >> 5) * 16 * post->y_stride;
@@ -576,28 +562,64 @@ void vp8_loop_filter_partial_frame
{
for (mb_col = 0; mb_col < mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
filter_level = baseline_filter_level[Segment];
int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
mode_info_context->mbmi.mode != SPLITMV &&
mode_info_context->mbmi.mb_skip_coeff);
if (alt_flt_enabled)
filter_level = lvl_seg[mode_info_context->mbmi.segment_id];
else
filter_level = default_filt_lvl;
if (filter_level)
{
if (mb_col > 0)
cm->lf_mbv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (cm->filter_type == NORMAL_LOOPFILTER)
{
const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
lfi.mblim = lfi_n->mblim[filter_level];
lfi.blim = lfi_n->blim[filter_level];
lfi.lim = lfi_n->lim[filter_level];
lfi.hev_thr = lfi_n->hev_thr[hev_index];
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
cm->lf_mbh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
}
else
{
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
}
}
y_ptr += 16;
mbd->mode_info_context += 1; /* step to next MB */
mode_info_context += 1; /* step to next MB */
}
y_ptr += post->y_stride * 16 - post->y_width;
mbd->mode_info_context += 1; /* Skip border mb */
mode_info_context += 1; /* Skip border mb */
}
}
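The region that vp8_loop_filter_partial_frame() processes follows from two shifts of the frame height. The sketch below reproduces that arithmetic under the assumption that nothing in the lines elided between the two hunks changes linestocopy. The function names are hypothetical.

```c
/* First macroblock row of the partial region: (y_height >> 5) rows down,
 * i.e. roughly the middle of the frame. */
static int sketch_partial_start_mb_row(int y_height)
{
    return y_height >> 5;
}

/* Number of luma lines covered: y_height >> (4 + 3) macroblock rows,
 * at least one, converted to lines by multiplying by 16. */
static int sketch_partial_lines(int y_height)
{
    int lines = y_height >> (4 + 3);

    if (lines < 1)
        lines = 1;

    return lines << 4;
}
```

For a frame 720 lines high this gives a start at macroblock row 22 and 80 lines of filtering; a 64-line frame still gets one full macroblock row (16 lines).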


@@ -13,6 +13,7 @@
#define loopfilter_h
#include "vpx_ports/mem.h"
#include "vpx_config.h"
#define MAX_LOOP_FILTER 63
@@ -22,26 +23,45 @@ typedef enum
SIMPLE_LOOPFILTER = 1
} LOOPFILTERTYPE;
/* FRK
* Need to align this structure so when it is declared and
#if ARCH_ARM
#define SIMD_WIDTH 1
#else
#define SIMD_WIDTH 16
#endif
/* Need to align this structure so when it is declared and
* passed it can be loaded into vector registers.
*/
typedef struct
{
DECLARE_ALIGNED(16, signed char, lim[16]);
DECLARE_ALIGNED(16, signed char, flim[16]);
DECLARE_ALIGNED(16, signed char, thr[16]);
DECLARE_ALIGNED(16, signed char, mbflim[16]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, mblim[MAX_LOOP_FILTER + 1][SIMD_WIDTH]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, blim[MAX_LOOP_FILTER + 1][SIMD_WIDTH]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, lim[MAX_LOOP_FILTER + 1][SIMD_WIDTH]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, hev_thr[4][SIMD_WIDTH]);
unsigned char lvl[4][4][4];
unsigned char hev_thr_lut[2][MAX_LOOP_FILTER + 1];
unsigned char mode_lf_lut[10];
} loop_filter_info_n;
typedef struct
{
const unsigned char * mblim;
const unsigned char * blim;
const unsigned char * lim;
const unsigned char * hev_thr;
} loop_filter_info;
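The split into loop_filter_info_n (tables indexed by level) and loop_filter_info (four const pointers) means a macroblock only selects rows and never copies them. A reduced sketch of that relationship follows. The sketch_ types keep just two of the tables, and the values written into the lim rows are placeholders rather than the real sharpness-dependent limits.

```c
#include <string.h>

#define SKETCH_MAX_LF 63
#define SKETCH_WIDTH  16   /* plays the role of SIMD_WIDTH on non-ARM */

/* Reduced analogue of loop_filter_info_n: one constant row per level. */
typedef struct
{
    unsigned char lim[SKETCH_MAX_LF + 1][SKETCH_WIDTH];
    unsigned char hev_thr[4][SKETCH_WIDTH];
} sketch_info_n;

/* Reduced analogue of loop_filter_info: pointers into the rows above. */
typedef struct
{
    const unsigned char *lim;
    const unsigned char *hev_thr;
} sketch_info;

static void sketch_init(sketch_info_n *n)
{
    int i;

    /* Each hev_thr row is its own index replicated across the row. */
    for (i = 0; i < 4; i++)
        memset(n->hev_thr[i], i, SKETCH_WIDTH);

    /* Placeholder limits: the level itself, replicated across the row. */
    for (i = 0; i <= SKETCH_MAX_LF; i++)
        memset(n->lim[i], i, SKETCH_WIDTH);
}

/* Select the rows for one macroblock without copying any data. */
static sketch_info sketch_pick(const sketch_info_n *n, int filter_level,
                               int hev_index)
{
    sketch_info lfi;

    lfi.lim = n->lim[filter_level];
    lfi.hev_thr = n->hev_thr[hev_index];
    return lfi;
}
```

Replicating each value across a full row lets SIMD code load a constant vector directly from the pointer, while scalar C code can read element 0.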
#define prototype_loopfilter(sym) \
void sym(unsigned char *src, int pitch, const signed char *flimit,\
const signed char *limit, const signed char *thresh, int count)
void sym(unsigned char *src, int pitch, const unsigned char *blimit,\
const unsigned char *limit, const unsigned char *thresh, int count)
#define prototype_loopfilter_block(sym) \
void sym(unsigned char *y, unsigned char *u, unsigned char *v,\
int ystride, int uv_stride, loop_filter_info *lfi, int simpler)
void sym(unsigned char *y, unsigned char *u, unsigned char *v, \
int ystride, int uv_stride, loop_filter_info *lfi)
#define prototype_simple_loopfilter(sym) \
void sym(unsigned char *y, int ystride, const unsigned char *blimit)
#if ARCH_X86 || ARCH_X86_64
#include "x86/loopfilter_x86.h"
@@ -71,38 +91,39 @@ extern prototype_loopfilter_block(vp8_lf_normal_mb_h);
#endif
extern prototype_loopfilter_block(vp8_lf_normal_b_h);
#ifndef vp8_lf_simple_mb_v
#define vp8_lf_simple_mb_v vp8_loop_filter_mbvs_c
#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_mb_v);
extern prototype_simple_loopfilter(vp8_lf_simple_mb_v);
#ifndef vp8_lf_simple_b_v
#define vp8_lf_simple_b_v vp8_loop_filter_bvs_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_b_v);
extern prototype_simple_loopfilter(vp8_lf_simple_b_v);
#ifndef vp8_lf_simple_mb_h
#define vp8_lf_simple_mb_h vp8_loop_filter_mbhs_c
#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_mb_h);
extern prototype_simple_loopfilter(vp8_lf_simple_mb_h);
#ifndef vp8_lf_simple_b_h
#define vp8_lf_simple_b_h vp8_loop_filter_bhs_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_b_h);
extern prototype_simple_loopfilter(vp8_lf_simple_b_h);
typedef prototype_loopfilter_block((*vp8_lf_block_fn_t));
typedef prototype_simple_loopfilter((*vp8_slf_block_fn_t));
typedef struct
{
vp8_lf_block_fn_t normal_mb_v;
vp8_lf_block_fn_t normal_b_v;
vp8_lf_block_fn_t normal_mb_h;
vp8_lf_block_fn_t normal_b_h;
vp8_lf_block_fn_t simple_mb_v;
vp8_lf_block_fn_t simple_b_v;
vp8_lf_block_fn_t simple_mb_h;
vp8_lf_block_fn_t simple_b_h;
vp8_slf_block_fn_t simple_mb_v;
vp8_slf_block_fn_t simple_b_v;
vp8_slf_block_fn_t simple_mb_h;
vp8_slf_block_fn_t simple_b_h;
} vp8_loopfilter_rtcd_vtable_t;
#if CONFIG_RUNTIME_CPU_DETECT
@@ -115,10 +136,33 @@ typedef void loop_filter_uvfunction
(
unsigned char *u, /* source pointer */
int p, /* pitch */
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
unsigned char *v
);
/* assorted loopfilter functions which get used elsewhere */
struct VP8Common;
struct MacroBlockD;
void vp8_loop_filter_init(struct VP8Common *cm);
void vp8_loop_filter_frame_init(struct VP8Common *cm,
struct MacroBlockD *mbd,
int default_filt_lvl);
void vp8_loop_filter_frame(struct VP8Common *cm, struct MacroBlockD *mbd);
void vp8_loop_filter_partial_frame(struct VP8Common *cm,
struct MacroBlockD *mbd,
int default_filt_lvl);
void vp8_loop_filter_frame_yonly(struct VP8Common *cm,
struct MacroBlockD *mbd,
int default_filt_lvl);
void vp8_loop_filter_update_sharpness(loop_filter_info_n *lfi,
int sharpness_lvl);
#endif


@@ -24,8 +24,9 @@ static __inline signed char vp8_signed_char_clamp(int t)
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
static __inline signed char vp8_filter_mask(signed char limit, signed char flimit,
uc p3, uc p2, uc p1, uc p0, uc q0, uc q1, uc q2, uc q3)
static __inline signed char vp8_filter_mask(uc limit, uc blimit,
uc p3, uc p2, uc p1, uc p0,
uc q0, uc q1, uc q2, uc q3)
{
signed char mask = 0;
mask |= (abs(p3 - p2) > limit) * -1;
@@ -34,13 +35,13 @@ static __inline signed char vp8_filter_mask(signed char limit, signed char flimi
mask |= (abs(q1 - q0) > limit) * -1;
mask |= (abs(q2 - q1) > limit) * -1;
mask |= (abs(q3 - q2) > limit) * -1;
mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > flimit * 2 + limit) * -1;
mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit) * -1;
mask = ~mask;
return mask;
}
/* is there high variance internal edge ( 11111111 yes, 00000000 no) */
static __inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0, uc q1)
static __inline signed char vp8_hevmask(uc thresh, uc p1, uc p0, uc q0, uc q1)
{
signed char hev = 0;
hev |= (abs(p1 - p0) > thresh) * -1;
@@ -48,7 +49,9 @@ static __inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0,
return hev;
}
static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *op0, uc *oq0, uc *oq1)
static __inline void vp8_filter(signed char mask, uc hev, uc *op1,
uc *op0, uc *oq0, uc *oq1)
{
signed char ps0, qs0;
signed char ps1, qs1;
@@ -93,14 +96,13 @@ static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *
*op1 = u ^ 0x80;
}
void vp8_loop_filter_horizontal_edge_c
(
unsigned char *s,
int p, /* pitch */
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -113,11 +115,11 @@ void vp8_loop_filter_horizontal_edge_c
*/
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4*p], s[-3*p], s[-2*p], s[-1*p],
s[0*p], s[1*p], s[2*p], s[3*p]);
hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
hev = vp8_hevmask(thresh[0], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_filter(mask, hev, s - 2 * p, s - 1 * p, s, s + 1 * p);
@@ -130,9 +132,9 @@ void vp8_loop_filter_vertical_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -145,10 +147,10 @@ void vp8_loop_filter_vertical_edge_c
*/
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
hev = vp8_hevmask(thresh[0], s[-2], s[-1], s[0], s[1]);
vp8_filter(mask, hev, s - 2, s - 1, s, s + 1);
@@ -157,7 +159,7 @@ void vp8_loop_filter_vertical_edge_c
while (++i < count * 8);
}
static __inline void vp8_mbfilter(signed char mask, signed char hev,
static __inline void vp8_mbfilter(signed char mask, uc hev,
uc *op2, uc *op1, uc *op0, uc *oq0, uc *oq1, uc *oq2)
{
signed char s, u;
@@ -216,9 +218,9 @@ void vp8_mbloop_filter_horizontal_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -232,11 +234,11 @@ void vp8_mbloop_filter_horizontal_edge_c
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4*p], s[-3*p], s[-2*p], s[-1*p],
s[0*p], s[1*p], s[2*p], s[3*p]);
hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
hev = vp8_hevmask(thresh[0], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_mbfilter(mask, hev, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p, s + 2 * p);
@@ -251,9 +253,9 @@ void vp8_mbloop_filter_vertical_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -264,10 +266,10 @@ void vp8_mbloop_filter_vertical_edge_c
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
hev = vp8_hevmask(thresh[0], s[-2], s[-1], s[0], s[1]);
vp8_mbfilter(mask, hev, s - 3, s - 2, s - 1, s, s + 1, s + 2);
@@ -278,13 +280,13 @@ void vp8_mbloop_filter_vertical_edge_c
}
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
static __inline signed char vp8_simple_filter_mask(signed char limit, signed char flimit, uc p1, uc p0, uc q0, uc q1)
static __inline signed char vp8_simple_filter_mask(uc blimit, uc p1, uc p0, uc q0, uc q1)
{
/* Why does this cause problems for win32?
* error C2143: syntax error : missing ';' before 'type'
* (void) limit;
*/
signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= flimit * 2 + limit) * -1;
signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= blimit) * -1;
return mask;
}
@@ -317,47 +319,37 @@ void vp8_loop_filter_simple_horizontal_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count
const unsigned char *blimit
)
{
signed char mask = 0;
int i = 0;
(void) thresh;
do
{
/*mask = vp8_simple_filter_mask( limit[i], flimit[i],s[-1*p],s[0*p]);*/
mask = vp8_simple_filter_mask(limit[i], flimit[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
mask = vp8_simple_filter_mask(blimit[0], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_simple_filter(mask, s - 2 * p, s - 1 * p, s, s + 1 * p);
++s;
}
while (++i < count * 8);
while (++i < 16);
}
void vp8_loop_filter_simple_vertical_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count
const unsigned char *blimit
)
{
signed char mask = 0;
int i = 0;
(void) thresh;
do
{
/*mask = vp8_simple_filter_mask( limit[i], flimit[i],s[-1],s[0]);*/
mask = vp8_simple_filter_mask(limit[i], flimit[i], s[-2], s[-1], s[0], s[1]);
mask = vp8_simple_filter_mask(blimit[0], s[-2], s[-1], s[0], s[1]);
vp8_simple_filter(mask, s - 2, s - 1, s, s + 1);
s += p;
}
while (++i < count * 8);
while (++i < 16);
}
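The hunks above replace the per-call threshold `flimit * 2 + limit` with a single unsigned `blimit` value. A minimal sketch of that change follows, assuming `blimit` is simply the old expression computed once up front; the diff does not show where `blimit` is filled in, so `make_blimit` is a hypothetical helper and not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned char uc;

/* Old form from the diff: the threshold is assembled from two
 * signed values on every call. */
static signed char simple_mask_old(signed char limit, signed char flimit,
                                   uc p1, uc p0, uc q0, uc q1)
{
    return (signed char)((abs(p0 - q0) * 2 + abs(p1 - q1) / 2
                          <= flimit * 2 + limit) * -1);
}

/* New form from the diff: one precomputed unsigned threshold. */
static signed char simple_mask_new(uc blimit, uc p1, uc p0, uc q0, uc q1)
{
    return (signed char)((abs(p0 - q0) * 2 + abs(p1 - q1) / 2
                          <= blimit) * -1);
}

/* Hypothetical helper (assumption, not in the patch): folds the old
 * two-value threshold into a single byte. */
static uc make_blimit(signed char limit, signed char flimit)
{
    int b = flimit * 2 + limit;
    assert(b >= 0 && b <= 255); /* must fit in an unsigned char */
    return (uc)b;
}
```

Under that assumption both forms return the same mask for the same pixels, for example `simple_mask_new(make_blimit(10, 5), 100, 102, 104, 106)` and `simple_mask_old(10, 5, 100, 102, 104, 106)`. The new form does one comparison against a loaded byte instead of a multiply and add per pixel.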


@@ -11,12 +11,6 @@
#include "blockd.h"
#include "stdio.h"
#include "vpx_config.h"
#if CONFIG_OPENCL
#include "opencl/vp8_opencl.h"
#endif
typedef enum
{
PRED = 0,
@@ -26,6 +20,7 @@ typedef enum
static void setup_block
(
BLOCKD *b,
int mv_stride,
unsigned char **base,
int Stride,
int offset,
@@ -54,176 +49,81 @@ static void setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
int block;
unsigned char **y, **u, **v;
unsigned char **buf_base;
int y_off, u_off, v_off;
if (bs == DEST)
{
buf_base = &x->dst.buffer_alloc;
y_off = x->dst.y_buffer - x->dst.buffer_alloc;
u_off = x->dst.u_buffer - x->dst.buffer_alloc;
v_off = x->dst.v_buffer - x->dst.buffer_alloc;
y = &x->dst.y_buffer;
u = &x->dst.u_buffer;
v = &x->dst.v_buffer;
y_off = 0;
//y = buf_base;
//y_off = x->dst.y_buffer - x->dst.buffer_alloc;
u = buf_base;
v = buf_base;
u_off = x->dst.u_buffer - x->dst.buffer_alloc;
v_off = x->dst.v_buffer - x->dst.buffer_alloc;
}
else
{
buf_base = &x->pre.buffer_alloc;
y = &x->pre.y_buffer;
u = &x->pre.u_buffer;
v = &x->pre.v_buffer;
y_off = u_off = v_off = 0;
//y = buf_base;
//y_off = x->pre.y_buffer - x->pre.buffer_alloc;
//u = buf_base;
//u_off = x->pre.u_buffer - x->pre.buffer_alloc;
//v = buf_base;
//v_off = x->pre.v_buffer - x->pre.buffer_alloc;
}
for (block = 0; block < 16; block++) /* y blocks */
{
setup_block(&x->block[block], y, x->dst.y_stride,
y_off + ((block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4), bs);
setup_block(&x->block[block], x->dst.y_stride, y, x->dst.y_stride,
(block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4, bs);
}
for (block = 16; block < 20; block++) /* U and V blocks */
{
int block_off = ((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4;
setup_block(&x->block[block], x->dst.uv_stride, u, x->dst.uv_stride,
((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
setup_block(&x->block[block], u, x->dst.uv_stride,
u_off + block_off, bs);
setup_block(&x->block[block+4], v, x->dst.uv_stride,
v_off + block_off, bs);
setup_block(&x->block[block+4], x->dst.uv_stride, v, x->dst.uv_stride,
((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
}
}
void vp8_setup_block_dptrs(MACROBLOCKD *x)
{
int r, c;
unsigned int offset;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
cl_command_queue y_cq, u_cq, v_cq;
int err;
if (cl_initialized == CL_SUCCESS){
//Create command queue for Y/U/V Planes
y_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
if (!y_cq || err != CL_SUCCESS) {
printf("Error: Failed to create a command queue!\n");
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
}
u_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
if (!u_cq || err != CL_SUCCESS) {
printf("Error: Failed to create a command queue!\n");
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
}
v_cq = clCreateCommandQueue(cl_data.context, cl_data.device_id, 0, &err);
if (!v_cq || err != CL_SUCCESS) {
printf("Error: Failed to create a command queue!\n");
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
}
}
#endif
/* 16 Y blocks */
for (r = 0; r < 4; r++)
{
for (c = 0; c < 4; c++)
{
offset = r * 4 * 16 + c * 4;
x->block[r*4+c].diff_offset = offset;
x->block[r*4+c].predictor_offset = offset;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
if (cl_initialized == CL_SUCCESS)
x->block[r*4+c].cl_commands = y_cq;
#endif
x->block[r*4+c].diff = &x->diff[r * 4 * 16 + c * 4];
x->block[r*4+c].predictor = x->predictor + r * 4 * 16 + c * 4;
}
}
/* 4 U Blocks */
for (r = 0; r < 2; r++)
{
for (c = 0; c < 2; c++)
{
offset = 256 + r * 4 * 8 + c * 4;
x->block[16+r*2+c].diff_offset = offset;
x->block[16+r*2+c].predictor_offset = offset;
x->block[16+r*2+c].diff = &x->diff[256 + r * 4 * 8 + c * 4];
x->block[16+r*2+c].predictor = x->predictor + 256 + r * 4 * 8 + c * 4;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
if (cl_initialized == CL_SUCCESS)
x->block[16+r*2+c].cl_commands = u_cq;
#endif
}
}
/* 4 V Blocks */
for (r = 0; r < 2; r++)
{
for (c = 0; c < 2; c++)
{
offset = 320+ r * 4 * 8 + c * 4;
x->block[20+r*2+c].diff_offset = offset;
x->block[20+r*2+c].predictor_offset = offset;
x->block[20+r*2+c].diff = &x->diff[320+ r * 4 * 8 + c * 4];
x->block[20+r*2+c].predictor = x->predictor + 320 + r * 4 * 8 + c * 4;
#if CONFIG_OPENCL && !ONE_CQ_PER_MB
if (cl_initialized == CL_SUCCESS)
x->block[20+r*2+c].cl_commands = v_cq;
#endif
}
}
x->block[24].diff_offset = 384;
x->block[24].diff = &x->diff[384];
for (r = 0; r < 25; r++)
{
x->block[r].qcoeff_base = x->qcoeff;
x->block[r].qcoeff_offset = r * 16;
x->block[r].dqcoeff_base = x->dqcoeff;
x->block[r].dqcoeff_offset = r * 16;
x->block[r].predictor_base = x->predictor;
x->block[r].diff_base = x->diff;
x->block[r].eobs_base = x->eobs;
#if CONFIG_OPENCL
if (cl_initialized == CL_SUCCESS){
/* Copy command queue reference from macroblock */
#if ONE_CQ_PER_MB
x->block[r].cl_commands = x->cl_commands;
#endif
/* Set up CL memory buffers as appropriate */
x->block[r].cl_diff_mem = x->cl_diff_mem;
x->block[r].cl_dqcoeff_mem = x->cl_dqcoeff_mem;
x->block[r].cl_eobs_mem = x->cl_eobs_mem;
x->block[r].cl_predictor_mem = x->cl_predictor_mem;
x->block[r].cl_qcoeff_mem = x->cl_qcoeff_mem;
}
//Copy filter type to block.
x->block[r].sixtap_filter = x->sixtap_filter;
#endif
x->block[r].qcoeff = x->qcoeff + r * 16;
x->block[r].dqcoeff = x->dqcoeff + r * 16;
}
}
void vp8_build_block_doffsets(MACROBLOCKD *x)
{
/* handle the destination pitch features */
setup_macroblock(x, DEST);
setup_macroblock(x, PRED);
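The offset arithmetic in `setup_macroblock` can be read in isolation. The sketch below copies the two expressions from the hunk into standalone functions; the function names and the stride values in the usage note are illustrative assumptions, not identifiers from the patch.

```c
#include <assert.h>

/* Byte offset of 4x4 luma block `block` (0..15) inside a 16x16
 * macroblock: row = block >> 2, column = block & 3. */
static int y_block_offset(int block, int y_stride)
{
    return (block >> 2) * 4 * y_stride + (block & 3) * 4;
}

/* Byte offset of 4x4 chroma block `block` (16..19) inside an 8x8
 * chroma plane: row = (block - 16) >> 1, column = block & 1. The
 * patch passes the same value for block + 4 (the V plane). */
static int uv_block_offset(int block, int uv_stride)
{
    return ((block - 16) >> 1) * 4 * uv_stride + (block & 1) * 4;
}

/* Small self-check with made-up strides. */
static void block_offset_self_check(void)
{
    assert(y_block_offset(0, 32) == 0);
    assert(y_block_offset(5, 32) == 132);   /* row 1, column 1 */
    assert(uv_block_offset(19, 16) == 68);  /* row 1, column 1 */
}
```

With a luma stride of 32, block 5 sits one block row down and one block column across, so its offset is 4 * 32 + 4 = 132.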


@@ -11,6 +11,7 @@
#ifndef __INC_MV_H
#define __INC_MV_H
#include "vpx/vpx_integer.h"
typedef struct
{
@@ -18,4 +19,10 @@ typedef struct
short col;
} MV;
typedef union
{
uint32_t as_int;
MV as_mv;
} int_mv; /* facilitates faster equality tests and copies */
#endif
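The comment on `int_mv` says the union makes equality tests and copies faster. A minimal sketch of that idea follows, assuming `short` is 16 bits so that `MV` fills the 32-bit `as_int` exactly; `make_mv` and `mv_equal` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct
{
    short row;
    short col;
} MV;

typedef union
{
    uint32_t as_int;
    MV as_mv;
} int_mv;

/* Compile-time check of the size assumption stated above. */
typedef char mv_size_check[(sizeof(MV) == sizeof(uint32_t)) ? 1 : -1];

/* Hypothetical helper: builds a motion vector from its components. */
static int_mv make_mv(short row, short col)
{
    int_mv m;
    m.as_int = 0;
    m.as_mv.row = row;
    m.as_mv.col = col;
    return m;
}

/* One 32-bit compare instead of comparing row and col separately. */
static int mv_equal(int_mv a, int_mv b)
{
    return a.as_int == b.as_int;
}

static void mv_self_check(void)
{
    assert(mv_equal(make_mv(3, -4), make_mv(3, -4)));
    assert(!mv_equal(make_mv(3, -4), make_mv(3, 4)));
}
```

Copying works the same way: assigning `dst.as_int = src.as_int` moves both components in a single store.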


@@ -109,6 +109,7 @@ extern "C"
int noise_sensitivity; // parameter used for applying pre processing blur: recommendation 0
int Sharpness; // parameter used for sharpening output: recommendation 0:
int cpu_used;
unsigned int rc_max_intra_bitrate_pct;
// mode ->
//(0)=Realtime/Live Encoding. This mode is optimized for realtime encoding (for example, capturing
@@ -139,8 +140,9 @@ extern "C"
int end_usage; // vbr or cbr
// shoot to keep buffer full at all times by undershooting a bit 95 recommended
// buffer targeting aggressiveness
int under_shoot_pct;
int over_shoot_pct;
// buffering parameters
int starting_buffer_level; // in seconds
@@ -182,8 +184,11 @@ extern "C"
int token_partitions; // how many token partitions to create for multi core decoding
int encode_breakout; // early breakout encode threshold : for video conf recommend 800
int error_resilient_mode; // if running over udp networks provides decodable frames after a
// dropped packet
unsigned int error_resilient_mode; // Bitfield defining the error
// resiliency features to enable. Can provide
// decodable frames after losses in previous
// frames and decodable partitions after
// losses in the same frame.
int arnr_max_frames;
int arnr_strength ;
@@ -206,8 +211,8 @@ extern "C"
// Receive a frame's worth of data. The caller can assume that a copy of this frame is made,
// and not just a copy of the pointer.
int vp8_receive_raw_frame(VP8_PTR comp, unsigned int frame_flags, YV12_BUFFER_CONFIG *sd, INT64 time_stamp, INT64 end_time_stamp);
int vp8_get_compressed_data(VP8_PTR comp, unsigned int *frame_flags, unsigned long *size, unsigned char *dest, INT64 *time_stamp, INT64 *time_end, int flush);
int vp8_receive_raw_frame(VP8_PTR comp, unsigned int frame_flags, YV12_BUFFER_CONFIG *sd, int64_t time_stamp, int64_t end_time_stamp);
int vp8_get_compressed_data(VP8_PTR comp, unsigned int *frame_flags, unsigned long *size, unsigned char *dest, int64_t *time_stamp, int64_t *time_end, int flush);
int vp8_get_preview_raw_frame(VP8_PTR comp, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t *flags);
int vp8_use_as_reference(VP8_PTR comp, int ref_frame_flags);


@@ -19,7 +19,9 @@
#include "entropy.h"
#include "idct.h"
#include "recon.h"
#if CONFIG_POSTPROC
#include "postproc.h"
#endif
/*#ifdef PACKET_TESTING*/
#include "header.h"
@@ -35,13 +37,15 @@ void vp8_initialize_common(void);
#define NUM_YV12_BUFFERS 4
#define MAX_PARTITIONS 9
typedef struct frame_contexts
{
vp8_prob bmode_prob [VP8_BINTRAMODES-1];
vp8_prob ymode_prob [VP8_YMODES-1]; /* interframe intra mode probs */
vp8_prob uv_mode_prob [VP8_UV_MODES-1];
vp8_prob sub_mv_ref_prob [VP8_SUBMVREFS-1];
vp8_prob coef_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens-1];
vp8_prob coef_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [ENTROPY_NODES];
MV_CONTEXT mvc[2];
MV_CONTEXT pre_mvc[2]; /* lets the encoder skip recalculating the mvcost for the frame if mvc doesn't change. */
} FRAME_CONTEXT;
@@ -73,7 +77,9 @@ typedef struct VP8_COMMON_RTCD
vp8_recon_rtcd_vtable_t recon;
vp8_subpix_rtcd_vtable_t subpix;
vp8_loopfilter_rtcd_vtable_t loopfilter;
#if CONFIG_POSTPROC
vp8_postproc_rtcd_vtable_t postproc;
#endif
int flags;
#else
int unused;
@@ -81,6 +87,7 @@ typedef struct VP8_COMMON_RTCD
} VP8_COMMON_RTCD;
typedef struct VP8Common
{
struct vpx_internal_error_info error;
@@ -105,7 +112,8 @@ typedef struct VP8Common
YV12_BUFFER_CONFIG post_proc_buffer;
YV12_BUFFER_CONFIG temp_scale_frame;
FRAME_TYPE last_frame_type; /* Save last frame's frame type for loopfilter init checking and motion search. */
FRAME_TYPE last_frame_type; /* Save last frame's frame type for motion search. */
FRAME_TYPE frame_type;
int show_frame;
@@ -119,7 +127,7 @@ typedef struct VP8Common
/* profile settings */
int mb_no_coeff_skip;
int no_lpf;
int simpler_lpf;
int use_bilinear_mc_filter;
int full_pixel;
int base_qindex;
@@ -139,16 +147,15 @@ typedef struct VP8Common
MODE_INFO *mip; /* Base of allocated array */
MODE_INFO *mi; /* Corresponds to upper left visible macroblock */
MODE_INFO *prev_mip; /* MODE_INFO array 'mip' from last decoded frame */
MODE_INFO *prev_mi; /* 'mi' from last frame (points into prev_mip) */
INTERPOLATIONFILTERTYPE mcomp_filter_type;
LOOPFILTERTYPE last_filter_type;
LOOPFILTERTYPE filter_type;
loop_filter_info lf_info[MAX_LOOP_FILTER+1];
prototype_loopfilter_block((*lf_mbv));
prototype_loopfilter_block((*lf_mbh));
prototype_loopfilter_block((*lf_bv));
prototype_loopfilter_block((*lf_bh));
loop_filter_info_n lf_info;
int filter_level;
int last_sharpness_level;
int sharpness_level;
@@ -195,13 +202,12 @@ typedef struct VP8Common
#if CONFIG_RUNTIME_CPU_DETECT
VP8_COMMON_RTCD rtcd;
#endif
#if CONFIG_MULTITHREAD
int processor_core_count;
#endif
#if CONFIG_POSTPROC
struct postproc_state postproc_state;
#endif
} VP8_COMMON;
int vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int filter_level);
void vp8_init_loop_filter(VP8_COMMON *cm);
void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type);
extern void vp8_loop_filter_frame(VP8_COMMON *cm, MACROBLOCKD *mbd, int filt_val);
#endif


@@ -18,10 +18,12 @@
extern "C"
{
#endif
#include "vpx/vpx_codec.h"
#include "type_aliases.h"
#include "vpx_scale/yv12config.h"
#include "ppflags.h"
#include "vpx_ports/mem.h"
#include "vpx/vpx_codec.h"
typedef void *VP8D_PTR;
typedef struct
@@ -31,6 +33,8 @@ extern "C"
int Version;
int postprocess;
int max_threads;
int error_concealment;
int input_partition;
} VP8D_CONFIG;
typedef enum
{
@@ -50,11 +54,11 @@ extern "C"
int vp8dx_get_setting(VP8D_PTR comp, VP8D_SETTING oxst);
int vp8dx_receive_compressed_data(VP8D_PTR comp, unsigned long size, const unsigned char *dest, INT64 time_stamp);
int vp8dx_get_raw_frame(VP8D_PTR comp, YV12_BUFFER_CONFIG *sd, INT64 *time_stamp, INT64 *time_end_stamp, vp8_ppflags_t *flags);
int vp8dx_receive_compressed_data(VP8D_PTR comp, unsigned long size, const unsigned char *dest, int64_t time_stamp);
int vp8dx_get_raw_frame(VP8D_PTR comp, YV12_BUFFER_CONFIG *sd, int64_t *time_stamp, int64_t *time_end_stamp, vp8_ppflags_t *flags);
int vp8dx_get_reference(VP8D_PTR comp, VP8_REFFRAME ref_frame_flag, YV12_BUFFER_CONFIG *sd);
int vp8dx_set_reference(VP8D_PTR comp, VP8_REFFRAME ref_frame_flag, YV12_BUFFER_CONFIG *sd);
vpx_codec_err_t vp8dx_get_reference(VP8D_PTR comp, VP8_REFFRAME ref_frame_flag, YV12_BUFFER_CONFIG *sd);
vpx_codec_err_t vp8dx_set_reference(VP8D_PTR comp, VP8_REFFRAME ref_frame_flag, YV12_BUFFER_CONFIG *sd);
VP8D_PTR vp8dx_create_decompressor(VP8D_CONFIG *oxcf);


@@ -1,233 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "../../decoder/onyxd_int.h"
#include "../../../vpx_ports/config.h"
#include "../../common/idct.h"
#include "blockd_cl.h"
#include "../../decoder/opencl/dequantize_cl.h"
int vp8_cl_mb_prep(MACROBLOCKD *x, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
//Copy all blockd.cl_*_mem objects
if (flags & DIFF)
VP8_CL_SET_BUF(x->cl_commands, x->cl_diff_mem, sizeof(cl_short)*400, x->diff,
,err
);
if (flags & PREDICTOR)
VP8_CL_SET_BUF(x->cl_commands, x->cl_predictor_mem, sizeof(cl_uchar)*384, x->predictor,
,err
);
if (flags & QCOEFF)
VP8_CL_SET_BUF(x->cl_commands, x->cl_qcoeff_mem, sizeof(cl_short)*400, x->qcoeff,
,err
);
if (flags & DQCOEFF)
VP8_CL_SET_BUF(x->cl_commands, x->cl_dqcoeff_mem, sizeof(cl_short)*400, x->dqcoeff,
,err
);
if (flags & EOBS)
VP8_CL_SET_BUF(x->cl_commands, x->cl_eobs_mem, sizeof(cl_char)*25, x->eobs,
,err
);
if (flags & PRE_BUF){
VP8_CL_SET_BUF(x->cl_commands, x->pre.buffer_mem, x->pre.buffer_size, x->pre.buffer_alloc,
,err
);
}
if (flags & DST_BUF){
VP8_CL_SET_BUF(x->cl_commands, x->dst.buffer_mem, x->dst.buffer_size, x->dst.buffer_alloc,
,err
);
}
return CL_SUCCESS;
}
int vp8_cl_mb_finish(MACROBLOCKD *x, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
if (flags & DIFF){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_diff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->diff, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & PREDICTOR){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_predictor_mem, CL_FALSE, 0, sizeof(cl_uchar)*384, x->predictor, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & QCOEFF){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_qcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->qcoeff, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DQCOEFF){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_dqcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, x->dqcoeff, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & EOBS){
err = clEnqueueReadBuffer(x->cl_commands, x->cl_eobs_mem, CL_FALSE, 0, sizeof(cl_char)*25, x->eobs, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & PRE_BUF){
err = clEnqueueReadBuffer(x->cl_commands, x->pre.buffer_mem, CL_FALSE,
0, x->pre.buffer_size, x->pre.buffer_alloc, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DST_BUF){
err = clEnqueueReadBuffer(x->cl_commands, x->dst.buffer_mem, CL_FALSE,
0, x->dst.buffer_size, x->dst.buffer_alloc, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( x->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
return CL_SUCCESS;
}
int vp8_cl_block_prep(BLOCKD *b, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
//Copy all blockd.cl_*_mem objects
if (flags & DIFF)
VP8_CL_SET_BUF(b->cl_commands, b->cl_diff_mem, sizeof(cl_short)*400, b->diff_base,
,err
);
if (flags & PREDICTOR)
VP8_CL_SET_BUF(b->cl_commands, b->cl_predictor_mem, sizeof(cl_uchar)*384, b->predictor_base,
,err
);
if (flags & QCOEFF)
VP8_CL_SET_BUF(b->cl_commands, b->cl_qcoeff_mem, sizeof(cl_short)*400, b->qcoeff_base,
,err
);
if (flags & DQCOEFF)
VP8_CL_SET_BUF(b->cl_commands, b->cl_dqcoeff_mem, sizeof(cl_short)*400, b->dqcoeff_base,
,err
);
if (flags & EOBS)
VP8_CL_SET_BUF(b->cl_commands, b->cl_eobs_mem, sizeof(cl_char)*25, b->eobs_base,
,err
);
if (flags & DEQUANT)
VP8_CL_SET_BUF(b->cl_commands, b->cl_dequant_mem, sizeof(cl_short)*16 ,b->dequant,
,err
);
return CL_SUCCESS;
}
int vp8_cl_block_finish(BLOCKD *b, int flags){
int err;
if (cl_initialized != CL_SUCCESS){
return cl_initialized;
}
if (flags & DIFF){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_diff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->diff_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & PREDICTOR){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_predictor_mem, CL_FALSE, 0, sizeof(cl_uchar)*384, b->predictor_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & QCOEFF){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_qcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->qcoeff_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DQCOEFF){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_dqcoeff_mem, CL_FALSE, 0, sizeof(cl_short)*400, b->dqcoeff_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & EOBS){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_eobs_mem, CL_FALSE, 0, sizeof(cl_char)*25, b->eobs_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
if (flags & DEQUANT){
err = clEnqueueReadBuffer(b->cl_commands, b->cl_dequant_mem, CL_FALSE, 0, sizeof(cl_short)*16 ,b->dequant, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read from GPU!\n",
, err
);
}
return CL_SUCCESS;
}


@@ -1,64 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef BLOCKD_OPENCL_H
#define BLOCKD_OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#include "vp8_opencl.h"
#include "../blockd.h"
#define DIFF 0x0001
#define PREDICTOR 0x0002
#define QCOEFF 0x0004
#define DQCOEFF 0x0008
#define EOBS 0x0010
#define DEQUANT 0x0020
#define PRE_BUF 0x0040
#define DST_BUF 0x0080
#define BLOCK_COPY_ALL 0xffff
/*
#define BLOCK_MEM_SIZE 6
enum {
DIFF_MEM = 0,
PRED_MEM = 1,
QCOEFF_MEM = 2,
DQCOEFF_MEM = 3,
EOBS_MEM = 4,
DEQUANT_MEM = 5
} BLOCK_MEM_TYPES;
struct cl_block_mem{
cl_mem gpu_mem;
size_t size;
void *host_mem;
};
typedef struct cl_block_mem block_mem;
*/
extern int vp8_cl_block_finish(BLOCKD *b, int flags);
extern int vp8_cl_block_prep(BLOCKD *b, int flags);
extern int vp8_cl_mb_prep(MACROBLOCKD *x, int flags);
extern int vp8_cl_mb_finish(MACROBLOCKD *x, int flags);
#ifdef __cplusplus
}
#endif
#endif


@@ -1,106 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vp8_opencl.h"
#include <stdio.h>
CL_FUNCTIONS cl;
void *dll = NULL;
int cl_loaded = VP8_CL_NOT_INITIALIZED;
int close_cl(){
int ret = dlclose(dll);
if (ret != 0)
fprintf(stderr, "Error closing OpenCL library: %s", dlerror());
return ret;
}
int load_cl(char *lib_name){
//printf("Loading OpenCL library\n");
dll = dlopen(lib_name, RTLD_NOW|RTLD_LOCAL);
if (dll != NULL){
//printf("Found CL library\n");
} else {
//printf("Didn't find CL library\n");
return VP8_CL_TRIED_BUT_FAILED;
}
CL_LOAD_FN("clGetPlatformIDs", cl.getPlatformIDs);
CL_LOAD_FN("clGetPlatformInfo", cl.getPlatformInfo);
CL_LOAD_FN("clGetDeviceIDs", cl.getDeviceIDs);
CL_LOAD_FN("clGetDeviceInfo", cl.getDeviceInfo);
CL_LOAD_FN("clCreateContext", cl.createContext);
// CL_LOAD_FN("clCreateContextFromType", cl.createContextFromType);
// CL_LOAD_FN("clRetainContext", cl.retainContext);
CL_LOAD_FN("clReleaseContext", cl.releaseContext);
// CL_LOAD_FN("clGetContextInfo", cl.getContextInfo);
CL_LOAD_FN("clCreateCommandQueue", cl.createCommandQueue);
// CL_LOAD_FN("clRetainCommandQueue", cl.retainCommandQueue);
CL_LOAD_FN("clReleaseCommandQueue", cl.releaseCommandQueue);
// CL_LOAD_FN("clGetCommandQueueInfo", cl.getCommandQueue);
CL_LOAD_FN("clCreateBuffer", cl.createBuffer);
// CL_LOAD_FN("clCreateImage2D", cl.createImage2D);
// CL_LOAD_FN("clCreateImage3D", cl.createImage3D);
// CL_LOAD_FN("clRetainMemObject", cl.retainMemObject);
CL_LOAD_FN("clReleaseMemObject", cl.releaseMemObject);
// CL_LOAD_FN("clGetSupportedImageFormats", cl.getSupportedImageFormats);
// CL_LOAD_FN("clGetMemObjectInfo", cl.getMemObjectInfo);
// CL_LOAD_FN("clGetImageInfo", cl.getImageInfo);
// CL_LOAD_FN("clCreateSampler", cl.createSampler);
// CL_LOAD_FN("clRetainSampler", cl.retainSampler);
// CL_LOAD_FN("clReleaseSampler", cl.releaseSampler);
// CL_LOAD_FN("clGetSamplerInfo", cl.getSamplerInfo);
CL_LOAD_FN("clCreateProgramWithSource", cl.createProgramWithSource);
// CL_LOAD_FN("clCreateProgramWithBinary", cl.createProgramWithBinary);
// CL_LOAD_FN("clRetainProgram", cl.retainProgram);
CL_LOAD_FN("clReleaseProgram", cl.releaseProgram);
CL_LOAD_FN("clBuildProgram", cl.buildProgram);
// CL_LOAD_FN("clUnloadCompiler", cl.unloadCompiler);
CL_LOAD_FN("clGetProgramInfo", cl.getProgramInfo);
CL_LOAD_FN("clGetProgramBuildInfo", cl.getProgramBuildInfo);
CL_LOAD_FN("clCreateKernel", cl.createKernel);
// CL_LOAD_FN("clCreateKernelsInProgram", cl.createKernelsInProgram);
// CL_LOAD_FN("clRetainKernel", cl.retainKernel);
CL_LOAD_FN("clReleaseKernel", cl.releaseKernel);
CL_LOAD_FN("clSetKernelArg", cl.setKernelArg);
// CL_LOAD_FN("clGetKernelInfo", cl.getKernelInfo);
CL_LOAD_FN("clGetKernelWorkGroupInfo", cl.getKernelWorkGroupInfo);
// CL_LOAD_FN("clWaitForEvents", cl.waitForEvents);
// CL_LOAD_FN("clGetEventInfo", cl.getEventInfo);
// CL_LOAD_FN("clRetainEvent", cl.retainEvent);
// CL_LOAD_FN("clReleaseEvent", cl.releaseEvent);
// CL_LOAD_FN("clGetEventProfilingInfo", cl.getEventProfilingInfo);
CL_LOAD_FN("clFlush", cl.flush);
CL_LOAD_FN("clFinish", cl.finish);
CL_LOAD_FN("clEnqueueReadBuffer", cl.enqueueReadBuffer);
CL_LOAD_FN("clEnqueueWriteBuffer", cl.enqueueWriteBuffer);
CL_LOAD_FN("clEnqueueCopyBuffer", cl.enqueueCopyBuffer);
// CL_LOAD_FN("clEnqueueReadImage", cl.enqueueReadImage);
// CL_LOAD_FN("clEnqueueWriteImage", cl.enqueueWriteImage);
// CL_LOAD_FN("clEnqueueCopyImage", cl.enqueueCopyImage);
// CL_LOAD_FN("clEnqueueCopyImageToBuffer", cl.enqueueCopyImageToBuffer);
// CL_LOAD_FN("clEnqueueCopyBufferToImage", cl.enqueueCopyBufferToImage);
// CL_LOAD_FN("clEnqueueMapBuffer", cl.enqueueMapBuffer);
// CL_LOAD_FN("clEnqueueMapImage", cl.enqueueMapImage);
// CL_LOAD_FN("clEnqueueUnmapMemObject", cl.enqueueUnmapMemObject);
CL_LOAD_FN("clEnqueueNDRangeKernel", cl.enqueueNDRAngeKernel);
// CL_LOAD_FN("clEnqueueTask", cl.enqueueTask);
// CL_LOAD_FN("clEnqueueNativeKernel", cl.enqueueNativeKernel);
// CL_LOAD_FN("clEnqueueMarker", cl.enqueueMarker);
// CL_LOAD_FN("clEnqueueWaitForEvents", cl.enqueueWaitForEvents);
CL_LOAD_FN("clEnqueueBarrier", cl.enqueueBarrier);
// CL_LOAD_FN("clGetExtensionFunctionAddress", cl.getExtensionFunctionAddress);
return CL_SUCCESS;
}


@@ -1,253 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef DYNAMIC_CL_H
#define DYNAMIC_CL_H
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
#include <dlfcn.h>
int load_cl(char *lib_name);
int close_cl();
extern int cl_loaded;
typedef cl_int(*fn_clGetPlatformIDs_t)(cl_uint, cl_platform_id *, cl_uint *);
typedef cl_int(*fn_clGetPlatformInfo_t)(cl_platform_id, cl_platform_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetDeviceIDs_t)(cl_platform_id, cl_device_type, cl_uint, cl_device_id *, cl_uint *);
typedef cl_int(*fn_clGetDeviceInfo_t)(cl_device_id, cl_device_info, size_t, void *, size_t *);
typedef cl_context(*fn_clCreateContext_t)(const cl_context_properties *, cl_uint, const cl_device_id *, void (*pfn_notify)(const char *, const void *, size_t, void *), void *, cl_int *);
typedef cl_context(*fn_clCreateContextFromType_t)(const cl_context_properties *, cl_device_type, void (*pfn_notify)(const char *, const void *, size_t, void *), void *, cl_int *);
typedef cl_int(*fn_clRetainContext_t)(cl_context);
typedef cl_int(*fn_clReleaseContext_t)(cl_context);
typedef cl_int(*fn_clGetContextInfo_t)(cl_context, cl_context_info, size_t, void *, size_t *);
typedef cl_command_queue(*fn_clCreateCommandQueue_t)(cl_context, cl_device_id, cl_command_queue_properties, cl_int *);
typedef cl_int(*fn_clRetainCommandQueue_t)(cl_command_queue);
typedef cl_int(*fn_clReleaseCommandQueue_t)(cl_command_queue);
typedef cl_int(*fn_clGetCommandQueueInfo_t)(cl_command_queue, cl_command_queue_info, size_t, void *, size_t *);
typedef cl_mem(*fn_clCreateBuffer_t)(cl_context, cl_mem_flags, size_t, void *, cl_int *);
typedef cl_mem(*fn_clCreateImage2D_t)(cl_context, cl_mem_flags, const cl_image_format *, size_t, size_t, size_t, void *, cl_int *);
typedef cl_mem(*fn_clCreateImage3D_t)(cl_context, cl_mem_flags, const cl_image_format *, size_t, size_t, size_t, size_t, size_t, void *, cl_int *);
typedef cl_int(*fn_clRetainMemObject_t)(cl_mem);
typedef cl_int(*fn_clReleaseMemObject_t)(cl_mem);
typedef cl_int(*fn_clGetSupportedImageFormats_t)(cl_context, cl_mem_flags, cl_mem_object_type, cl_uint, cl_image_format *, cl_uint *);
typedef cl_int(*fn_clGetMemObjectInfo_t)(cl_mem, cl_mem_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetImageInfo_t)(cl_mem, cl_image_info, size_t, void *, size_t *);
typedef cl_sampler(*fn_clCreateSampler_t)(cl_context, cl_bool, cl_addressing_mode, cl_filter_mode, cl_int *);
typedef cl_int(*fn_clRetainSampler_t)(cl_sampler);
typedef cl_int(*fn_clReleaseSampler_t)(cl_sampler);
typedef cl_int(*fn_clGetSamplerInfo_t)(cl_sampler, cl_sampler_info, size_t, void *, size_t *);
typedef cl_program(*fn_clCreateProgramWithSource_t)(cl_context, cl_uint, const char **, const size_t *, cl_int *);
typedef cl_program(*fn_clCreateProgramWithBinary_t)(cl_context, cl_uint, const cl_device_id *, const size_t *, const unsigned char **, cl_int *, cl_int *);
typedef cl_int(*fn_clRetainProgram_t)(cl_program);
typedef cl_int(*fn_clReleaseProgram_t)(cl_program);
typedef cl_int(*fn_clBuildProgram_t)(cl_program, cl_uint, const cl_device_id *, const char *, void (*pfn_notify)(cl_program,void*), void *);
typedef cl_int(*fn_clUnloadCompiler_t)(void);
typedef cl_int(*fn_clGetProgramInfo_t)(cl_program, cl_program_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetProgramBuildInfo_t)(cl_program, cl_device_id, cl_program_build_info, size_t, void *, size_t *);
typedef cl_kernel(*fn_clCreateKernel_t)(cl_program, const char *, cl_int *);
typedef cl_int(*fn_clCreateKernelsInProgram_t)(cl_program, cl_uint, cl_kernel *, cl_uint *);
typedef cl_int(*fn_clRetainKernel_t)(cl_kernel);
typedef cl_int(*fn_clReleaseKernel_t)(cl_kernel);
typedef cl_int(*fn_clSetKernelArg_t)(cl_kernel, cl_uint, size_t, const void *);
typedef cl_int(*fn_clGetKernelInfo_t)(cl_kernel, cl_kernel_info, size_t, void *, size_t *);
typedef cl_int(*fn_clGetKernelWorkGroupInfo_t)(cl_kernel, cl_device_id, cl_kernel_work_group_info, size_t, void *, size_t *);
typedef cl_int(*fn_clWaitForEvents_t)(cl_uint, const cl_event *);
typedef cl_int(*fn_clGetEventInfo_t)(cl_event, cl_event_info, size_t, void *, size_t *);
typedef cl_int(*fn_clRetainEvent_t)(cl_event);
typedef cl_int(*fn_clReleaseEvent_t)(cl_event);
typedef cl_int(*fn_clGetEventProfilingInfo_t)(cl_event, cl_profiling_info, size_t, void *, size_t *);
typedef cl_int(*fn_clFlush_t)(cl_command_queue);
typedef cl_int(*fn_clFinish_t)(cl_command_queue);
typedef cl_int(*fn_clEnqueueReadBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueWriteBuffer_t)(cl_command_queue, cl_mem, cl_bool, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyBuffer_t)(cl_command_queue, cl_mem, cl_mem, size_t, size_t, size_t, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueReadImage_t)(cl_command_queue, cl_mem, cl_bool, const size_t *, const size_t *, size_t, size_t, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueWriteImage_t)(cl_command_queue, cl_mem, cl_bool, const size_t *, const size_t *, size_t, size_t, const void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyImage_t)(cl_command_queue, cl_mem, cl_mem, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyImageToBuffer_t)(cl_command_queue, cl_mem, cl_mem, const size_t *, const size_t *, size_t, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueCopyBufferToImage_t)(cl_command_queue, cl_mem, cl_mem, size_t, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
typedef void*(*fn_clEnqueueMapBuffer_t)(cl_command_queue, cl_mem, cl_bool, cl_map_flags, size_t, size_t, cl_uint, const cl_event *, cl_event *, cl_int *);
typedef void*(*fn_clEnqueueMapImage_t)(cl_command_queue, cl_mem, cl_bool, cl_map_flags, const size_t *, const size_t *, size_t *, size_t *, cl_uint, const cl_event *, cl_event *, cl_int *);
typedef cl_int(*fn_clEnqueueUnmapMemObject_t)(cl_command_queue, cl_mem, void *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueNDRangeKernel_t)(cl_command_queue, cl_kernel, cl_uint, const size_t *, const size_t *, const size_t *, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueTask_t)(cl_command_queue, cl_kernel, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueNativeKernel_t)(cl_command_queue, void (*user_func)(void *), void *, size_t, cl_uint, const cl_mem *, const void **, cl_uint, const cl_event *, cl_event *);
typedef cl_int(*fn_clEnqueueMarker_t)(cl_command_queue, cl_event *);
typedef cl_int(*fn_clEnqueueWaitForEvents_t)(cl_command_queue, cl_uint, const cl_event *);
typedef cl_int(*fn_clEnqueueBarrier_t)(cl_command_queue);
typedef void*(*fn_clGetExtensionFunctionAddress_t)(const char *);
typedef struct CL_FUNCTIONS {
fn_clGetPlatformIDs_t getPlatformIDs;
fn_clGetPlatformInfo_t getPlatformInfo;
fn_clGetDeviceIDs_t getDeviceIDs;
fn_clGetDeviceInfo_t getDeviceInfo;
fn_clCreateContext_t createContext;
fn_clCreateContextFromType_t createContextFromType;
fn_clRetainContext_t retainContext;
fn_clReleaseContext_t releaseContext;
fn_clGetContextInfo_t getContextInfo;
fn_clCreateCommandQueue_t createCommandQueue;
fn_clRetainCommandQueue_t retainCommandQueue;
fn_clReleaseCommandQueue_t releaseCommandQueue;
fn_clGetCommandQueueInfo_t getCommandQueue;
fn_clCreateBuffer_t createBuffer;
fn_clCreateImage2D_t createImage2D;
fn_clCreateImage3D_t createImage3D;
fn_clRetainMemObject_t retainMemObject;
fn_clReleaseMemObject_t releaseMemObject;
fn_clGetSupportedImageFormats_t getSupportedImageFormats;
fn_clGetMemObjectInfo_t getMemObjectInfo;
fn_clGetImageInfo_t getImageInfo;
fn_clCreateSampler_t createSampler;
fn_clRetainSampler_t retainSampler;
fn_clReleaseSampler_t releaseSampler;
fn_clGetSamplerInfo_t getSamplerInfo;
fn_clCreateProgramWithSource_t createProgramWithSource;
fn_clCreateProgramWithBinary_t createProgramWithBinary;
fn_clRetainProgram_t retainProgram;
fn_clReleaseProgram_t releaseProgram;
fn_clBuildProgram_t buildProgram;
fn_clUnloadCompiler_t unloadCompiler;
fn_clGetProgramInfo_t getProgramInfo;
fn_clGetProgramBuildInfo_t getProgramBuildInfo;
fn_clCreateKernel_t createKernel;
fn_clCreateKernelsInProgram_t createKernelsInProgram;
fn_clRetainKernel_t retainKernel;
fn_clReleaseKernel_t releaseKernel;
fn_clSetKernelArg_t setKernelArg;
fn_clGetKernelInfo_t getKernelInfo;
fn_clGetKernelWorkGroupInfo_t getKernelWorkGroupInfo;
fn_clWaitForEvents_t waitForEvents;
fn_clGetEventInfo_t getEventInfo;
fn_clRetainEvent_t retainEvent;
fn_clReleaseEvent_t releaseEvent;
fn_clGetEventProfilingInfo_t getEventProfilingInfo;
fn_clFlush_t flush;
fn_clFinish_t finish;
fn_clEnqueueReadBuffer_t enqueueReadBuffer;
fn_clEnqueueWriteBuffer_t enqueueWriteBuffer;
fn_clEnqueueCopyBuffer_t enqueueCopyBuffer;
fn_clEnqueueReadImage_t enqueueReadImage;
fn_clEnqueueWriteImage_t enqueueWriteImage;
fn_clEnqueueCopyImage_t enqueueCopyImage;
fn_clEnqueueCopyImageToBuffer_t enqueueCopyImageToBuffer;
fn_clEnqueueCopyBufferToImage_t enqueueCopyBufferToImage;
fn_clEnqueueMapBuffer_t enqueueMapBuffer;
fn_clEnqueueMapImage_t enqueueMapImage;
fn_clEnqueueUnmapMemObject_t enqueueUnmapMemObject;
fn_clEnqueueNDRangeKernel_t enqueueNDRAngeKernel;
fn_clEnqueueTask_t enqueueTask;
fn_clEnqueueNativeKernel_t enqueueNativeKernel;
fn_clEnqueueMarker_t enqueueMarker;
fn_clEnqueueWaitForEvents_t enqueueWaitForEvents;
fn_clEnqueueBarrier_t enqueueBarrier;
fn_clGetExtensionFunctionAddress_t getExtensionFunctionAddress;
} CL_FUNCTIONS;
extern CL_FUNCTIONS cl;
#define clGetPlatformIDs cl.getPlatformIDs
#define clGetPlatformInfo cl.getPlatformInfo
#define clGetDeviceIDs cl.getDeviceIDs
#define clGetDeviceInfo cl.getDeviceInfo
#define clCreateContext cl.createContext
#define clCreateContextFromType cl.createContextFromType
#define clRetainContext cl.retainContext
#define clReleaseContext cl.releaseContext
#define clGetContextInfo cl.getContextInfo
#define clCreateCommandQueue cl.createCommandQueue
#define clRetainCommandQueue cl.retainCommandQueue
#define clReleaseCommandQueue cl.releaseCommandQueue
#define clGetCommandQueueInfo cl.getCommandQueue
#define clCreateBuffer cl.createBuffer
#define clCreateSubBuffer cl.createSubBuffer
#define clCreateImage2D cl.createImage2D
#define clCreateImage3D cl.createImage3D
#define clRetainMemObject cl.retainMemObject
#define clReleaseMemObject cl.releaseMemObject
#define clGetSupportedImageFormats cl.getSupportedImageFormats
#define clGetMemObjectInfo cl.getMemObjectInfo
#define clGetImageInfo cl.getImageInfo
#define clSetMemObjectDestructorCallback cl.setMemObjectDestructorCallback
#define clCreateSampler cl.createSampler
#define clRetainSampler cl.retainSampler
#define clReleaseSampler cl.releaseSampler
#define clGetSamplerInfo cl.getSamplerInfo
#define clCreateProgramWithSource cl.createProgramWithSource
#define clCreateProgramWithBinary cl.createProgramWithBinary
#define clRetainProgram cl.retainProgram
#define clReleaseProgram cl.releaseProgram
#define clBuildProgram cl.buildProgram
#define clUnloadCompiler cl.unloadCompiler
#define clGetProgramInfo cl.getProgramInfo
#define clGetProgramBuildInfo cl.getProgramBuildInfo
#define clCreateKernel cl.createKernel
#define clCreateKernelsInProgram cl.createKernelsInProgram
#define clRetainKernel cl.retainKernel
#define clReleaseKernel cl.releaseKernel
#define clSetKernelArg cl.setKernelArg
#define clGetKernelInfo cl.getKernelInfo
#define clGetKernelWorkGroupInfo cl.getKernelWorkGroupInfo
#define clWaitForEvents cl.waitForEvents
#define clGetEventInfo cl.getEventInfo
#define clCreateUserEvent cl.createUserEvent
#define clRetainEvent cl.retainEvent
#define clReleaseEvent cl.releaseEvent
#define clSetUserEventStatus cl.setUserEventStatus
#define clSetEventCallback cl.setEventCallback
#define clGetEventProfilingInfo cl.getEventProfilingInfo
#define clFlush cl.flush
#define clFinish cl.finish
#define clEnqueueReadBuffer cl.enqueueReadBuffer
#define clEnqueueReadBufferRect cl.enqueueReadBufferRect
#define clEnqueueWriteBuffer cl.enqueueWriteBuffer
#define clEnqueueWriteBufferRect cl.enqueueWriteBufferRect
#define clEnqueueCopyBuffer cl.enqueueCopyBuffer
#define clEnqueueCopyBufferRect cl.enqueueCopyBufferRect
#define clEnqueueReadImage cl.enqueueReadImage
#define clEnqueueWriteImage cl.enqueueWriteImage
#define clEnqueueCopyImage cl.enqueueCopyImage
#define clEnqueueCopyImageToBuffer cl.enqueueCopyImageToBuffer
#define clEnqueueCopyBufferToImage cl.enqueueCopyBufferToImage
#define clEnqueueMapBuffer cl.enqueueMapBuffer
#define clEnqueueMapImage cl.enqueueMapImage
#define clEnqueueUnmapMemObject cl.enqueueUnmapMemObject
#define clEnqueueNDRangeKernel cl.enqueueNDRAngeKernel
#define clEnqueueTask cl.enqueueTask
#define clEnqueueNativeKernel cl.enqueueNativeKernel
#define clEnqueueMarker cl.enqueueMarker
#define clEnqueueWaitForEvents cl.enqueueWaitForEvents
#define clEnqueueBarrier cl.enqueueBarrier
#define clGetExtensionFunctionAddress cl.getExtensionFunctionAddress
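/* Resolve one symbol from the caller's local handle `dll`. On failure, close
 * the library and return CL_INVALID_PLATFORM from the calling function.
 * Wrapped in do/while(0) so the macro expands to a single statement. */
#define CL_LOAD_FN(name, ref) \
do { \
ref = dlsym(dll,name); \
if (ref == NULL){ \
dlclose(dll); \
return CL_INVALID_PLATFORM; \
} \
} while (0)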
#ifdef __cplusplus
}
#endif
#endif /* DYNAMIC_CL_H */
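
Multi-statement macros such as `CL_LOAD_FN` are commonly wrapped in `do { ... } while (0)` so that they behave as one statement inside an unbraced `if`/`else`. The sketch below illustrates that idiom under stated assumptions: `demo_lookup`, `DEMO_LOAD` and `demo_load_all` are hypothetical names, and `demo_lookup` merely stands in for `dlsym()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for dlsym(): only names starting with 'a' resolve. */
static const void *demo_lookup(const char *name)
{
    static const int marker = 1;

    assert(name != NULL);
    return (name[0] == 'a') ? (const void *)&marker : NULL;
}

/* Load-or-bail macro. The do/while(0) wrapper makes the expansion a single
 * statement, so it can be followed by a semicolon and an `else`. */
#define DEMO_LOAD(name, ref)           \
    do {                               \
        (ref) = demo_lookup(name);     \
        if ((ref) == NULL)             \
            return -1;                 \
    } while (0)

/* Returns 0 when every requested symbol resolves, -1 otherwise. */
static int demo_load_all(int want_second, const void **first,
                         const void **second)
{
    DEMO_LOAD("a", *first);

    if (want_second)
        DEMO_LOAD("b", *second);  /* one statement: the else below still pairs */
    else
        *second = NULL;

    return 0;
}
```

Without the wrapper, the `if (want_second) ... else` form would not compile, because the macro's trailing semicolon would end the `if` before the `else`.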

@@ -1,824 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
//ACW: Remove me after debugging.
#include <stdio.h>
#include <string.h>
#include "vp8_opencl.h"
#include "filter_cl.h"
#include "../blockd.h"
#define SIXTAP_FILTER_LEN 6
const char *filterCompileOptions = "-Ivp8/common/opencl -DVP8_FILTER_WEIGHT=128 -DVP8_FILTER_SHIFT=7 -DFILTER_OFFSET";
const char *filter_cl_file_name = "vp8/common/opencl/filter_cl.cl";
#define STATIC_MEM 1
#if STATIC_MEM
static cl_mem int_mem = NULL;
#endif
void cl_destroy_filter(){
if (cl_data.filter_program)
clReleaseProgram(cl_data.filter_program);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_block_variation_kernel);
#if !TWO_PASS_SIXTAP
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict8x8_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict8x4_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_sixtap_predict16x16_kernel);
#else
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_first_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_second_pass_kernel);
#endif
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict4x4_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict8x4_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict8x8_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_bilinear_predict16x16_kernel);
#if MEM_COPY_KERNEL
VP8_CL_RELEASE_KERNEL(cl_data.vp8_memcpy_kernel);
#endif
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_bil_first_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_filter_block2d_bil_second_pass_kernel);
#if STATIC_MEM
if (int_mem != NULL)
clReleaseMemObject(int_mem);
int_mem = NULL;
#endif
cl_data.filter_program = NULL;
}
int cl_init_filter() {
int err;
// Create the filter compute program from the file-defined source code
if ( cl_load_program(&cl_data.filter_program, filter_cl_file_name,
filterCompileOptions) != CL_SUCCESS )
return VP8_CL_TRIED_BUT_FAILED;
// Create the compute kernel in the program we wish to run
#if TWO_PASS_SIXTAP
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_first_pass_kernel,"vp8_filter_block2d_first_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_second_pass_kernel,"vp8_filter_block2d_second_pass_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_first_pass_kernel,vp8_filter_block2d_first_pass_kernel_size);
VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_second_pass_kernel,vp8_filter_block2d_second_pass_kernel_size);
#else
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict_kernel,"vp8_sixtap_predict_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict_kernel,vp8_sixtap_predict_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict8x8_kernel,"vp8_sixtap_predict8x8_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict8x8_kernel,vp8_sixtap_predict8x8_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict8x4_kernel,"vp8_sixtap_predict8x4_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict8x4_kernel,vp8_sixtap_predict8x4_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_sixtap_predict16x16_kernel,"vp8_sixtap_predict16x16_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_sixtap_predict16x16_kernel,vp8_sixtap_predict16x16_kernel_size);
#endif
//VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_bil_first_pass_kernel,vp8_filter_block2d_bil_first_pass_kernel_size);
//VP8_CL_CALC_LOCAL_SIZE(vp8_filter_block2d_bil_second_pass_kernel,vp8_filter_block2d_bil_second_pass_kernel_size);
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_bil_first_pass_kernel,"vp8_filter_block2d_bil_first_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_filter_block2d_bil_second_pass_kernel,"vp8_filter_block2d_bil_second_pass_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict4x4_kernel,"vp8_bilinear_predict4x4_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict8x4_kernel,"vp8_bilinear_predict8x4_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict8x8_kernel,"vp8_bilinear_predict8x8_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_bilinear_predict16x16_kernel,"vp8_bilinear_predict16x16_kernel");
#if MEM_COPY_KERNEL
VP8_CL_CREATE_KERNEL(cl_data,filter_program,vp8_memcpy_kernel,"vp8_memcpy_kernel");
VP8_CL_CALC_LOCAL_SIZE(vp8_memcpy_kernel,vp8_memcpy_kernel_size);
#endif
#if STATIC_MEM
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,err);
#endif
return CL_SUCCESS;
}
void vp8_filter_block2d_first_pass_cl(
cl_command_queue cq,
cl_mem src_mem,
int src_offset,
cl_mem int_mem,
unsigned int src_pixels_per_line,
unsigned int int_height,
unsigned int int_width,
int xoffset
){
int err;
size_t global = int_width*int_height;
size_t local = cl_data.vp8_filter_block2d_first_pass_kernel_size;
if (local > global)
local = global;
err = clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 1, sizeof (int), &src_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 2, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 3, sizeof (cl_uint), &src_pixels_per_line);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 4, sizeof (cl_uint), &int_height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 5, sizeof (cl_int), &int_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_first_pass_kernel, 6, sizeof (int), &xoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_first_pass_kernel, 1, NULL, &global, &local , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
}
void vp8_filter_block2d_second_pass_cl(
cl_command_queue cq,
cl_mem int_mem,
int int_offset,
cl_mem dst_mem,
int dst_offset,
int dst_pitch,
unsigned int output_height,
unsigned int output_width,
int yoffset
){
int err;
size_t global = output_width*output_height;
size_t local = cl_data.vp8_filter_block2d_second_pass_kernel_size;
if (local > global){
//printf("Local is now %ld\n",global);
local = global;
}
/* Set kernel arguments */
err = clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 0, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 1, sizeof (int), &int_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 2, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 3, sizeof (int), &dst_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 4, sizeof (int), &dst_pitch);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 5, sizeof (int), &output_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 6, sizeof (int), &output_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 7, sizeof (int), &output_height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 8, sizeof (int), &output_width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_second_pass_kernel, 9, sizeof (int), &yoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_second_pass_kernel, 1, NULL, &global, &local , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
}
void vp8_sixtap_single_pass(
cl_command_queue cq,
cl_kernel kernel,
size_t local,
size_t global,
cl_mem src_mem,
cl_mem dst_mem,
unsigned char *src_base,
int src_offset,
size_t src_len,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
int dst_offset,
int dst_pitch,
size_t dst_len
){
int err;
#if !STATIC_MEM
cl_mem int_mem;
#endif
int free_src = 0, free_dst = 0;
if (local > global){
local = global;
}
/* Make space for kernel input/output data.
* Initialize the buffer as well if needed.
*/
if (src_mem == NULL){
VP8_CL_CREATE_BUF( cq, src_mem,, sizeof (unsigned char) * src_len, src_base-2,,);
src_offset = 2;
free_src = 1;
} else {
src_offset -= 2*src_pixels_per_line;
}
if (dst_mem == NULL){
VP8_CL_CREATE_BUF( cq, dst_mem,, sizeof (unsigned char) * dst_len + dst_offset, dst_base,, );
free_dst = 1;
}
#if !STATIC_MEM
CL_CREATE_BUF( cq, int_mem,, sizeof(cl_int)*FData_height*FData_width, NULL,, );
#endif
err = clSetKernelArg(kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(kernel, 1, sizeof (int), &src_offset);
err |= clSetKernelArg(kernel, 2, sizeof (cl_int), &src_pixels_per_line);
err |= clSetKernelArg(kernel, 3, sizeof (cl_int), &xoffset);
err |= clSetKernelArg(kernel, 4, sizeof (cl_int), &yoffset);
err |= clSetKernelArg(kernel, 5, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(kernel, 6, sizeof (cl_int), &dst_offset);
err |= clSetKernelArg(kernel, 7, sizeof (int), &dst_pitch);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, kernel, 1, NULL, &global, &local , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_src == 1)
clReleaseMemObject(src_mem);
if (free_dst == 1){
/* Read back the result data from the device */
err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
,
);
clReleaseMemObject(dst_mem);
}
}
void vp8_sixtap_run_cl(
cl_command_queue cq,
cl_mem src_mem,
cl_mem dst_mem,
unsigned char *src_base,
int src_offset,
size_t src_len,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
int dst_offset,
int dst_pitch,
size_t dst_len,
unsigned int FData_height,
unsigned int FData_width,
unsigned int output_height,
unsigned int output_width,
int int_offset
)
{
int err;
#if !STATIC_MEM
cl_mem int_mem;
#endif
int free_src = 0, free_dst = 0;
/* Make space for kernel input/output data.
* Initialize the buffer as well if needed.
*/
if (src_mem == NULL){
VP8_CL_CREATE_BUF( cq, src_mem,, sizeof (unsigned char) * src_len, src_base-2,,);
src_offset = 2;
free_src = 1;
} else {
src_offset -= 2*src_pixels_per_line;
}
if (dst_mem == NULL){
VP8_CL_CREATE_BUF( cq, dst_mem,, sizeof (unsigned char) * dst_len + dst_offset, dst_base,, );
free_dst = 1;
}
#if !STATIC_MEM
VP8_CL_CREATE_BUF( cq, int_mem,, sizeof(cl_int)*FData_height*FData_width, NULL,, );
#endif
vp8_filter_block2d_first_pass_cl(
cq, src_mem, src_offset, int_mem, src_pixels_per_line,
FData_height, FData_width, xoffset
);
vp8_filter_block2d_second_pass_cl(cq,int_mem,int_offset,dst_mem,dst_offset,dst_pitch,
output_height,output_width,yoffset);
if (free_src == 1)
clReleaseMemObject(src_mem);
if (free_dst == 1){
/* Read back the result data from the device */
err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
,
);
clReleaseMemObject(dst_mem);
}
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_sixtap_predict4x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=4, output_height=4, FData_height=9, FData_width=4;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 8;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict_kernel,
cl_data.vp8_sixtap_predict_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_sixtap_predict8x8_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=8, output_height=8, FData_height=13, FData_width=8;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 16;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict8x8_kernel,
cl_data.vp8_sixtap_predict8x8_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_sixtap_predict8x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=8, output_height=4, FData_height=9, FData_width=8;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 16;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict8x4_kernel,
cl_data.vp8_sixtap_predict8x4_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_sixtap_predict16x16_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
int output_width=16, output_height=16, FData_height=21, FData_width=16;
//Size of output to transfer
int dst_len = DST_LEN(dst_pitch,output_height,output_width);
int src_len = SIXTAP_SRC_LEN(FData_width,FData_height,src_pixels_per_line);
#if TWO_PASS_SIXTAP
int int_offset = 32;
unsigned char *src_ptr = src_base + src_offset;
vp8_sixtap_run_cl(cq, src_mem, dst_mem,
(src_ptr-2*src_pixels_per_line),src_offset, src_len,
src_pixels_per_line, xoffset,yoffset,dst_base,dst_offset,
dst_pitch,dst_len,FData_height,FData_width,output_height,
output_width,int_offset
);
#else
vp8_sixtap_single_pass(
cq,
cl_data.vp8_sixtap_predict16x16_kernel,
cl_data.vp8_sixtap_predict16x16_kernel_size,
FData_height*FData_width,
src_mem,
dst_mem,
src_base,
src_offset,
src_len,
src_pixels_per_line,
xoffset,
yoffset,
dst_base,
dst_offset,
dst_pitch,
dst_len
);
#endif
return;
}
void vp8_filter_block2d_bil_first_pass_cl(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
cl_mem int_mem,
int src_pixels_per_line,
int height,
int width,
int xoffset
)
{
int err;
size_t global = width*height;
int free_src = 0;
if (src_mem == NULL){
int src_len = BIL_SRC_LEN(width,height,src_pixels_per_line);
/*Make space for kernel input/output data. Initialize the buffer as well if needed. */
VP8_CL_CREATE_BUF(cq, src_mem, CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,
sizeof (unsigned char) * src_len, src_base+src_offset,,
);
src_offset = 0; //Set to zero as long as src_mem starts at base+offset
free_src = 1;
}
err = clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 1, sizeof (int), &src_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 2, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 3, sizeof (int), &src_pixels_per_line);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 4, sizeof (int), &height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 5, sizeof (int), &width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_first_pass_kernel, 6, sizeof (int), &xoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_bil_first_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_src == 1)
clReleaseMemObject(src_mem);
}
void vp8_filter_block2d_bil_second_pass_cl(
cl_command_queue cq,
cl_mem int_mem,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch,
int height,
int width,
int yoffset
)
{
int err;
size_t global = width*height;
//Size of output data
int dst_len = DST_LEN(dst_pitch,height,width);
int free_dst = 0;
if (dst_mem == NULL){
VP8_CL_CREATE_BUF(cq, dst_mem, CL_MEM_WRITE_ONLY|CL_MEM_COPY_HOST_PTR,
sizeof (unsigned char) * dst_len + dst_offset, dst_base,,
);
free_dst = 1;
}
err = clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 0, sizeof (cl_mem), &int_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 1, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 2, sizeof (int), &dst_offset);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 3, sizeof (int), &dst_pitch);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 4, sizeof (int), &height);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 5, sizeof (int), &width);
err |= clSetKernelArg(cl_data.vp8_filter_block2d_bil_second_pass_kernel, 6, sizeof (int), &yoffset);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_filter_block2d_bil_second_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_dst == 1){
/* Read back the result data from the device */
err = clEnqueueReadBuffer(cq, dst_mem, CL_FALSE, 0, sizeof (unsigned char) * dst_len + dst_offset, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
,
);
clReleaseMemObject(dst_mem);
}
}
void vp8_bilinear_predict4x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 4, width = 4;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_bilinear_predict8x8_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 8, width = 8;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_bilinear_predict8x4_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 4, width = 8;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
void vp8_bilinear_predict16x16_cl
(
cl_command_queue cq,
unsigned char *src_base,
cl_mem src_mem,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_base,
cl_mem dst_mem,
int dst_offset,
int dst_pitch
) {
const int height = 16, width = 16;
#if !STATIC_MEM
int err;
cl_mem int_mem = NULL;
VP8_CL_CREATE_BUF(NULL, int_mem, NULL, sizeof(cl_int)*21*16, NULL, ,);
#endif
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_cl(cq, src_base, src_mem, src_offset, int_mem, src_pixels_per_line, height + 1, width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_cl(cq, int_mem, dst_base, dst_mem, dst_offset, dst_pitch, height, width, yoffset);
#if !STATIC_MEM
clReleaseMemObject(int_mem);
#endif
}
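For reviewers checking the host wrappers above against expected output, the two-pass bilinear prediction can be sketched in plain C. This is a minimal CPU reference, not code from this change set: the function name `bilinear_predict_ref` and its parameter layout are assumptions made for illustration, while the filter taps, rounding constant and shift are copied from the kernels' `bilinear_filters` table and `VP8_FILTER_*` macros.

```c
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7

/* Same taps as the bilinear_filters table used by the OpenCL kernels. */
static const int bilinear_filters[8][2] = {
    {128, 0}, {112, 16}, {96, 32}, {80, 48},
    {64, 64}, {48, 80}, {32, 96}, {16, 112}
};

/* Hypothetical CPU reference for a width x height block (both <= 16).
 * The source must provide (height + 1) rows of (width + 1) readable pixels. */
static void bilinear_predict_ref(const unsigned char *src, int src_stride,
                                 int xoffset, int yoffset,
                                 unsigned char *dst, int dst_pitch,
                                 int width, int height)
{
    int tmp[17 * 16]; /* (height + 1) rows of intermediate samples */
    const int *hf = bilinear_filters[xoffset];
    const int *vf = bilinear_filters[yoffset];
    int i, j;

    /* First pass: filter horizontally, producing height + 1 rows. */
    for (i = 0; i < height + 1; i++)
        for (j = 0; j < width; j++)
            tmp[i * width + j] =
                (src[i * src_stride + j] * hf[0] +
                 src[i * src_stride + j + 1] * hf[1] +
                 (VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;

    /* Second pass: filter vertically between adjacent intermediate rows. */
    for (i = 0; i < height; i++)
        for (j = 0; j < width; j++)
            dst[i * dst_pitch + j] = (unsigned char)
                ((tmp[i * width + j] * vf[0] +
                  tmp[(i + 1) * width + j] * vf[1] +
                  (VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT);
}
```

With `xoffset` and `yoffset` both 0 the taps are `{128, 0}`, so the block is copied unchanged; an offset of 4 averages each pixel with its right or lower neighbour, rounding up.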

@@ -1,562 +0,0 @@
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
__constant int bilinear_filters[8][2] = {
{ 128, 0},
{ 112, 16},
{ 96, 32},
{ 80, 48},
{ 64, 64},
{ 48, 80},
{ 32, 96},
{ 16, 112}
};
__constant short sub_pel_filters[8][8] = {
//These were originally 8x6, but are padded for vector ops
{ 0, 0, 128, 0, 0, 0, 0, 0}, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0, 0, 0},
{ 2, -11, 108, 36, -8, 1, 0, 0}, /* New 1/4 pel 6 tap filter */
{ 0, -9, 93, 50, -6, 0, 0, 0},
{ 3, -16, 77, 77, -16, 3, 0, 0}, /* New 1/2 pel 6 tap filter */
{ 0, -6, 50, 93, -9, 0, 0, 0},
{ 1, -8, 36, 108, -11, 2, 0, 0}, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0, 0, 0},
};
kernel void vp8_filter_block2d_first_pass_kernel(
__global unsigned char *src_base,
int src_offset,
__global int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
int filter_offset
){
uint tid = get_global_id(0);
global unsigned char *src_ptr = &src_base[src_offset];
//Note that src_offset will be reset later, which is why we use it now
int Temp;
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (tid < (output_width*output_height)){
src_offset = tid + (tid/output_width * (src_pixels_per_line - output_width));
Temp = (int)(src_ptr[src_offset - 2] * vp8_filter[0]) +
(int)(src_ptr[src_offset - 1] * vp8_filter[1]) +
(int)(src_ptr[src_offset] * vp8_filter[2]) +
(int)(src_ptr[src_offset + 1] * vp8_filter[3]) +
(int)(src_ptr[src_offset + 2] * vp8_filter[4]) +
(int)(src_ptr[src_offset + 3] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if ( Temp > 255 )
Temp = 255;
output_ptr[tid] = Temp;
}
}
kernel void vp8_filter_block2d_second_pass_kernel
(
__global int *src_base,
int src_offset,
__global unsigned char *output_base,
int output_offset,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
int filter_offset
) {
uint i = get_global_id(0);
global int *src_ptr = &src_base[src_offset];
global unsigned char *output_ptr = &output_base[output_offset];
int out_offset; //Not same as output_offset...
int Temp;
int PS2 = 2*(int)pixel_step;
int PS3 = 3*(int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (i < (output_width * output_height)){
out_offset = i/output_width;
src_offset = out_offset;
src_offset = i + (src_offset * src_increment);
out_offset = i%output_width + (out_offset * output_pitch);
/* Apply filter */
Temp = ((int)src_ptr[src_offset - PS2] * vp8_filter[0]) +
((int)src_ptr[src_offset -(int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[src_offset] * vp8_filter[2]) +
((int)src_ptr[src_offset + pixel_step] * vp8_filter[3]) +
((int)src_ptr[src_offset + PS2] * vp8_filter[4]) +
((int)src_ptr[src_offset + PS3] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[out_offset] = (unsigned char)Temp;
}
}
kernel void vp8_filter_block2d_bil_first_pass_kernel(
__global unsigned char *src_base,
int src_offset,
__global int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
int filter_offset
)
{
uint tid = get_global_id(0);
if (tid < output_width * output_height){
global unsigned char *src_ptr = &src_base[src_offset];
unsigned int i, j;
__constant int *vp8_filter = bilinear_filters[filter_offset];
unsigned int out_row,out_offset;
int src_increment = src_pixels_per_line - output_width;
i = tid / output_width;
j = tid % output_width;
src_offset = i*(output_width+src_increment) + j;
out_row = output_width * i;
out_offset = out_row + j;
/* Apply bilinear filter */
output_ptr[out_offset] = (((int)src_ptr[src_offset] * vp8_filter[0]) +
((int)src_ptr[src_offset+1] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
}
}
kernel void vp8_filter_block2d_bil_second_pass_kernel
(
__global int *src_ptr,
__global unsigned char *output_base,
int output_offset,
int output_pitch,
unsigned int output_height,
unsigned int output_width,
int filter_offset
)
{
uint tid = get_global_id(0);
if (tid < output_width * output_height){
global unsigned char *output_ptr = &output_base[output_offset];
unsigned int i, j;
int Temp;
__constant int *vp8_filter = bilinear_filters[filter_offset];
int out_offset;
int src_offset;
i = tid / output_width;
j = tid % output_width;
src_offset = i*(output_width) + j;
out_offset = i*output_pitch + j;
/* Apply filter */
Temp = ((int)src_ptr[src_offset] * vp8_filter[0]) +
((int)src_ptr[src_offset+output_width] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2);
output_ptr[out_offset] = (unsigned char)(Temp >> VP8_FILTER_SHIFT);
}
}
//Called from reconinter_cl.c
kernel void vp8_memcpy_kernel(
global unsigned char *src_base,
int src_offset,
int src_stride,
global unsigned char *dst_base,
int dst_offset,
int dst_stride,
int num_bytes,
int num_iter
){
int i,r;
global unsigned char *src = &src_base[src_offset];
global unsigned char *dst = &dst_base[dst_offset];
src_offset = dst_offset = 0;
r = get_global_id(1);
if (r < get_global_size(1)){
i = get_global_id(0);
if (i < get_global_size(0)){
src_offset = r*src_stride + i;
dst_offset = r*dst_stride + i;
dst[dst_offset] = src[src_offset];
}
}
}
//Not used currently.
void vp8_memset_short(
global short *mem,
int offset,
short newval,
unsigned int size
)
{
int tid = get_global_id(0);
if (tid < (size/2)){
mem[offset+tid/2] = newval;
}
}
__kernel void vp8_bilinear_predict4x4_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 4, Width = 4;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
__kernel void vp8_bilinear_predict8x8_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 8, Width = 8;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
__kernel void vp8_bilinear_predict8x4_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 4, Width = 8;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
__kernel void vp8_bilinear_predict16x16_kernel
(
__global unsigned char *src_base,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_base,
int dst_offset,
int dst_pitch,
__global int *int_mem
)
{
int Height = 16, Width = 16;
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass_kernel(src_base, src_offset, int_mem, src_pixels_per_line, Height + 1, Width, xoffset);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_kernel(int_mem, dst_base, dst_offset, dst_pitch, Height, Width, yoffset);
}
void vp8_filter_block2d_first_pass(
global unsigned char *src_base,
int src_offset,
local int *output_ptr,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
int filter_offset
){
uint tid = get_global_id(0);
uint i = tid;
int nthreads = get_global_size(0);
int ngroups = nthreads / get_local_size(0);
global unsigned char *src_ptr = &src_base[src_offset];
//Note that src_offset will be reset later, which is why we capture it now
int Temp;
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (tid < (output_width*output_height)){
short filter0 = vp8_filter[0];
short filter1 = vp8_filter[1];
short filter2 = vp8_filter[2];
short filter3 = vp8_filter[3];
short filter4 = vp8_filter[4];
short filter5 = vp8_filter[5];
if (ngroups > 1){
//This is generally only true on Apple CPU-CL, which gives a group
//size of 1, regardless of the CPU core count.
for (i=0; i < output_width*output_height; i++){
src_offset = i + (i/output_width * (src_pixels_per_line - output_width));
Temp = (int)(src_ptr[src_offset - 2] * filter0) +
(int)(src_ptr[src_offset - 1] * filter1) +
(int)(src_ptr[src_offset] * filter2) +
(int)(src_ptr[src_offset + 1] * filter3) +
(int)(src_ptr[src_offset + 2] * filter4) +
(int)(src_ptr[src_offset + 3] * filter5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp >>= VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if ( Temp > 255 )
Temp = 255;
output_ptr[i] = Temp;
}
} else {
src_offset = i + (i/output_width * (src_pixels_per_line - output_width));
Temp = (int)(src_ptr[src_offset - 2] * filter0) +
(int)(src_ptr[src_offset - 1] * filter1) +
(int)(src_ptr[src_offset] * filter2) +
(int)(src_ptr[src_offset + 1] * filter3) +
(int)(src_ptr[src_offset + 2] * filter4) +
(int)(src_ptr[src_offset + 3] * filter5) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp >>= VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if ( Temp > 255 )
Temp = 255;
output_ptr[i] = Temp;
}
}
//Add a fence so that no 2nd pass stuff starts before 1st pass writes are done.
barrier(CLK_LOCAL_MEM_FENCE);
}
void vp8_filter_block2d_second_pass
(
local int *src_ptr,
global unsigned char *output_base,
int output_offset,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
int filter_offset
) {
global unsigned char *output_ptr = &output_base[output_offset];
int out_offset; //Not same as output_offset...
int src_offset;
int Temp;
int PS2 = 2*(int)pixel_step;
int PS3 = 3*(int)pixel_step;
unsigned int src_increment = src_pixels_per_line - output_width;
uint i = get_global_id(0);
__constant short *vp8_filter = sub_pel_filters[filter_offset];
if (i < (output_width * output_height)){
out_offset = i/output_width;
src_offset = out_offset;
src_offset = i + (src_offset * src_increment);
out_offset = i%output_width + (out_offset * output_pitch);
/* Apply filter */
Temp = ((int)src_ptr[src_offset - PS2] * vp8_filter[0]) +
((int)src_ptr[src_offset -(int)pixel_step] * vp8_filter[1]) +
((int)src_ptr[src_offset] * vp8_filter[2]) +
((int)src_ptr[src_offset + pixel_step] * vp8_filter[3]) +
((int)src_ptr[src_offset + PS2] * vp8_filter[4]) +
((int)src_ptr[src_offset + PS3] * vp8_filter[5]) +
(VP8_FILTER_WEIGHT >> 1); /* Rounding */
/* Normalize back to 0-255 */
Temp = Temp >> VP8_FILTER_SHIFT;
if (Temp < 0)
Temp = 0;
else if (Temp > 255)
Temp = 255;
output_ptr[out_offset] = (unsigned char)Temp;
}
}
__kernel void vp8_sixtap_predict_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[9*4];
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 9, 4, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[8], dst_ptr, dst_offset, dst_pitch, 4, 4, 4, 4, yoffset);
}
__kernel void vp8_sixtap_predict8x8_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[13*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 13, 8, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[16], dst_ptr, dst_offset, dst_pitch, 8, 8, 8, 8, yoffset);
}
__kernel void vp8_sixtap_predict8x4_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[13*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 9, 8, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[16], dst_ptr, dst_offset, dst_pitch, 8, 8, 4, 8, yoffset);
}
__kernel void vp8_sixtap_predict16x16_kernel
(
__global unsigned char *src_ptr,
int src_offset,
int src_pixels_per_line,
int xoffset,
int yoffset,
__global unsigned char *dst_ptr,
int dst_offset,
int dst_pitch
)
{
local int FData[21*24]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr, src_offset, FData, src_pixels_per_line, 1, 21, 16, xoffset);
/* then filter vertically... */
vp8_filter_block2d_second_pass(&FData[32], dst_ptr, dst_offset, dst_pitch, 16, 16, 16, 16, yoffset);
return;
}
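The per-sample arithmetic shared by the six-tap first and second passes above (weighted sum, rounding, shift, clamp) can be illustrated with a small C helper. This is a sketch under assumptions: `sixtap_sample` and `sub_pel_taps` are hypothetical names introduced here, and the table keeps only the six meaningful taps of each padded `sub_pel_filters` row.

```c
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7

/* The six meaningful taps of each sub_pel_filters row (padding dropped). */
static const short sub_pel_taps[8][6] = {
    {0, 0, 128, 0, 0, 0},
    {0, -6, 123, 12, -1, 0},
    {2, -11, 108, 36, -8, 1},
    {0, -9, 93, 50, -6, 0},
    {3, -16, 77, 77, -16, 3},
    {0, -6, 50, 93, -9, 0},
    {1, -8, 36, 108, -11, 2},
    {0, -1, 12, 123, -6, 0}
};

/* Hypothetical helper: filter one sample at p, reading p[-2*step] .. p[3*step].
 * step is 1 for the horizontal pass and the row stride for the vertical pass. */
static int sixtap_sample(const unsigned char *p, int step, int filter_index)
{
    const short *f = sub_pel_taps[filter_index];
    int sum = p[-2 * step] * f[0] + p[-step] * f[1] + p[0] * f[2] +
              p[step] * f[3] + p[2 * step] * f[4] + p[3 * step] * f[5] +
              (VP8_FILTER_WEIGHT >> 1);   /* rounding */

    sum >>= VP8_FILTER_SHIFT;             /* normalize back to 0-255 */
    if (sum < 0)
        sum = 0;
    else if (sum > 255)
        sum = 255;
    return sum;
}
```

Because the filter reads two samples before and three after the current position, the caller has to supply that margin, which is why the kernels run the first pass over five extra rows (for example 9 rows for a 4x4 block).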

@@ -1,74 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef FILTER_CL_H_
#define FILTER_CL_H_
#ifdef __cplusplus
extern "C" {
#endif
#include "vp8_opencl.h"
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
#define REGISTER_FILTER 1
#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
#define PRE_CALC_PIXEL_STEPS 1
#define PRE_CALC_SRC_INCREMENT 1
#if PRE_CALC_PIXEL_STEPS
#define PS2 two_pixel_steps
#define PS3 three_pixel_steps
#else
#define PS2 2*(int)pixel_step
#define PS3 3*(int)pixel_step
#endif
#if REGISTER_FILTER
#define FILTER0 filter0
#define FILTER1 filter1
#define FILTER2 filter2
#define FILTER3 filter3
#define FILTER4 filter4
#define FILTER5 filter5
#else
#define FILTER0 vp8_filter[0]
#define FILTER1 vp8_filter[1]
#define FILTER2 vp8_filter[2]
#define FILTER3 vp8_filter[3]
#define FILTER4 vp8_filter[4]
#define FILTER5 vp8_filter[5]
#endif
#if PRE_CALC_SRC_INCREMENT
#define SRC_INCREMENT src_increment
#else
#define SRC_INCREMENT (src_pixels_per_line - output_width)
#endif
#define FILTER_OFFSET //Filter data stored as CL constant memory
#define FILTER_REF sub_pel_filters[filter_offset]
extern const char *filterCompileOptions;
extern const char *filter_cl_file_name;
//Copy the -2*pixel_step (and ps*3) bytes because the filter algorithm
//accesses negative indexes
#define SIXTAP_SRC_LEN(out_width,out_height,src_px) ((out_width)*(out_height) + (((out_width)*(out_height)-1)/(out_width))*((src_px) - (out_width)) + 5)
#define BIL_SRC_LEN(out_width,out_height,src_px) ((out_height) * (src_px) + (out_width))
#define DST_LEN(dst_pitch,dst_height,dst_width) ((dst_pitch) * (dst_height) + (dst_width))
#ifdef __cplusplus
}
#endif
#endif /* FILTER_CL_H_ */
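A short worked example may help when checking the buffer-length macros in this header. The sketch below redefines them locally with shortened parameter names and evaluates them for a 4x4 block; the stride of 32 and the helper `print_block_lengths` are assumptions chosen for illustration, as is passing first-pass dimensions (9 rows for six-tap, 5 rows for bilinear) to the source-length macros.

```c
#include <stdio.h>

/* Local copies of the length macros, arguments fully parenthesized. */
#define SIXTAP_SRC_LEN(w, h, src_px) \
    ((w) * (h) + ((((w) * (h)) - 1) / (w)) * ((src_px) - (w)) + 5)
#define BIL_SRC_LEN(w, h, src_px) ((h) * (src_px) + (w))
#define DST_LEN(dst_pitch, h, w) ((dst_pitch) * (h) + (w))

/* Hypothetical helper: print the byte counts for one block shape. */
static void print_block_lengths(int width, int height,
                                int src_stride, int dst_pitch)
{
    /* Six-tap first pass covers height + 5 rows, bilinear height + 1 rows. */
    printf("sixtap src: %d\n", SIXTAP_SRC_LEN(width, height + 5, src_stride));
    printf("bil src:    %d\n", BIL_SRC_LEN(width, height + 1, src_stride));
    printf("dst:        %d\n", DST_LEN(dst_pitch, height, width));
}
```

For a 4x4 block with both strides equal to 32, the six-tap length works out to 36 + 8 * 28 + 5 = 265 bytes, the bilinear length to 5 * 32 + 4 = 164 bytes, and the destination length to 32 * 4 + 4 = 132 bytes.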

@@ -1,45 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef IDCT_OPENCL_H
#define IDCT_OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#include "vp8_opencl.h"
#include "vp8/common/blockd.h"
#define prototype_second_order_cl(sym) \
void sym(BLOCKD *b)
#define prototype_idct_cl(sym) \
void sym(BLOCKD *b, int pitch)
#define prototype_idct_scalar_add_cl(sym) \
void sym(BLOCKD *b, cl_int use_diff, int diff_offset, int qcoeff_offset, \
int pred_offset, unsigned char *output, cl_mem out_mem, int out_offset, size_t out_size, \
int pitch, int stride)
extern prototype_idct_cl(vp8_short_idct4x4llm_1_cl);
extern prototype_idct_cl(vp8_short_idct4x4llm_cl);
extern prototype_idct_scalar_add_cl(vp8_dc_only_idct_add_cl);
extern prototype_second_order_cl(vp8_short_inv_walsh4x4_1_cl);
extern prototype_second_order_cl(vp8_short_inv_walsh4x4_cl);
#ifdef __cplusplus
}
#endif
#endif

@@ -1,325 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
//ACW: Remove me after debugging.
#include <stdio.h>
#include <string.h>
#include "idct_cl.h"
#include "idctllm_cl.h"
#include "blockd_cl.h"
void cl_destroy_idct(){
if (cl_data.idct_program)
clReleaseProgram(cl_data.idct_program);
cl_data.idct_program = NULL;
VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_1_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_dc_only_idct_add_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_idct4x4llm_1_kernel);
//VP8_CL_RELEASE_KERNEL(cl_data.vp8_short_idct4x4llm_kernel);
}
int cl_init_idct() {
int err;
// Create the IDCT compute program from the file-defined source code
if (cl_load_program(&cl_data.idct_program, idctllm_cl_file_name,
idctCompileOptions) != CL_SUCCESS)
return VP8_CL_TRIED_BUT_FAILED;
// Create the compute kernel in the program we wish to run
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_1_kernel,"vp8_short_inv_walsh4x4_1_kernel");
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_1st_pass_kernel,"vp8_short_inv_walsh4x4_1st_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_inv_walsh4x4_2nd_pass_kernel,"vp8_short_inv_walsh4x4_2nd_pass_kernel");
VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_dc_only_idct_add_kernel,"vp8_dc_only_idct_add_kernel");
////idct4x4llm kernels are only useful for the encoder
//VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_idct4x4llm_1_kernel,"vp8_short_idct4x4llm_1_kernel");
//VP8_CL_CREATE_KERNEL(cl_data,idct_program,vp8_short_idct4x4llm_kernel,"vp8_short_idct4x4llm_kernel");
return CL_SUCCESS;
}
#define max(x,y) ((x) > (y) ? (x) : (y))
//#define NO_CL
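/* Only useful for the encoder, and untested. Note that cl_init_idct()
 * currently leaves creation of vp8_short_idct4x4llm_kernel commented out. */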
void vp8_short_idct4x4llm_cl(BLOCKD *b, int pitch)
{
int err;
short *input = b->dqcoeff_base + b->dqcoeff_offset;
short *output = &b->diff_base[b->diff_offset];
cl_mem src_mem, dst_mem;
//1 instance for now. This should be split into 2-pass * 4 thread.
size_t global = 1;
if (cl_initialized != CL_SUCCESS){
vp8_short_idct4x4llm_c(input,output,pitch);
return;
}
VP8_CL_CREATE_BUF(b->cl_commands, src_mem,,
sizeof(short)*16, input,
vp8_short_idct4x4llm_c(input,output,pitch),
);
VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
sizeof(short)*(4+(pitch/2)*3), output,
vp8_short_idct4x4llm_c(input,output,pitch),
);
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 1, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_kernel, 2, sizeof (int), &pitch);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_idct4x4llm_c(input,output,pitch),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_idct4x4llm_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_idct4x4llm_c(input,output,pitch),
);
/* Read back the result data from the device */
err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0, sizeof(short)*(4+pitch/2*3), output, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
vp8_short_idct4x4llm_c(input,output,pitch),
);
clReleaseMemObject(src_mem);
clReleaseMemObject(dst_mem);
return;
}
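/* Only useful for the encoder, and untested. Note that cl_init_idct()
 * currently leaves creation of vp8_short_idct4x4llm_1_kernel commented out. */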
void vp8_short_idct4x4llm_1_cl(BLOCKD *b, int pitch)
{
int err;
size_t global = 4;
short *input = b->dqcoeff_base + b->dqcoeff_offset;
short *output = &b->diff_base[b->diff_offset];
cl_mem src_mem, dst_mem;
if (cl_initialized != CL_SUCCESS){
vp8_short_idct4x4llm_1_c(input,output,pitch);
return;
}
printf("vp8_short_idct4x4llm_1_cl\n");
VP8_CL_CREATE_BUF(b->cl_commands, src_mem,,
sizeof(short), input,
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
sizeof(short)*(4+(pitch/2)*3), output,
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 1, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_short_idct4x4llm_1_kernel, 2, sizeof (int), &pitch);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_idct4x4llm_1_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
/* Read back the result data from the device */
err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0, sizeof(short)*(4+pitch/2*3), output, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read output array!\n",
vp8_short_idct4x4llm_1_c(input,output,pitch),
);
clReleaseMemObject(src_mem);
clReleaseMemObject(dst_mem);
return;
}
void vp8_dc_only_idct_add_cl(BLOCKD *b, cl_int use_diff, int diff_offset,
int qcoeff_offset, int pred_offset,
unsigned char *dst_base, cl_mem dst_mem, int dst_offset, size_t dest_size,
int pitch, int stride
)
{
int err;
size_t global = 16;
int free_mem = 0;
//cl_mem dest_mem = NULL;
if (dst_mem == NULL){
VP8_CL_CREATE_BUF(b->cl_commands, dst_mem,,
dest_size, dst_base,,
);
free_mem = 1;
}
//Set arguments and run kernel
err = clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 0, sizeof (cl_mem), &b->cl_predictor_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 1, sizeof (int), &pred_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 2, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 3, sizeof (int), &dst_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 4, sizeof (int), &pitch);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 5, sizeof (int), &stride);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 6, sizeof (cl_int), &use_diff);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 7, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 8, sizeof (int), &diff_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 9, sizeof (cl_mem), &b->cl_qcoeff_mem);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 10, sizeof (int), &qcoeff_offset);
err |= clSetKernelArg(cl_data.vp8_dc_only_idct_add_kernel, 11, sizeof (cl_mem), &b->cl_dequant_mem);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_dc_only_idct_add_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
if (free_mem == 1){
/* Read back the result data from the device */
err = clEnqueueReadBuffer(b->cl_commands, dst_mem, CL_FALSE, 0,
dest_size, dst_base, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(b->cl_commands, err != CL_SUCCESS,
"Error: Failed to read output array!\n",,
);
clReleaseMemObject(dst_mem);
}
return;
}
void vp8_short_inv_walsh4x4_cl(BLOCKD *b)
{
int err;
size_t global = 4;
if (cl_initialized != CL_SUCCESS){
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset,&b->diff_base[b->diff_offset]);
return;
}
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 0, sizeof (cl_mem), &b->cl_dqcoeff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 1, sizeof(int), &b->dqcoeff_offset);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 2, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 3, sizeof(int), &b->diff_offset);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_1st_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
//Second pass
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 0, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 1, sizeof(int), &b->diff_offset);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_2nd_pass_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_inv_walsh4x4_c(b->dqcoeff_base+b->dqcoeff_offset, &b->diff_base[b->diff_offset]),
);
return;
}
void vp8_short_inv_walsh4x4_1_cl(BLOCKD *b)
{
int err;
size_t global = 4;
if (cl_initialized != CL_SUCCESS){
vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
&b->diff_base[b->diff_offset]);
return;
}
//Set arguments and run kernel
err = 0;
err = clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 0, sizeof (cl_mem), &b->cl_dqcoeff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 1, sizeof (int), &b->dqcoeff_offset);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 2, sizeof (cl_mem), &b->cl_diff_mem);
err |= clSetKernelArg(cl_data.vp8_short_inv_walsh4x4_1_kernel, 3, sizeof (int), &b->diff_offset);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
&b->diff_base[b->diff_offset]),
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(b->cl_commands, cl_data.vp8_short_inv_walsh4x4_1_kernel, 1, NULL, &global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( b->cl_commands, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);
vp8_short_inv_walsh4x4_1_c(b->dqcoeff_base + b->dqcoeff_offset,
&b->diff_base[b->diff_offset]),
);
return;
}
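The host wrappers above all share one shape: try the OpenCL path, and if setup or the kernel launch fails, run the C implementation on the same data. A minimal sketch of that dispatch in plain C follows. The names `accel_walsh_fn`, `accel_always_fails` and `inv_walsh4x4_1_dispatch` are hypothetical stand-ins and are not part of this patch; the reference routine assumes the DC-only behaviour visible in `vp8_short_inv_walsh4x4_1_kernel`, where every output is `(input[0] + 3) >> 3`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accelerated entry point: returns 0 on success, nonzero on failure. */
typedef int (*accel_walsh_fn)(const short *in, short *out);

/* Reference DC-only inverse Walsh: all 16 outputs equal (in[0] + 3) >> 3. */
static void inv_walsh4x4_1_ref(const short *in, short *out)
{
    int a1 = (in[0] + 3) >> 3;
    int i;
    for (i = 0; i < 16; i++)
        out[i] = (short)a1;
}

/* Stand-in for a device path that is unavailable or fails at launch. */
static int accel_always_fails(const short *in, short *out)
{
    (void)in;
    (void)out;
    return -1;
}

/* Returns 1 if the accelerated path produced the output, 0 if the C fallback ran. */
static int inv_walsh4x4_1_dispatch(accel_walsh_fn accel, const short *in, short *out)
{
    assert(in != NULL && out != NULL);
    if (accel != NULL && accel(in, out) == 0)
        return 1;
    inv_walsh4x4_1_ref(in, out);
    return 0;
}
```

Calling `inv_walsh4x4_1_dispatch(accel_always_fails, in, out)` fills `out` through the reference routine, so callers get a valid block whether or not the device path works.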

@@ -1,309 +0,0 @@
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
__constant int cospi8sqrt2minus1 = 20091;
__constant int sinpi8sqrt2 = 35468;
__constant int rounding = 0;
kernel void vp8_short_idct4x4llm_1st_pass_kernel(global short*,global short *,int);
kernel void vp8_short_idct4x4llm_2nd_pass_kernel(global short*,int);
__kernel void vp8_short_idct4x4llm_kernel(
__global short *input,
__global short *output,
int pitch
){
vp8_short_idct4x4llm_1st_pass_kernel(input,output,pitch);
vp8_short_idct4x4llm_2nd_pass_kernel(output,pitch);
}
__kernel void vp8_short_idct4x4llm_1st_pass_kernel(
__global short *ip,
__global short *op,
int pitch
)
{
int i;
int a1, b1, c1, d1;
int temp1, temp2;
int shortpitch = pitch >> 1;
for (i = 0; i < 4; i++)
{
a1 = ip[0] + ip[8];
b1 = ip[0] - ip[8];
temp1 = (ip[4] * sinpi8sqrt2 + rounding) >> 16;
temp2 = ip[12] + ((ip[12] * cospi8sqrt2minus1 + rounding) >> 16);
c1 = temp1 - temp2;
temp1 = ip[4] + ((ip[4] * cospi8sqrt2minus1 + rounding) >> 16);
temp2 = (ip[12] * sinpi8sqrt2 + rounding) >> 16;
d1 = temp1 + temp2;
op[shortpitch*0] = a1 + d1;
op[shortpitch*3] = a1 - d1;
op[shortpitch*1] = b1 + c1;
op[shortpitch*2] = b1 - c1;
ip++;
op++;
}
return;
}
__kernel void vp8_short_idct4x4llm_2nd_pass_kernel(
__global short *output,
int pitch
)
{
int i;
int a1, b1, c1, d1;
int temp1, temp2;
int shortpitch = pitch >> 1;
__global short *ip = output;
__global short *op = output;
for (i = 0; i < 4; i++)
{
a1 = ip[0] + ip[2];
b1 = ip[0] - ip[2];
temp1 = (ip[1] * sinpi8sqrt2 + rounding) >> 16;
temp2 = ip[3] + ((ip[3] * cospi8sqrt2minus1 + rounding) >> 16);
c1 = temp1 - temp2;
temp1 = ip[1] + ((ip[1] * cospi8sqrt2minus1 + rounding) >> 16);
temp2 = (ip[3] * sinpi8sqrt2 + rounding) >> 16;
d1 = temp1 + temp2;
op[0] = (a1 + d1 + 4) >> 3;
op[3] = (a1 - d1 + 4) >> 3;
op[1] = (b1 + c1 + 4) >> 3;
op[2] = (b1 - c1 + 4) >> 3;
ip += shortpitch;
op += shortpitch;
}
return;
}
__kernel void vp8_short_idct4x4llm_1_kernel(
__global short *input,
__global short *output,
int pitch
)
{
int a1;
int out_offset;
int shortpitch = pitch >> 1;
//short4 a;
a1 = ((input[0] + 4) >> 3);
//a = a1;
int tid = get_global_id(0);
if (tid < 4){
out_offset = shortpitch * tid;
//vstore4(a,0,&output[out_offset];
output[out_offset] = a1;
output[out_offset+1] = a1;
output[out_offset+2] = a1;
output[out_offset+3] = a1;
}
}
__kernel void vp8_dc_only_idct_add_kernel(
__global unsigned char *pred_base,
int pred_offset,
__global unsigned char *dst_base,
int dst_offset,
int pitch,
int stride,
int use_diff,
global short *diff_base,
int diff_offset,
global short *qcoeff_base,
int qcoeff_offset,
global short *dequant
)
{
int r, c;
//int pred_offset;
global unsigned char *pred_ptr = &pred_base[pred_offset];
global unsigned char *dst_ptr = &dst_base[dst_offset];
int tid = get_global_id(0);
int a1;
if (tid < 16){
if (use_diff == 1){
a1 = diff_base[diff_offset];
} else {
a1 = qcoeff_base[qcoeff_offset] * dequant[0];
}
a1 = (a1 + 4)>>3;
r = tid / 4;
c = tid % 4;
pred_offset = r * pitch;
dst_offset += r * stride;
int a = a1 + pred_ptr[pred_offset + c] ;
if (a < 0)
a = 0;
else if (a > 255)
a = 255;
dst_base[dst_offset + c] = (unsigned char) a ;
}
}
__kernel void vp8_short_inv_walsh4x4_1st_pass_kernel(
__global short *src_base,
int src_offset,
__global short *output_base,
int out_offset
)
{
__global short *input = src_base + src_offset;
__global short *output = output_base + out_offset;
int tid = get_global_id(0);
#define VEC_WALSH 0
#if VEC_WALSH
//4-short vectors to calculate things in
short4 a,b,c,d, a2v, b2v, c2v, d2v, a1t, b1t, c1t, d1t;
short16 out;
if (tid == 0){
//first pass loop in vector form
a = vload4(0,input) + vload4(3,input);
b = vload4(1,input) + vload4(2,input);
c = vload4(1,input) - vload4(2,input);
d = vload4(0,input) - vload4(3,input);
vstore4(a + b, 0, output);
vstore4(c + d, 1, output);
vstore4(a - b, 2, output);
vstore4(d - c, 3, output);
return;
//2nd pass
a = (short4)(output[0], output[4], output[8], output[12]);
b = (short4)(output[1], output[5], output[9], output[13]);
c = (short4)(output[1], output[5], output[9], output[13]);
d = (short4)(output[0], output[4], output[8], output[12]);
a1t = (short4)(output[3], output[7], output[11], output[15]);
b1t = (short4)(output[2], output[6], output[10], output[14]);
c1t = (short4)(output[2], output[6], output[10], output[14]);
d1t = (short4)(output[3], output[7], output[11], output[15]);
a = a + a1t + (short)3;
b = b + b1t;
c = c - c1t;
d = d - d1t + (short)3;
a2v = (a + b) >> (short)3;
b2v = (c + d) >> (short)3;
c2v = (a - b) >> (short)3;
d2v = (d - c) >> (short)3;
out.s048c = a2v;
out.s159d = b2v;
out.s26ae = c2v;
out.s37bf = d2v;
vstore16(out,0,output);
}
#else
int i;
int a1, b1, c1, d1;
int a2, b2, c2, d2;
global short *ip = input;
global short *op = output;
int offset;
if (tid < 4){
offset = tid;
a1 = ip[offset] + ip[offset + 12];
b1 = ip[offset + 4] + ip[offset + 8];
c1 = ip[offset + 4] - ip[offset + 8];
d1 = ip[offset] - ip[offset + 12];
op[offset] = a1 + b1;
op[offset + 4] = c1 + d1;
op[offset + 8] = a1 - b1;
op[offset + 12] = d1 - c1;
}
#endif
}
__kernel void vp8_short_inv_walsh4x4_2nd_pass_kernel(
__global short *output_base,
int out_offset
)
{
int i;
int a1, b1, c1, d1;
int a2, b2, c2, d2;
__global short *output = output_base + out_offset;
int tid = get_global_id(0);
int offset = 0;
if (tid < 4){
offset = 4*tid;
a1 = output[offset] + output[offset + 3];
b1 = output[offset + 1] + output[offset + 2];
c1 = output[offset + 1] - output[offset + 2];
d1 = output[offset + 0] - output[offset + 3];
a2 = a1 + b1;
b2 = c1 + d1;
c2 = a1 - b1;
d2 = d1 - c1;
output[offset + 0] = (a2 + 3) >> 3;
output[offset + 1] = (b2 + 3) >> 3;
output[offset + 2] = (c2 + 3) >> 3;
output[offset + 3] = (d2 + 3) >> 3;
}
}
__kernel void vp8_short_inv_walsh4x4_1_kernel(
__global short *src_data,
int src_offset,
__global short *dst_data,
int dst_offset
){
int a1;
int tid = get_global_id(0);
//short16 a;
int i;
short4 a;
__global short *input = src_data + src_offset;
__global short *output = dst_data + dst_offset;
if (tid < 4)
{
a1 = ((input[0] + 3) >> 3);
a = (short)a1; //Set all elements of vector to a1
vstore4(a, tid, output);
}
}
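The scalar branches of the two inverse Walsh kernels can be checked on the CPU with a direct C transcription. The sketch below is an illustration, not code from this patch: the function names are made up, each work-item (`tid < 4`) becomes one loop iteration, and the right shifts assume an arithmetic shift for negative values, as the kernels do.

```c
#include <assert.h>
#include <stddef.h>

/* First pass: one column per iteration, mirroring the tid < 4 branch. */
static void inv_walsh4x4_first_pass(const short *ip, short *op)
{
    int col;
    for (col = 0; col < 4; col++) {
        int a1 = ip[col] + ip[col + 12];
        int b1 = ip[col + 4] + ip[col + 8];
        int c1 = ip[col + 4] - ip[col + 8];
        int d1 = ip[col] - ip[col + 12];
        op[col]      = (short)(a1 + b1);
        op[col + 4]  = (short)(c1 + d1);
        op[col + 8]  = (short)(a1 - b1);
        op[col + 12] = (short)(d1 - c1);
    }
}

/* Second pass: one row per iteration, in place, with +3 rounding and >> 3. */
static void inv_walsh4x4_second_pass(short *io)
{
    int row;
    for (row = 0; row < 4; row++) {
        short *p = io + 4 * row;
        int a1 = p[0] + p[3];
        int b1 = p[1] + p[2];
        int c1 = p[1] - p[2];
        int d1 = p[0] - p[3];
        p[0] = (short)((a1 + b1 + 3) >> 3);
        p[1] = (short)((c1 + d1 + 3) >> 3);
        p[2] = (short)((a1 - b1 + 3) >> 3);
        p[3] = (short)((d1 - c1 + 3) >> 3);
    }
}

/* Full transform: first pass from in to out, second pass in place on out. */
static void inv_walsh4x4(const short *in, short *out)
{
    assert(in != NULL && out != NULL);
    inv_walsh4x4_first_pass(in, out);
    inv_walsh4x4_second_pass(out);
}
```

For a block whose only nonzero coefficient is the DC term, this produces the same value in all 16 positions as the DC-only kernel, `(input[0] + 3) >> 3`, which makes it a convenient cross-check between the two code paths.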

@@ -1,26 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_config.h"
#include "vp8_opencl.h"
#include "vp8/common/blockd.h"
#define CLAMP(x,min,max) if (x < min) x = min; else if ( x > max ) x = max;
//External functions that are fallbacks if CL is unavailable
extern void vp8_short_idct4x4llm_c(short *input, short *output, int pitch);
extern void vp8_short_idct4x4llm_1_c(short *input, short *output, int pitch);
extern void vp8_dc_only_idct_add_c(short input_dc, unsigned char *pred_ptr, unsigned char *dst_ptr, int pitch, int stride);
extern void vp8_short_inv_walsh4x4_c(short *input, short *output);
extern void vp8_short_inv_walsh4x4_1_c(short *input, short *output);
const char *idctCompileOptions = "-Ivp8/common/opencl";
const char *idctllm_cl_file_name = "vp8/common/opencl/idctllm_cl.cl";

@@ -1,427 +0,0 @@
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable
#pragma OPENCL EXTENSION cl_amd_printf : enable
typedef unsigned char uc;
typedef signed char sc;
__inline signed char vp8_filter_mask(sc, sc, uc, uc, uc, uc, uc, uc, uc, uc);
__inline signed char vp8_simple_filter_mask(signed char, signed char, uc, uc, uc, uc);
__inline signed char vp8_hevmask(signed char, uc, uc, uc, uc);
__inline signed char vp8_signed_char_clamp(int);
__inline void vp8_mbfilter(signed char mask,signed char hev,global uc *op2,
global uc *op1,global uc *op0,global uc *oq0,global uc *oq1,global uc *oq2);
void vp8_simple_filter(signed char mask,global uc *base, int op1_off,int op0_off,int oq0_off,int oq1_off);
typedef struct
{
signed char lim[16];
signed char flim[16];
signed char thr[16];
signed char mbflim[16];
signed char mbthr[16];
signed char uvlim[16];
signed char uvflim[16];
signed char uvthr[16];
signed char uvmbflim[16];
signed char uvmbthr[16];
} loop_filter_info;
void vp8_filter(
signed char mask,
signed char hev,
global uc *base,
int op1_off,
int op0_off,
int oq0_off,
int oq1_off
)
{
global uc *op1 = &base[op1_off];
global uc *op0 = &base[op0_off];
global uc *oq0 = &base[oq0_off];
global uc *oq1 = &base[oq1_off];
signed char ps0, qs0;
signed char ps1, qs1;
signed char vp8_filter, Filter1, Filter2;
signed char u;
ps1 = (signed char) * op1 ^ 0x80;
ps0 = (signed char) * op0 ^ 0x80;
qs0 = (signed char) * oq0 ^ 0x80;
qs1 = (signed char) * oq1 ^ 0x80;
/* add outer taps if we have high edge variance */
vp8_filter = vp8_signed_char_clamp(ps1 - qs1);
vp8_filter &= hev;
/* inner taps */
vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (qs0 - ps0));
vp8_filter &= mask;
/* save bottom 3 bits so that we round one side +4 and the other +3
* if it equals 4 we'll set to adjust by -1 to account for the fact
* we'd round 3 the other way
*/
Filter1 = vp8_signed_char_clamp(vp8_filter + 4);
Filter2 = vp8_signed_char_clamp(vp8_filter + 3);
Filter1 >>= 3;
Filter2 >>= 3;
u = vp8_signed_char_clamp(qs0 - Filter1);
*oq0 = u ^ 0x80;
u = vp8_signed_char_clamp(ps0 + Filter2);
*op0 = u ^ 0x80;
vp8_filter = Filter1;
/* outer tap adjustments */
vp8_filter += 1;
vp8_filter >>= 1;
vp8_filter &= ~hev;
u = vp8_signed_char_clamp(qs1 - vp8_filter);
*oq1 = u ^ 0x80;
u = vp8_signed_char_clamp(ps1 + vp8_filter);
*op1 = u ^ 0x80;
}
kernel void vp8_loop_filter_horizontal_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p, /* pitch */
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
int hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if (i < get_global_size(0)){
s_off += i;
mask = vp8_filter_mask(limit[i], flimit[i], s_base[s_off - 4*p],
s_base[s_off - 3*p], s_base[s_off - 2*p], s_base[s_off - p],
s_base[s_off], s_base[s_off + p], s_base[s_off + 2*p],
s_base[s_off + 3*p]);
hev = vp8_hevmask(thresh[i], s_base[s_off - 2*p], s_base[s_off - p],
s_base[s_off], s_base[s_off+p]);
vp8_filter(mask, hev, s_base, s_off - 2 * p, s_off - p, s_off,
s_off + p);
}
}
kernel void vp8_loop_filter_vertical_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
int hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if ( i < get_global_size(0) ){
s_off += p * i;
mask = vp8_filter_mask(limit[i], flimit[i],
s_base[s_off-4], s_base[s_off-3], s_base[s_off-2],
s_base[s_off-1], s_base[s_off], s_base[s_off+1],
s_base[s_off+2], s_base[s_off+3]);
hev = vp8_hevmask(thresh[i], s_base[s_off-2], s_base[s_off-1],
s_base[s_off], s_base[s_off+1]);
vp8_filter(mask, hev, s_base, s_off - 2, s_off - 1, s_off, s_off + 1);
}
}
kernel void vp8_mbloop_filter_horizontal_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
global uc *s = s_base+s_off;
signed char hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if (i < get_global_size(0)){
s += i;
mask = vp8_filter_mask(limit[i], flimit[i],
s[-4*p], s[-3*p], s[-2*p], s[-1*p],
s[0*p], s[1*p], s[2*p], s[3*p]);
hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_mbfilter(mask, hev, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p, s + 2 * p);
}
}
kernel void vp8_mbloop_filter_vertical_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
global uc *s = s_base + s_off;
signed char hev = 0; /* high edge variance */
signed char mask = 0;
int i = get_global_id(0);
if (i < get_global_size(0)){
s += p * i;
mask = vp8_filter_mask(limit[i], flimit[i],
s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
vp8_mbfilter(mask, hev, s - 3, s - 2, s - 1, s, s + 1, s + 2);
}
}
kernel void vp8_loop_filter_simple_horizontal_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global const signed char *flimit,
global const signed char *limit,
global const signed char *thresh,
int off_stride
)
{
signed char mask = 0;
int i = get_global_id(0);
(void) thresh;
if (i < get_global_size(0))
{
s_off += i;
mask = vp8_simple_filter_mask(limit[i], flimit[i], s_base[s_off-2*p], s_base[s_off-p], s_base[s_off], s_base[s_off+p]);
vp8_simple_filter(mask, s_base, s_off - 2 * p, s_off - 1 * p, s_off, s_off + 1 * p);
}
}
kernel void vp8_loop_filter_simple_vertical_edge_kernel
(
global unsigned char *s_base,
int s_off,
int p,
global signed char *flimit,
global signed char *limit,
global signed char *thresh,
int off_stride
)
{
signed char mask = 0;
int i = get_global_id(0);
(void) thresh;
if (i < get_global_size(0)){
s_off += p * i;
mask = vp8_simple_filter_mask(limit[i], flimit[i], s_base[s_off-2], s_base[s_off-1], s_base[s_off], s_base[s_off+1]);
vp8_simple_filter(mask, s_base, s_off - 2, s_off - 1, s_off, s_off + 1);
}
}
//Inline and non-kernel functions follow.
__inline void vp8_mbfilter(
signed char mask,
signed char hev,
global uc *op2,
global uc *op1,
global uc *op0,
global uc *oq0,
global uc *oq1,
global uc *oq2
)
{
signed char s, u;
signed char vp8_filter, Filter1, Filter2;
signed char ps2 = (signed char) * op2 ^ 0x80;
signed char ps1 = (signed char) * op1 ^ 0x80;
signed char ps0 = (signed char) * op0 ^ 0x80;
signed char qs0 = (signed char) * oq0 ^ 0x80;
signed char qs1 = (signed char) * oq1 ^ 0x80;
signed char qs2 = (signed char) * oq2 ^ 0x80;
/* add outer taps if we have high edge variance */
vp8_filter = vp8_signed_char_clamp(ps1 - qs1);
vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (qs0 - ps0));
vp8_filter &= mask;
Filter2 = vp8_filter;
Filter2 &= hev;
/* save bottom 3 bits so that we round one side +4 and the other +3 */
Filter1 = vp8_signed_char_clamp(Filter2 + 4);
Filter2 = vp8_signed_char_clamp(Filter2 + 3);
Filter1 >>= 3;
Filter2 >>= 3;
qs0 = vp8_signed_char_clamp(qs0 - Filter1);
ps0 = vp8_signed_char_clamp(ps0 + Filter2);
/* only apply wider filter if not high edge variance */
vp8_filter &= ~hev;
Filter2 = vp8_filter;
/* roughly 3/7th difference across boundary */
u = vp8_signed_char_clamp((63 + Filter2 * 27) >> 7);
s = vp8_signed_char_clamp(qs0 - u);
*oq0 = s ^ 0x80;
s = vp8_signed_char_clamp(ps0 + u);
*op0 = s ^ 0x80;
/* roughly 2/7th difference across boundary */
u = vp8_signed_char_clamp((63 + Filter2 * 18) >> 7);
s = vp8_signed_char_clamp(qs1 - u);
*oq1 = s ^ 0x80;
s = vp8_signed_char_clamp(ps1 + u);
*op1 = s ^ 0x80;
/* roughly 1/7th difference across boundary */
u = vp8_signed_char_clamp((63 + Filter2 * 9) >> 7);
s = vp8_signed_char_clamp(qs2 - u);
*oq2 = s ^ 0x80;
s = vp8_signed_char_clamp(ps2 + u);
*op2 = s ^ 0x80;
}
__inline signed char vp8_signed_char_clamp(int t)
{
t = (t < -128 ? -128 : t);
t = (t > 127 ? 127 : t);
return (signed char) t;
}
/* is there a high-variance internal edge ( 11111111 yes, 00000000 no) */
__inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0, uc q1)
{
signed char hev = 0;
hev |= (abs(p1 - p0) > thresh) * -1;
hev |= (abs(q1 - q0) > thresh) * -1;
return hev;
}
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
__inline signed char vp8_filter_mask(
signed char limit,
signed char flimit,
uc p3, uc p2, uc p1, uc p0, uc q0, uc q1, uc q2, uc q3)
{
signed char mask = 0;
mask |= (abs(p3 - p2) > limit) * -1;
mask |= (abs(p2 - p1) > limit) * -1;
mask |= (abs(p1 - p0) > limit) * -1;
mask |= (abs(q1 - q0) > limit) * -1;
mask |= (abs(q2 - q1) > limit) * -1;
mask |= (abs(q3 - q2) > limit) * -1;
mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > flimit * 2 + limit) * -1;
mask = ~mask;
return mask;
}
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
__inline signed char vp8_simple_filter_mask(
signed char limit,
signed char flimit,
uc p1,
uc p0,
uc q0,
uc q1
)
{
signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= flimit * 2 + limit) * -1;
return mask;
}
void vp8_simple_filter(
signed char mask,
global uc *base,
int op1_off,
int op0_off,
int oq0_off,
int oq1_off
)
{
global uc *op1 = base + op1_off;
global uc *op0 = base + op0_off;
global uc *oq0 = base + oq0_off;
global uc *oq1 = base + oq1_off;
signed char vp8_filter, Filter1, Filter2;
signed char p1 = (signed char) * op1 ^ 0x80;
signed char p0 = (signed char) * op0 ^ 0x80;
signed char q0 = (signed char) * oq0 ^ 0x80;
signed char q1 = (signed char) * oq1 ^ 0x80;
signed char u;
vp8_filter = vp8_signed_char_clamp(p1 - q1);
vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * (q0 - p0));
vp8_filter &= mask;
/* save bottom 3 bits so that we round one side +4 and the other +3 */
Filter1 = vp8_signed_char_clamp(vp8_filter + 4);
Filter1 >>= 3;
u = vp8_signed_char_clamp(q0 - Filter1);
*oq0 = u ^ 0x80;
Filter2 = vp8_signed_char_clamp(vp8_filter + 3);
Filter2 >>= 3;
u = vp8_signed_char_clamp(p0 + Filter2);
*op0 = u ^ 0x80;
}
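The simple loop filter above is easier to follow with concrete numbers. The sketch below restates `vp8_simple_filter_mask` and `vp8_simple_filter` for a single position across an edge in plain C. It is an illustration under stated assumptions: the names are hypothetical, intermediate values are kept in `int` rather than `signed char`, `v - 128` replaces `(signed char)v ^ 0x80` (the two agree on two's complement targets), and negative values are assumed to shift arithmetically.

```c
#include <assert.h>
#include <stdlib.h>

/* Saturate to the signed 8-bit range, like vp8_signed_char_clamp. */
static int clamp_s8(int t)
{
    if (t < -128) return -128;
    if (t > 127) return 127;
    return t;
}

/* Returns -1 (all bits set) when the edge should be filtered, else 0. */
static int simple_filter_mask(int limit, int flimit, int p1, int p0, int q0, int q1)
{
    return (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= flimit * 2 + limit) ? -1 : 0;
}

/* Filter one position across the edge; px holds {p1, p0, q0, q1}. */
static void simple_filter(int mask, unsigned char px[4])
{
    int p1, p0, q0, q1, f, f1, f2;
    assert(px != NULL);
    /* Shift pixels into the signed range [-128, 127]. */
    p1 = px[0] - 128;
    p0 = px[1] - 128;
    q0 = px[2] - 128;
    q1 = px[3] - 128;
    f = clamp_s8(p1 - q1);
    f = clamp_s8(f + 3 * (q0 - p0));
    f &= mask;                      /* mask is 0 or -1, so this keeps or zeroes f */
    f1 = clamp_s8(f + 4) >> 3;      /* rounded adjustment for the q side */
    f2 = clamp_s8(f + 3) >> 3;      /* rounded adjustment for the p side */
    px[2] = (unsigned char)(clamp_s8(q0 - f1) + 128);
    px[1] = (unsigned char)(clamp_s8(p0 + f2) + 128);
}
```

With pixels `{100, 100, 110, 110}`, `limit = 10` and `flimit = 20`, the mask is set and the two inner pixels move toward each other (to 102 and 107). With `{100, 100, 200, 200}` the step exceeds the threshold, the mask is zero, and the pixels are left untouched, which is how the filter preserves real edges.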

@@ -1,457 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "../../../vpx_ports/config.h"
#include "loopfilter_cl.h"
#include "../onyxc_int.h"
#include "vpx_config.h"
#include "vp8_opencl.h"
#include "blockd_cl.h"
const char *loopFilterCompileOptions = "-Ivp8/common/opencl";
const char *loop_filter_cl_file_name = "vp8/common/opencl/loopfilter.cl";
typedef unsigned char uc;
extern void vp8_loop_filter_frame
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
);
prototype_loopfilter_cl(vp8_loop_filter_horizontal_edge_cl);
prototype_loopfilter_cl(vp8_loop_filter_vertical_edge_cl);
prototype_loopfilter_cl(vp8_mbloop_filter_horizontal_edge_cl);
prototype_loopfilter_cl(vp8_mbloop_filter_vertical_edge_cl);
prototype_loopfilter_cl(vp8_loop_filter_simple_horizontal_edge_cl);
prototype_loopfilter_cl(vp8_loop_filter_simple_vertical_edge_cl);
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_cl(
MACROBLOCKD *x,
cl_mem buf_base,
int y_off,
int u_off,
int v_off,
int y_stride,
int uv_stride,
loop_filter_info *lfi,
int simpler_lpf
)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, u_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
vp8_mbloop_filter_horizontal_edge_cl(x, buf_base, v_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_mbhs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
vp8_mbloop_filter_vertical_edge_cl(x, buf_base, u_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
vp8_mbloop_filter_vertical_edge_cl(x, buf_base, v_off, uv_stride, lfi->mbflim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_mbvs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off, y_stride, lfi->mbflim, lfi->lim, lfi->thr, 2, 1);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, y_off + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, u_off + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
vp8_loop_filter_horizontal_edge_cl(x, buf_base, v_off + 4 * uv_stride, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_bhs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_horizontal_edge_cl(x, buf_base, y_off + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, y_off + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, u_off + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
vp8_loop_filter_vertical_edge_cl(x, buf_base, v_off + 4, uv_stride, lfi->flim, lfi->lim, lfi->thr, 1, 1);
}
void vp8_loop_filter_bvs_cl(MACROBLOCKD *x, cl_mem buf_base, int y_off, int u_off, int v_off,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
vp8_loop_filter_simple_vertical_edge_cl(x, buf_base, y_off + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2, 1);
}
void vp8_init_loop_filter_cl(VP8_COMMON *cm)
{
loop_filter_info *lfi = cm->lf_info;
int sharpness_lvl = cm->sharpness_level;
int frame_type = cm->frame_type;
int i, j;
int block_inside_limit = 0;
int HEVThresh;
const int yhedge_boost = 2;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
/* Set loop filter parameters that control sharpness. */
block_inside_limit = filt_lvl >> (sharpness_lvl > 0);
block_inside_limit = block_inside_limit >> (sharpness_lvl > 4);
if (sharpness_lvl > 0)
{
if (block_inside_limit > (9 - sharpness_lvl))
block_inside_limit = (9 - sharpness_lvl);
}
if (block_inside_limit < 1)
block_inside_limit = 1;
for (j = 0; j < 16; j++)
{
lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl + yhedge_boost;
lfi[i].flim[j] = filt_lvl;
lfi[i].thr[j] = HEVThresh;
}
}
}
/* Call vp8_init_loop_filter() from vp8dx_create_decompressor(). Call vp8_frame_init_loop_filter() only while
* decoding each frame, and check last_frame_type so that the call is skipped most of the time.
*/
void vp8_frame_init_loop_filter_cl(loop_filter_info *lfi, int frame_type)
{
int HEVThresh;
int i, j;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
for (j = 0; j < 16; j++)
{
lfi[i].thr[j] = HEVThresh;
}
}
}
//This might not need to be copied from loopfilter.c
void vp8_adjust_mb_lf_value_cl(MACROBLOCKD *mbd, int *filter_level)
{
MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;
if (mbd->mode_ref_lf_delta_enabled)
{
/* Apply delta for reference frame */
*filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
/* Apply delta for mode */
if (mbmi->ref_frame == INTRA_FRAME)
{
/* Only the split mode BPRED has a further special case */
if (mbmi->mode == B_PRED)
*filter_level += mbd->mode_lf_deltas[0];
}
else
{
/* Zero motion mode */
if (mbmi->mode == ZEROMV)
*filter_level += mbd->mode_lf_deltas[1];
/* Split MB motion mode */
else if (mbmi->mode == SPLITMV)
*filter_level += mbd->mode_lf_deltas[3];
/* All other inter motion modes (Nearest, Near, New) */
else
*filter_level += mbd->mode_lf_deltas[2];
}
/* Range check */
if (*filter_level > MAX_LOOP_FILTER)
*filter_level = MAX_LOOP_FILTER;
else if (*filter_level < 0)
*filter_level = 0;
}
}
//Start of externally callable functions.
int cl_init_loop_filter() {
int err;
// Create the filter compute program from the file-defined source code
if ( cl_load_program(&cl_data.loop_filter_program, loop_filter_cl_file_name,
loopFilterCompileOptions) != CL_SUCCESS )
return VP8_CL_TRIED_BUT_FAILED;
// Create the compute kernels in the program we wish to run
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_horizontal_edge_kernel,"vp8_loop_filter_horizontal_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_vertical_edge_kernel,"vp8_loop_filter_vertical_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_mbloop_filter_horizontal_edge_kernel,"vp8_mbloop_filter_horizontal_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_mbloop_filter_vertical_edge_kernel,"vp8_mbloop_filter_vertical_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_simple_horizontal_edge_kernel,"vp8_loop_filter_simple_horizontal_edge_kernel");
VP8_CL_CREATE_KERNEL(cl_data,loop_filter_program,vp8_loop_filter_simple_vertical_edge_kernel,"vp8_loop_filter_simple_vertical_edge_kernel");
return CL_SUCCESS;
}
void cl_destroy_loop_filter(){
if (cl_data.loop_filter_program)
clReleaseProgram(cl_data.loop_filter_program);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_horizontal_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_vertical_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_mbloop_filter_horizontal_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_mbloop_filter_vertical_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_simple_horizontal_edge_kernel);
VP8_CL_RELEASE_KERNEL(cl_data.vp8_loop_filter_simple_vertical_edge_kernel);
cl_data.loop_filter_program = NULL;
}
void vp8_loop_filter_set_baselines_cl(MACROBLOCKD *mbd, int default_filt_lvl, int *baseline_filter_level){
int alt_flt_enabled = mbd->segmentation_enabled;
int i;
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
}
void vp8_loop_filter_frame_cl
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
loop_filter_info *lfi = cm->lf_info;
FRAME_TYPE frame_type = cm->frame_type;
LOOPFILTERTYPE filter_type = cm->filter_type;
int mb_row;
int mb_col;
int baseline_filter_level[MAX_MB_SEGMENTS];
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
int err;
unsigned char *buf_base;
int y_off, u_off, v_off;
//unsigned char *y_ptr, *u_ptr, *v_ptr;
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
vp8_loop_filter_set_baselines_cl(mbd, default_filt_lvl, baseline_filter_level);
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter_cl(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter_cl(lfi, frame_type);
/* Set up the buffer pointers */
buf_base = post->buffer_alloc;
y_off = post->y_buffer - buf_base;
u_off = post->u_buffer - buf_base;
v_off = post->v_buffer - buf_base;
VP8_CL_SET_BUF(mbd->cl_commands, post->buffer_mem, post->buffer_size, post->buffer_alloc,
vp8_loop_filter_frame(cm,mbd,default_filt_lvl),);
/* vp8_filter each macro block */
for (mb_row = 0; mb_row < cm->mb_rows; mb_row++)
{
for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
filter_level = baseline_filter_level[Segment];
/* Distance of Mb to the various image edges.
* These are specified to 8th pel as they are always compared to values
* that are in 1/8th pel units. Apply any context driven MB level
* adjustment
*/
filter_level = vp8_adjust_mb_lf_value(mbd, filter_level);
if (filter_level)
{
if (mb_col > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_mbv_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_mbvs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
if (mbd->mode_info_context->mbmi.dc_diff > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_bv_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_bvs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
/* don't apply across umv border */
if (mb_row > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_mbh_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_mbhs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
if (mbd->mode_info_context->mbmi.dc_diff > 0){
if (filter_type == NORMAL_LOOPFILTER)
vp8_loop_filter_bh_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
else
vp8_loop_filter_bhs_cl(mbd, post->buffer_mem, y_off, u_off, v_off, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
}
}
y_off += 16;
u_off += 8;
v_off += 8;
mbd->mode_info_context++; /* step to next MB */
}
y_off += post->y_stride * 16 - post->y_width;
u_off += post->uv_stride * 8 - post->uv_width;
v_off += post->uv_stride * 8 - post->uv_width;
mbd->mode_info_context++; /* Skip border mb */
}
//Retrieve buffer contents
err = clEnqueueReadBuffer(mbd->cl_commands, post->buffer_mem, CL_FALSE, 0, post->buffer_size, post->buffer_alloc, 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS(mbd->cl_commands, err != CL_SUCCESS,
"Error: Failed to read loop filter output!\n",
,
);
VP8_CL_FINISH(mbd->cl_commands);
}
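A minimal sketch of the offset stepping used at the end of the loop above: each macroblock advances the luma offset by 16 bytes, and each finished macroblock row adds `y_stride * 16 - y_width`. The helper name `demo_mb_luma_offset` and its parameters are hypothetical and not part of libvpx, and the sketch assumes the plane width is a multiple of 16.

```c
#include <assert.h>

/* Hypothetical helper (not in libvpx): replays the per-macroblock and
 * per-row stepping from the loop above and returns the luma byte offset
 * of macroblock (mb_row, mb_col). Assumes y_width is a multiple of 16. */
int demo_mb_luma_offset(int y_start, int y_stride, int y_width,
                        int mb_row, int mb_col)
{
    int mb_cols = y_width / 16;
    int y_off = y_start;
    int row, col;

    assert(y_width % 16 == 0);
    assert(mb_row >= 0 && mb_col >= 0 && mb_col < mb_cols);

    for (row = 0; row < mb_row; row++) {
        for (col = 0; col < mb_cols; col++)
            y_off += 16;                  /* step to the next macroblock */
        y_off += y_stride * 16 - y_width; /* back to column 0, 16 rows down */
    }
    return y_off + mb_col * 16;
}
```

The net effect of one full row is `y_stride * 16` bytes, so with a stride of 64 and a width of 32, macroblock (1, 1) starts at byte 64 * 16 + 16 = 1040.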


@@ -1,48 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef loopfilter_cl_h
#define loopfilter_cl_h
#include "../../../vpx_ports/mem.h"
#include "../onyxc_int.h"
#include "blockd_cl.h"
#include "../loopfilter.h"
#define prototype_loopfilter_cl(sym) \
void sym(MACROBLOCKD*, cl_mem src_base, int src_offset, \
int pitch, const signed char *flimit, \
const signed char *limit, const signed char *thresh, int count, int block_cnt)
#define prototype_loopfilter_block_cl(sym) \
void sym(MACROBLOCKD*, unsigned char *y, unsigned char *u, unsigned char *v,\
int ystride, int uv_stride, loop_filter_info *lfi, int simpler)
extern void vp8_loop_filter_frame_cl
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
);
extern prototype_loopfilter_block_cl(vp8_lf_normal_mb_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_normal_b_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_normal_mb_h_cl);
extern prototype_loopfilter_block_cl(vp8_lf_normal_b_h_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_mb_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_b_v_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_mb_h_cl);
extern prototype_loopfilter_block_cl(vp8_lf_simple_b_h_cl);
typedef prototype_loopfilter_block_cl((*vp8_lf_block_cl_fn_t));
#endif
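The header above uses one macro to stamp out both the function prototypes and the matching function-pointer typedef. A minimal sketch of that pattern follows; every `demo_` name and the simplified argument list are hypothetical stand-ins, not the real libvpx types.

```c
#include <assert.h>

/* Hypothetical stand-in for loop_filter_info; illustrative only. */
typedef struct { int level; } demo_filter_info;

/* One macro describes the signature once... */
#define prototype_demo_filter(sym) \
    int sym(unsigned char *y, int ystride, demo_filter_info *lfi)

/* ...and is reused for prototypes and for the pointer typedef. */
prototype_demo_filter(demo_filter_normal);
prototype_demo_filter(demo_filter_simple);
typedef prototype_demo_filter((*demo_filter_fn_t));

int demo_filter_normal(unsigned char *y, int ystride, demo_filter_info *lfi)
{
    assert(y != 0 && lfi != 0);
    y[0] = (unsigned char)(y[0] + lfi->level); /* toy "filter": add level */
    return ystride;
}

int demo_filter_simple(unsigned char *y, int ystride, demo_filter_info *lfi)
{
    assert(y != 0 && lfi != 0);
    y[0] = (unsigned char)lfi->level;          /* toy "filter": overwrite */
    return ystride;
}

/* Picks an implementation at run time through the shared pointer type. */
demo_filter_fn_t demo_select_filter(int use_simple)
{
    return use_simple ? demo_filter_simple : demo_filter_normal;
}

/* Applies the selected filter to a single sample and returns the result. */
int demo_run(int use_simple, unsigned char start, int level)
{
    demo_filter_info lfi;
    unsigned char sample = start;

    lfi.level = level;
    demo_select_filter(use_simple)(&sample, 1, &lfi);
    return sample;
}
```

Because the typedef is generated from the same macro, a signature change only has to be made in one place.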


@@ -1,187 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdlib.h>
#include <stdio.h>
#include "vpx_ports/config.h"
#include "vp8_opencl.h"
#include "blockd_cl.h"
//#include "loopfilter_cl.h"
//#include "../onyxc_int.h"
typedef unsigned char uc;
static void vp8_loop_filter_cl_run(
cl_command_queue cq,
cl_kernel kernel,
cl_mem buf_mem,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
){
size_t global[] = {count,block_cnt};
int err;
cl_mem flimit_mem;
cl_mem limit_mem;
cl_mem thresh_mem;
VP8_CL_CREATE_BUF(cq, flimit_mem, , sizeof(uc)*16, flimit,, );
VP8_CL_CREATE_BUF(cq, limit_mem, , sizeof(uc)*16, limit,, );
VP8_CL_CREATE_BUF(cq, thresh_mem, , sizeof(uc)*16, thresh,, );
err = 0;
err = clSetKernelArg(kernel, 0, sizeof (cl_mem), &buf_mem);
err |= clSetKernelArg(kernel, 1, sizeof (cl_int), &s_off);
err |= clSetKernelArg(kernel, 2, sizeof (cl_int), &p);
err |= clSetKernelArg(kernel, 3, sizeof (cl_mem), &flimit_mem);
err |= clSetKernelArg(kernel, 4, sizeof (cl_mem), &limit_mem);
err |= clSetKernelArg(kernel, 5, sizeof (cl_mem), &thresh_mem);
err |= clSetKernelArg(kernel, 6, sizeof (cl_int), &block_cnt);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",,
);
/* Execute the kernel */
err = clEnqueueNDRangeKernel(cq, kernel, 2, NULL, global, NULL , 0, NULL, NULL);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
printf("err = %d\n",err);,
);
clReleaseMemObject(flimit_mem);
clReleaseMemObject(limit_mem);
clReleaseMemObject(thresh_mem);
VP8_CL_FINISH(cq);
}
void vp8_loop_filter_horizontal_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p, /* pitch */
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_horizontal_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_loop_filter_vertical_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_vertical_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_mbloop_filter_horizontal_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_mbloop_filter_horizontal_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_mbloop_filter_vertical_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_mbloop_filter_vertical_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_loop_filter_simple_horizontal_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_simple_horizontal_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}
void vp8_loop_filter_simple_vertical_edge_cl
(
MACROBLOCKD *x,
cl_mem s_base,
int s_off,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count,
int block_cnt
)
{
vp8_loop_filter_cl_run(x->cl_commands,
cl_data.vp8_loop_filter_simple_vertical_edge_kernel, s_base, s_off,
p, flimit, limit, thresh, count*8, block_cnt
);
}


@@ -1,41 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_ports/config.h"
#include "../subpixel.h"
#include "subpixel_cl.h"
#include "../onyxc_int.h"
#include "vp8_opencl.h"
#if HAVE_DLOPEN
#include "dynamic_cl.h"
#endif
void vp8_arch_opencl_common_init(VP8_COMMON *ctx)
{
#if HAVE_DLOPEN
#if WIN32 //Windows .dll has no lib prefix and no extension
cl_loaded = load_cl("OpenCL");
#else //But *nix needs full name
cl_loaded = load_cl("libOpenCL.so");
#endif
if (cl_loaded == CL_SUCCESS)
cl_initialized = cl_common_init();
else
cl_initialized = VP8_CL_TRIED_BUT_FAILED;
#else //!HAVE_DLOPEN (e.g. Apple)
cl_initialized = cl_common_init();
#endif
}
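The init routine above runs in two steps: load the OpenCL library, then initialize only if loading succeeded, recording a "tried but failed" status otherwise. A minimal sketch of that control flow, with the loader and initializer passed in as function pointers; the `DEMO_` status values and all `demo_` names are hypothetical (the real codes live in `vp8_opencl.h`).

```c
#include <assert.h>

/* Hypothetical status codes, chosen only for this sketch. */
enum { DEMO_OK = 0, DEMO_TRIED_BUT_FAILED = -2 };

typedef int (*demo_step_fn)(void);

/* Stand-ins for a loader or initializer that succeeds or fails. */
int demo_step_ok(void)   { return DEMO_OK; }
int demo_step_fail(void) { return DEMO_TRIED_BUT_FAILED; }

/* Runs the loader first; the initializer runs only if loading succeeded. */
int demo_arch_init(demo_step_fn load, demo_step_fn init)
{
    assert(load != 0 && init != 0);
    if (load() != DEMO_OK)
        return DEMO_TRIED_BUT_FAILED;
    return init();
}
```

On builds without dynamic loading, the original simply skips the load step and calls the initializer directly.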


@@ -1,641 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
//for the decoder, all subpixel prediction is done in this file.
//
//TODO: find a simple mechanism for selecting SIXTAP/BILINEAR and the
//arguments to feed into the kernels. These kernels SHOULD be 2-pass,
//and ideally there'd be a data structure that determined what static arguments
//to pass in.
//
//Also, the only external functions being called here are the subpixel prediction
//functions. Hopefully this means no worrying about when to copy data back/forth.
#include "../../../vpx_ports/config.h"
//#include "../recon.h"
#include "../subpixel.h"
//#include "../blockd.h"
//#include "../reconinter.h"
#if CONFIG_RUNTIME_CPU_DETECT
//#include "../onyxc_int.h"
#endif
#include "vp8_opencl.h"
#include "filter_cl.h"
#include "reconinter_cl.h"
#include "blockd_cl.h"
#include <stdio.h>
/* use this define on systems where unaligned int reads and writes are
* not allowed, e.g. some ARM architectures
*/
/*#define MUST_BE_ALIGNED*/
static const int bbb[4] = {0, 2, 8, 10};
static void vp8_memcpy(
unsigned char *src_base,
int src_offset,
int src_stride,
unsigned char *dst_base,
int dst_offset,
int dst_stride,
int num_bytes,
int num_iter
){
int i,r;
unsigned char *src = &src_base[src_offset];
unsigned char *dst = &dst_base[dst_offset];
src_offset = dst_offset = 0;
for (r = 0; r < num_iter; r++){
for (i = 0; i < num_bytes; i++){
src_offset = r*src_stride + i;
dst_offset = r*dst_stride + i;
dst[dst_offset] = src[src_offset];
}
}
}
static void vp8_copy_mem_cl(
cl_command_queue cq,
cl_mem src_mem,
int *src_offsets,
int src_stride,
cl_mem dst_mem,
int *dst_offsets,
int dst_stride,
int num_bytes,
int num_iter,
int num_blocks
){
int err,block;
#if MEM_COPY_KERNEL
size_t global[3] = {num_bytes, num_iter, num_blocks};
size_t local[3];
local[0] = global[0];
local[1] = global[1];
local[2] = global[2];
err = clSetKernelArg(cl_data.vp8_memcpy_kernel, 0, sizeof (cl_mem), &src_mem);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 2, sizeof (int), &src_stride);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 3, sizeof (cl_mem), &dst_mem);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 5, sizeof (int), &dst_stride);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 6, sizeof (int), &num_bytes);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 7, sizeof (int), &num_iter);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
return,
);
for (block = 0; block < num_blocks; block++){
/* Set kernel arguments */
err = clSetKernelArg(cl_data.vp8_memcpy_kernel, 1, sizeof (int), &src_offsets[block]);
err |= clSetKernelArg(cl_data.vp8_memcpy_kernel, 4, sizeof (int), &dst_offsets[block]);
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to set kernel arguments!\n",
return,
);
/* Execute the kernel */
if (num_bytes * num_iter > cl_data.vp8_memcpy_kernel_size){
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_memcpy_kernel, 2, NULL, global, NULL , 0, NULL, NULL);
} else {
err = clEnqueueNDRangeKernel( cq, cl_data.vp8_memcpy_kernel, 2, NULL, global, local , 0, NULL, NULL);
}
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS,
"Error: Failed to execute kernel!\n",
return,
);
}
#else
int iter;
for (block=0; block < num_blocks; block++){
for (iter = 0; iter < num_iter; iter++){
err = clEnqueueCopyBuffer(cq, src_mem, dst_mem,
src_offsets[block]+iter*src_stride,
dst_offsets[block]+iter*dst_stride,
num_bytes, 0, NULL, NULL
);
VP8_CL_CHECK_SUCCESS(cq, err != CL_SUCCESS, "Error copying between buffers\n",
,
);
}
}
#endif
}
static void vp8_build_inter_predictors_b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
vp8_subpix_cl_fn_t sppf;
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int pre_off = pre_dist+ptr_offset;
if (d->sixtap_filter == CL_TRUE)
sppf = vp8_sixtap_predict4x4_cl;
else
sppf = vp8_bilinear_predict4x4_cl;
//ptr_base a.k.a. d->base_pre is the start of the
//Macroblock's y_buffer, u_buffer, or v_buffer
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
sppf(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
}
else
{
vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride,d->cl_predictor_mem, &d->predictor_offset,pitch,4,4,1);
}
}
static void vp8_build_inter_predictors4b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int pre_off = pre_dist + ptr_offset;
//If either MV component has a fractional part (low 3 bits set), need to do subpixel prediction
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
if (d->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
else
vp8_bilinear_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
}
//Otherwise copy memory directly from src to dest
else
{
vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride, d->cl_predictor_mem, &d->predictor_offset, pitch, 8, 8, 1);
}
}
static void vp8_build_inter_predictors2b_cl(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int pre_off = pre_dist+ptr_offset;
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
if (d->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x4_cl(d->cl_commands,ptr_base,pre_mem,pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
else
vp8_bilinear_predict8x4_cl(d->cl_commands,ptr_base,pre_mem,pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, d->predictor_base, d->cl_predictor_mem, d->predictor_offset, pitch);
}
else
{
vp8_copy_mem_cl(d->cl_commands, pre_mem, &pre_off, d->pre_stride, d->cl_predictor_mem, &d->predictor_offset, pitch, 8, 4, 1);
}
}
void vp8_build_inter_predictors_mbuv_cl(MACROBLOCKD *x)
{
int i;
vp8_cl_mb_prep(x, PREDICTOR|PRE_BUF);
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->cl_commands);
#endif
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
{
unsigned char *pred_base = x->predictor;
int upred_offset = 256;
int vpred_offset = 320;
int mv_row = x->block[16].bmi.mv.as_mv.row;
int mv_col = x->block[16].bmi.mv.as_mv.col;
int offset;
unsigned char *pre_base = x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int upre_off = x->pre.u_buffer - pre_base;
int vpre_off = x->pre.v_buffer - pre_base;
int pre_stride = x->block[16].pre_stride;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (cl_initialized == CL_SUCCESS && x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict8x8_cl(x->block[16].cl_commands,pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_sixtap_predict8x8_cl(x->block[20].cl_commands,pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
else{
vp8_bilinear_predict8x8_cl(x->block[16].cl_commands,pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_bilinear_predict8x8_cl(x->block[20].cl_commands,pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
}
else
{
int pre_offsets[2] = {upre_off+offset, vpre_off+offset};
int pred_offsets[2] = {upred_offset,vpred_offset};
vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, pre_offsets, pre_stride, x->cl_predictor_mem, pred_offsets, 8, 8, 8, 2);
}
}
else
{
// Can probably batch these operations as well, but this path is not exercised
// in the decoder (or at least not by the test videos I've been using).
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
vp8_build_inter_predictors2b_cl(x, d0, 8);
else
{
vp8_build_inter_predictors_b_cl(x, d0, 8);
vp8_build_inter_predictors_b_cl(x, d1, 8);
}
}
}
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
#endif
vp8_cl_mb_finish(x, PREDICTOR);
}
void vp8_build_inter_predictors_mb_cl(MACROBLOCKD *x)
{
//If CL is running in encoder, need to call following before proceeding.
//vp8_cl_mb_prep(x, PRE_BUF);
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->cl_commands);
#endif
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
{
int offset;
unsigned char *pred_base = x->predictor;
int upred_offset = 256;
int vpred_offset = 320;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->block[0].pre_stride;
unsigned char *pre_base = x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
int ypre_off = x->pre.y_buffer - pre_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
int upre_off = x->pre.u_buffer - pre_base;
int vpre_off = x->pre.v_buffer - pre_base;
if ((mv_row | mv_col) & 7)
{
if (cl_initialized == CL_SUCCESS && x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, 0, 16);
}
else
vp8_bilinear_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, 0, 16);
}
else
{
//16x16 copy
int pred_off = 0;
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &ypre_off, pre_stride, x->cl_predictor_mem, &pred_off, 16, 16, 16, 1);
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_sixtap_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
else {
vp8_bilinear_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, upred_offset, 8);
vp8_bilinear_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, pred_base, x->cl_predictor_mem, vpred_offset, 8);
}
}
else
{
int pre_off = upre_off + offset;
vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, &pre_off, pre_stride, x->cl_predictor_mem, &upred_offset, 8, 8, 8, 1);
pre_off = vpre_off + offset;
vp8_copy_mem_cl(x->block[20].cl_commands, pre_mem, &pre_off, pre_stride, x->cl_predictor_mem, &vpred_offset, 8, 8, 8, 1);
}
}
else
{
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
BLOCKD *d = &x->block[bbb[i]];
vp8_build_inter_predictors4b_cl(x, d, 16);
}
}
else
{
/* This loop can be done in any order... No dependencies.*/
/* Also, d0/d1 can be decoded simultaneously */
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
vp8_build_inter_predictors2b_cl(x, d0, 16);
else
{
vp8_build_inter_predictors_b_cl(x, d0, 16);
vp8_build_inter_predictors_b_cl(x, d1, 16);
}
}
}
/* Another case of re-orderable/batchable loop */
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
vp8_build_inter_predictors2b_cl(x, d0, 8);
else
{
vp8_build_inter_predictors_b_cl(x, d0, 8);
vp8_build_inter_predictors_b_cl(x, d1, 8);
}
}
}
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
#endif
vp8_cl_mb_finish(x, PREDICTOR);
}
/* The following functions are written for skip_recon_mb() to call. Since there is no recon in this
* situation, we can write the result directly to dst buffer instead of writing it to predictor
* buffer and then copying it to dst buffer.
*/
static void vp8_build_inter_predictors_b_s_cl(MACROBLOCKD *x, BLOCKD *d, int dst_offset)
{
unsigned char *ptr_base = *(d->base_pre);
int dst_stride = d->dst_stride;
int pre_stride = d->pre_stride;
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
vp8_subpix_cl_fn_t sppf;
int pre_dist = *d->base_pre - x->pre.buffer_alloc;
cl_mem pre_mem = x->pre.buffer_mem;
cl_mem dst_mem = x->dst.buffer_mem;
if (d->sixtap_filter == CL_TRUE){
sppf = vp8_sixtap_predict4x4_cl;
} else
sppf = vp8_bilinear_predict4x4_cl;
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
sppf(d->cl_commands, ptr_base, pre_mem, pre_dist+ptr_offset, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, NULL, dst_mem, dst_offset, dst_stride);
}
else
{
int pre_off = pre_dist+ptr_offset;
vp8_copy_mem_cl(d->cl_commands, pre_mem,&pre_off,pre_stride, dst_mem, &dst_offset,dst_stride,4,4,1);
}
}
void vp8_build_inter_predictors_mb_s_cl(MACROBLOCKD *x)
{
cl_mem dst_mem = NULL;
cl_mem pre_mem = x->pre.buffer_mem;
unsigned char *dst_base = x->dst.buffer_alloc;
int ydst_off = x->dst.y_buffer - dst_base;
int udst_off = x->dst.u_buffer - dst_base;
int vdst_off = x->dst.v_buffer - dst_base;
dst_mem = x->dst.buffer_mem;
vp8_cl_mb_prep(x, DST_BUF);
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->cl_commands);
#endif
if (x->mode_info_context->mbmi.mode != SPLITMV)
{
int offset;
unsigned char *pre_base = x->pre.buffer_alloc;
int ypre_off = x->pre.y_buffer - pre_base;
int upre_off = x->pre.u_buffer - pre_base;
int vpre_off = x->pre.v_buffer - pre_base;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->dst.y_stride;
int ptr_offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off+ptr_offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
vp8_bilinear_predict16x16_cl(x->block[0].cl_commands, pre_base, pre_mem, ypre_off+ptr_offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
{
int pre_off = ypre_off+ptr_offset;
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 16, 16, 1);
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
if (x->sixtap_filter == CL_TRUE){
vp8_sixtap_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, udst_off, x->dst.uv_stride);
vp8_sixtap_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, vdst_off, x->dst.uv_stride);
} else {
vp8_bilinear_predict8x8_cl(x->block[16].cl_commands, pre_base, pre_mem, upre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, udst_off, x->dst.uv_stride);
vp8_bilinear_predict8x8_cl(x->block[20].cl_commands, pre_base, pre_mem, vpre_off+offset, pre_stride, mv_col & 7, mv_row & 7, dst_base, dst_mem, vdst_off, x->dst.uv_stride);
}
}
else
{
int pre_offsets[2] = {upre_off+offset, vpre_off+offset};
int dst_offsets[2] = {udst_off,vdst_off};
vp8_copy_mem_cl(x->block[16].cl_commands, pre_mem, pre_offsets, pre_stride, dst_mem, dst_offsets, x->dst.uv_stride, 8, 8, 2);
}
}
else
{
/* note: this whole ELSE part is not executed at all, so there is no way to test
* the correctness of my modification. Later, if something is wrong, go back to
* what it is in build_inter_predictors_mb.
*
* ACW: Not sure who the above comment belongs to, but it is
* accurate for the decoder. Verified by reverse trace of source.
*/
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
BLOCKD *d = &x->block[bbb[i]];
{
unsigned char *ptr_base = *(d->base_pre);
int pre_off = ptr_base - x->pre.buffer_alloc;
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
pre_off += ptr_offset;
if ( (d->bmi.mv.as_mv.row | d->bmi.mv.as_mv.col) & 7)
{
if (x->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
else
vp8_bilinear_predict8x8_cl(d->cl_commands, ptr_base, pre_mem, pre_off, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
{
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, d->pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 8, 8, 1);
}
}
}
}
else
{
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
{
/*vp8_build_inter_predictors2b(x, d0, 16);*/
unsigned char *ptr_base = *(d0->base_pre);
int pre_off = ptr_base - x->pre.buffer_alloc;
int ptr_offset = d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
pre_off += ptr_offset;
if ( (d0->bmi.mv.as_mv.row | d0->bmi.mv.as_mv.col) & 7)
{
if (d0->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
else
vp8_bilinear_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem,pre_off, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_base, dst_mem, ydst_off, x->dst.y_stride);
}
else
{
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off, d0->pre_stride, dst_mem, &ydst_off, x->dst.y_stride, 8, 4, 1);
}
}
else
{
vp8_build_inter_predictors_b_s_cl(x,d0, ydst_off);
vp8_build_inter_predictors_b_s_cl(x,d1, ydst_off);
}
}
}
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
{
/*vp8_build_inter_predictors2b(x, d0, 8);*/
unsigned char *ptr_base = *(d0->base_pre);
int ptr_offset = d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
int pre_off = ptr_base - x->pre.buffer_alloc + ptr_offset;
if ( (d0->bmi.mv.as_mv.row | d0->bmi.mv.as_mv.col) & 7)
{
if (d0->sixtap_filter == CL_TRUE)
vp8_sixtap_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride,
d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7,
dst_base, dst_mem, ydst_off, x->dst.uv_stride);
else
vp8_bilinear_predict8x4_cl(d0->cl_commands, ptr_base, pre_mem, pre_off, d0->pre_stride,
d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7,
dst_base, dst_mem, ydst_off, x->dst.uv_stride);
}
else
{
vp8_copy_mem_cl(x->block[0].cl_commands, pre_mem, &pre_off,
d0->pre_stride, dst_mem, &ydst_off, x->dst.uv_stride, 8, 4, 1);
}
}
else
{
vp8_build_inter_predictors_b_s_cl(x,d0, ydst_off);
vp8_build_inter_predictors_b_s_cl(x,d1, ydst_off);
}
} //end for
}
#if !ONE_CQ_PER_MB
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
#endif
vp8_cl_mb_finish(x, DST_BUF);
}
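Every predictor function in the file above splits a motion vector the same way: `>> 3` gives the whole-pel displacement, `& 7` gives the fractional part, and a non-zero fraction selects the sixtap/bilinear path over a plain copy. A minimal sketch of that arithmetic; the `demo_` names are hypothetical, and like the original it assumes `>>` on a negative `int` is an arithmetic shift.

```c
#include <assert.h>

/* Hypothetical result type for this sketch; not part of libvpx. */
typedef struct {
    int int_offset; /* whole-pel byte offset into the reference plane */
    int xfrac;      /* fractional column part, 0..7 */
    int yfrac;      /* fractional row part, 0..7 */
} demo_mv_split;

/* Splits a motion vector into its whole-pel offset and fractional parts. */
demo_mv_split demo_split_mv(int mv_row, int mv_col, int stride)
{
    demo_mv_split s;

    assert(stride > 0);
    s.int_offset = (mv_row >> 3) * stride + (mv_col >> 3);
    s.xfrac = mv_col & 7;
    s.yfrac = mv_row & 7;
    return s;
}

/* Non-zero when either component has a fractional part, i.e. when the
 * code above would call a subpixel predictor instead of copying. */
int demo_needs_subpixel(int mv_row, int mv_col)
{
    return ((mv_row | mv_col) & 7) != 0;
}
```

For a vector of (row 19, col 13) and a stride of 32, the whole-pel offset is 2 * 32 + 1 = 65 with fractions (x = 5, y = 3), so the subpixel path is taken; (row 16, col 8) has no fractional part and would be a straight copy.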


@@ -1,25 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef __INC_RECONINTER_CL_H
#define __INC_RECONINTER_CL_H
#include "blockd_cl.h"
#include "subpixel_cl.h"
#include "filter_cl.h"
extern void vp8_build_inter_predictors_mb_cl(MACROBLOCKD *x);
extern void vp8_build_inter_predictors_mbuv_cl(MACROBLOCKD *x);
extern void vp8_build_inter_predictors_mb_s_cl(MACROBLOCKD *x);
//extern void vp8_build_inter_predictors_b_cl(BLOCKD *d, int pitch);
#endif


@@ -1,46 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef SUBPIXEL_CL_H
#define SUBPIXEL_CL_H
#include "../blockd.h"
/* Note:
*
* This platform is commonly built for runtime CPU detection. If you modify
* any of the function mappings present in this file, be sure to also update
* them in the function pointer initialization code
*/
#define prototype_subpixel_predict_cl(sym) \
void sym(cl_command_queue cq, unsigned char *src_base, cl_mem src_mem, int src_offset, \
int src_pitch, int xofst, int yofst, \
unsigned char *dst_base, cl_mem dst_mem, int dst_offset, int dst_pitch)
extern prototype_subpixel_predict_cl(vp8_sixtap_predict16x16_cl);
extern prototype_subpixel_predict_cl(vp8_sixtap_predict8x8_cl);
extern prototype_subpixel_predict_cl(vp8_sixtap_predict8x4_cl);
extern prototype_subpixel_predict_cl(vp8_sixtap_predict4x4_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict16x16_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict8x8_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict8x4_cl);
extern prototype_subpixel_predict_cl(vp8_bilinear_predict4x4_cl);
typedef prototype_subpixel_predict_cl((*vp8_subpix_cl_fn_t));
//typedef enum
//{
// SIXTAP = 0,
// BILINEAR = 1
//} SUBPIX_TYPE;
#endif


@@ -1,342 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "vp8_opencl.h"
int cl_initialized = VP8_CL_NOT_INITIALIZED;
VP8_COMMON_CL cl_data;
//Initialization functions for various CL programs.
extern int cl_init_filter();
extern int cl_init_idct();
extern int cl_init_loop_filter();
//Common CL destructors
extern void cl_destroy_loop_filter();
extern void cl_destroy_filter();
extern void cl_destroy_idct();
//Destructors for encoder/decoder-specific bits
extern void cl_decode_destroy();
extern void cl_encode_destroy();
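/**
* Tears down the shared OpenCL state. Does nothing unless CL was
* successfully initialized.
*
* @param cq command queue to drain and release (may be NULL)
* @param new_status value stored in cl_initialized after teardown
*/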
void cl_destroy(cl_command_queue cq, int new_status) {
if (cl_initialized != CL_SUCCESS)
return;
//Wait on any pending operations to complete... frees up all of our pointers
if (cq != NULL)
clFinish(cq);
#if ENABLE_CL_SUBPIXEL
//Release the objects that we've allocated on the GPU
cl_destroy_filter();
#endif
#if ENABLE_CL_IDCT_DEQUANT
cl_destroy_idct();
#if CONFIG_VP8_DECODER
if (cl_data.cl_decode_initialized == CL_SUCCESS)
cl_decode_destroy();
#endif
#endif
#if ENABLE_CL_LOOPFILTER
cl_destroy_loop_filter();
#endif
#if CONFIG_VP8_ENCODER
//placeholder for if/when encoder CL gets implemented
#endif
if (cq){
clReleaseCommandQueue(cq);
}
if (cl_data.context){
clReleaseContext(cl_data.context);
cl_data.context = NULL;
}
cl_initialized = new_status;
return;
}
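/**
* Queries the type of an OpenCL device.
*
* @param dev device to query
* @return the device type, or CL_INVALID_DEVICE if the query fails
*/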
cl_device_type device_type(cl_device_id dev){
cl_device_type type;
int err;
err = clGetDeviceInfo(dev, CL_DEVICE_TYPE, sizeof(type),&type,NULL);
if (err != CL_SUCCESS)
return CL_INVALID_DEVICE;
return type;
}
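/**
* Sets up the shared OpenCL context by enumerating platforms and devices.
*
* @return the cached cl_initialized status if initialization was already
* attempted; VP8_CL_TRIED_BUT_FAILED if the platform query fails or finds
* no platforms
*/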
int cl_common_init() {
int err,i,dev;
cl_platform_id platform_ids[MAX_NUM_PLATFORMS];
cl_uint num_found, num_devices;
cl_device_id devices[MAX_NUM_DEVICES];
//Don't allow multiple CL contexts..
if (cl_initialized != VP8_CL_NOT_INITIALIZED)
return cl_initialized;
// Connect to a compute device
err = clGetPlatformIDs(MAX_NUM_PLATFORMS, platform_ids, &num_found);
if (err != CL_SUCCESS) {
fprintf(stderr, "Couldn't query platform IDs\n");
return VP8_CL_TRIED_BUT_FAILED;
}
if (num_found == 0) {
fprintf(stderr, "No platforms found\n");
return VP8_CL_TRIED_BUT_FAILED;
}
//printf("Enumerating %d platform(s)\n", num_found);
//Enumerate the platforms found
for (i = 0; i < num_found; i++){
char buf[2048];
size_t len;
err = clGetPlatformInfo( platform_ids[i], CL_PLATFORM_VENDOR, sizeof(buf), buf, &len);
if (err != CL_SUCCESS){
fprintf(stderr, "Error retrieving platform vendor for platform %d\n",i);
continue;
}
//printf("Platform %d: %s\n",i,buf);
//If you need to force a platform (e.g. CPU-only testing), uncomment this
//if (strstr(buf,"NVIDIA"))
// continue;
//Try to find a valid compute device
//Favor the GPU, but fall back to any other available device if necessary
#ifdef __APPLE__
printf("Apple system. Running CL as CPU-only for now...\n");
err = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_CPU, MAX_NUM_DEVICES, devices, &num_devices);
#else
err = clGetDeviceIDs(platform_ids[i], CL_DEVICE_TYPE_ALL, MAX_NUM_DEVICES, devices, &num_devices);
#endif //__APPLE__
if (err != CL_SUCCESS)
continue; //num_devices is not valid if the query failed
//printf("found %d devices\n", num_devices);
cl_data.device_id = NULL;
for( dev = 0; dev < num_devices; dev++ ){
char ext[2048];
//Get info for this device.
err = clGetDeviceInfo(devices[dev], CL_DEVICE_EXTENSIONS,
sizeof(ext),ext,NULL);
VP8_CL_CHECK_SUCCESS(NULL,err != CL_SUCCESS,
"Error retrieving device extension list",continue, 0);
//printf("Device %d supports: %s\n",dev,ext);
//The kernels in VP8 require byte-addressable stores, which is an
//optional extension in OpenCL 1.0. It became a core feature in
//OpenCL 1.1, but not all devices support it.
if (strstr(ext,"cl_khr_byte_addressable_store")){
//We found a valid device, so use it. But if we find a GPU
//(maybe this is one), prefer that.
cl_data.device_id = devices[dev];
if ( device_type(devices[dev]) == CL_DEVICE_TYPE_GPU ){
//printf("Device %d is a GPU\n",dev);
break;
}
}
}
//If we've found a usable GPU, stop looking.
if (cl_data.device_id != NULL && device_type(cl_data.device_id) == CL_DEVICE_TYPE_GPU )
break;
}
if (cl_data.device_id == NULL){
printf("Error: Failed to find a valid OpenCL device. Using CPU paths\n");
return VP8_CL_TRIED_BUT_FAILED;
}
// Create the compute context
cl_data.context = clCreateContext(0, 1, &cl_data.device_id, NULL, NULL, &err);
if (!cl_data.context) {
printf("Error: Failed to create a compute context!\n");
return VP8_CL_TRIED_BUT_FAILED;
}
//Initialize programs to NULL.
//This also lets us detect later whether each program has been initialized.
cl_data.filter_program = NULL;
cl_data.idct_program = NULL;
cl_data.loop_filter_program = NULL;
#if ENABLE_CL_SUBPIXEL
err = cl_init_filter();
if (err != CL_SUCCESS)
return err;
#endif
#if ENABLE_CL_IDCT_DEQUANT
err = cl_init_idct();
if (err != CL_SUCCESS)
return err;
#endif
#if ENABLE_CL_LOOPFILTER
err = cl_init_loop_filter();
if (err != CL_SUCCESS)
return err;
#endif
return CL_SUCCESS;
}
char *cl_read_file(const char* file_name) {
long pos;
char *bytes;
size_t amt_read;
FILE *f;
f = fopen(file_name, "rb");
if (f == NULL) {
char *fullpath;
//printf("Couldn't find %s\n", file_name);
//Generate a file path for the CL sources using the library install dir
fullpath = malloc(strlen(vpx_codec_lib_dir()) + strlen(file_name) + 2);
if (fullpath == NULL) {
return NULL;
}
strcpy(fullpath, vpx_codec_lib_dir());
strcat(fullpath, "/"); //Will need to be changed for MSVS
strcat(fullpath, file_name);
//printf("Looking in %s\n", fullpath);
f = fopen(fullpath, "rb");
if (f == NULL) {
fprintf(stderr,"Couldn't find CL source at %s or %s\n", file_name, fullpath);
free(fullpath);
return NULL;
}
//printf("Found cl source at %s\n", fullpath);
free(fullpath);
} else {
//printf("Found cl source at %s\n", file_name);
}
fseek(f, 0, SEEK_END);
pos = ftell(f);
fseek(f, 0, SEEK_SET);
bytes = malloc(pos+1);
if (bytes == NULL) {
fclose(f);
return NULL;
}
amt_read = fread(bytes, pos, 1, f);
if (amt_read != 1) {
free(bytes);
fclose(f);
return NULL;
}
bytes[pos] = '\0'; //null terminate the source string
fclose(f);
return bytes;
}
void show_build_log(cl_program *prog_ref){
size_t len;
char *buffer;
int err = clGetProgramBuildInfo(*prog_ref, cl_data.device_id, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
if (err != CL_SUCCESS){
printf("Error: Could not get length of CL build log\n");
return; //len is not valid, so there is nothing to fetch
}
buffer = (char*) malloc(len);
if (buffer == NULL) {
printf("Error: Couldn't allocate compile output buffer memory\n");
return; //don't pass a NULL buffer to clGetProgramBuildInfo
}
err = clGetProgramBuildInfo(*prog_ref, cl_data.device_id, CL_PROGRAM_BUILD_LOG, len, buffer, NULL);
if (err != CL_SUCCESS) {
printf("Error: Could not get CL build log\n");
} else {
printf("Compile output: %s\n", buffer);
}
free(buffer);
}
int cl_load_program(cl_program *prog_ref, const char *file_name, const char *opts) {
int err;
char *kernel_src = cl_read_file(file_name);
*prog_ref = NULL;
if (kernel_src != NULL) {
*prog_ref = clCreateProgramWithSource(cl_data.context, 1, (const char**)&kernel_src, NULL, &err);
free(kernel_src);
} else {
cl_destroy(NULL, VP8_CL_TRIED_BUT_FAILED);
printf("Couldn't find OpenCL source files. \nUsing software path.\n");
return VP8_CL_TRIED_BUT_FAILED;
}
if (*prog_ref == NULL) {
printf("Error: Couldn't create program\n");
return VP8_CL_TRIED_BUT_FAILED;
}
if (err != CL_SUCCESS) {
printf("Error creating program: %d\n", err);
}
/* Build the program executable */
err = clBuildProgram(*prog_ref, 0, NULL, opts, NULL, NULL);
if (err != CL_SUCCESS) {
printf("Error: Failed to build program executable for %s!\n", file_name);
show_build_log(prog_ref);
return VP8_CL_TRIED_BUT_FAILED;
}
return CL_SUCCESS;
}
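The file loader in this removed source (`cl_read_file`) sizes its buffer from `ftell()` without checking for a failure return, and because it reads the file as a single item of `pos` bytes, an empty file is reported as a read error. A minimal sketch of a stricter variant, using only the C standard library, might look like the following. The name `read_text_file` is hypothetical and is not part of the libvpx tree, and the fallback lookup under `vpx_codec_lib_dir()` is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a NUL-terminated heap
 * buffer. Returns NULL on any failure; the caller frees the result. */
char *read_text_file(const char *path)
{
    FILE *f;
    long size;
    char *bytes;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* ftell() returns -1L on error, so check it before using it as a size. */
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    bytes = malloc((size_t)size + 1);
    if (bytes == NULL) {
        fclose(f);
        return NULL;
    }

    /* Ask for `size` items of one byte each, so an empty file
     * (size == 0) is a successful read rather than an error. */
    if (fread(bytes, 1, (size_t)size, f) != (size_t)size) {
        free(bytes);
        fclose(f);
        return NULL;
    }

    bytes[size] = '\0'; /* NUL-terminate the source string */
    fclose(f);
    return bytes;
}
```

A caller would pass the returned string to `clCreateProgramWithSource` and free it afterwards, as `cl_load_program` does with its own buffer.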


@@ -1,192 +0,0 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef VP8_OPENCL_H
#define VP8_OPENCL_H
#ifdef __cplusplus
extern "C" {
#endif
#include "../../../vpx_config.h"
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
#if HAVE_DLOPEN
#include "dynamic_cl.h"
#endif
#define ENABLE_CL_IDCT_DEQUANT 0
#define ENABLE_CL_SUBPIXEL 1
#define TWO_PASS_SIXTAP 0
#define MEM_COPY_KERNEL 1
#define ONE_CQ_PER_MB 1 //Value of 0 is racy... still experimental.
#define ENABLE_CL_LOOPFILTER 0
extern char *cl_read_file(const char* file_name);
extern int cl_common_init();
extern void cl_destroy(cl_command_queue cq, int new_status);
extern int cl_load_program(cl_program *prog_ref, const char *file_name, const char *opts);
#define MAX_NUM_PLATFORMS 4
#define MAX_NUM_DEVICES 10
#define VP8_CL_TRIED_BUT_FAILED 1
#define VP8_CL_NOT_INITIALIZED -1
extern int cl_initialized;
extern const char *vpx_codec_lib_dir(void);
#define VP8_CL_FINISH(cq) \
if (cl_initialized == CL_SUCCESS){ \
/* Wait for kernels to finish. */ \
clFinish(cq); \
}
#define VP8_CL_BARRIER(cq) \
if (cl_initialized == CL_SUCCESS){ \
/* Insert a barrier into the command queue. */ \
clEnqueueBarrier(cq); \
}
#define VP8_CL_CHECK_SUCCESS(cq,cond,msg,alt,retCode) \
if ( cond ){ \
fprintf(stderr, "%s", msg); \
cl_destroy(cq, VP8_CL_TRIED_BUT_FAILED); \
alt; \
return retCode; \
}
#define VP8_CL_CALC_LOCAL_SIZE(kernel, kernel_size) \
err = clGetKernelWorkGroupInfo( cl_data.kernel, \
cl_data.device_id, \
CL_KERNEL_WORK_GROUP_SIZE, \
sizeof(size_t), \
&cl_data.kernel_size, \
NULL);\
VP8_CL_CHECK_SUCCESS(NULL, err != CL_SUCCESS, \
"Error: Failed to calculate local size of kernel!\n", \
,\
VP8_CL_TRIED_BUT_FAILED \
);
#define VP8_CL_CREATE_KERNEL(data,program,name,str_name) \
data.name = clCreateKernel(data.program, str_name , &err); \
VP8_CL_CHECK_SUCCESS(NULL, err != CL_SUCCESS || !data.name, \
"Error: Failed to create compute kernel "#str_name"!\n", \
,\
VP8_CL_TRIED_BUT_FAILED \
);
#define VP8_CL_READ_BUF(cq, bufRef, bufSize, dstPtr) \
err = clEnqueueReadBuffer(cq, bufRef, CL_FALSE, 0, bufSize , dstPtr, 0, NULL, NULL); \
VP8_CL_CHECK_SUCCESS( cq, err != CL_SUCCESS, \
"Error: Failed to read from GPU!\n",, err \
);
#define VP8_CL_SET_BUF(cq, bufRef, bufSize, dataPtr, altPath, retCode) \
{ \
err = clEnqueueWriteBuffer(cq, bufRef, CL_FALSE, 0, \
bufSize, dataPtr, 0, NULL, NULL); \
\
VP8_CL_CHECK_SUCCESS(cq, err != CL_SUCCESS, \
"Error: Failed to write to buffer!\n", \
altPath, retCode\
); \
}
#define VP8_CL_CREATE_BUF(cq, bufRef, bufType, bufSize, dataPtr, altPath, retCode) \
bufRef = clCreateBuffer(cl_data.context, CL_MEM_READ_WRITE, bufSize, NULL, NULL); \
if (dataPtr != NULL && bufRef != NULL){ \
VP8_CL_SET_BUF(cq, bufRef, bufSize, dataPtr, altPath, retCode)\
} \
VP8_CL_CHECK_SUCCESS(cq, !bufRef, \
"Error: Failed to allocate buffer. Using CPU path!\n", \
altPath, retCode\
);
#define VP8_CL_RELEASE_KERNEL(kernel) \
if (kernel) \
clReleaseKernel(kernel); \
kernel = NULL;
typedef struct VP8_COMMON_CL {
cl_device_id device_id; // compute device id
cl_context context; // compute context
//cl_command_queue commands; // compute command queue
cl_program filter_program; // compute program for subpixel/bilinear filters
cl_kernel vp8_sixtap_predict_kernel;
size_t vp8_sixtap_predict_kernel_size;
cl_kernel vp8_sixtap_predict8x4_kernel;
size_t vp8_sixtap_predict8x4_kernel_size;
cl_kernel vp8_sixtap_predict8x8_kernel;
size_t vp8_sixtap_predict8x8_kernel_size;
cl_kernel vp8_sixtap_predict16x16_kernel;
size_t vp8_sixtap_predict16x16_kernel_size;
cl_kernel vp8_bilinear_predict4x4_kernel;
cl_kernel vp8_bilinear_predict8x4_kernel;
cl_kernel vp8_bilinear_predict8x8_kernel;
cl_kernel vp8_bilinear_predict16x16_kernel;
cl_kernel vp8_filter_block2d_first_pass_kernel;
size_t vp8_filter_block2d_first_pass_kernel_size;
cl_kernel vp8_filter_block2d_second_pass_kernel;
size_t vp8_filter_block2d_second_pass_kernel_size;
cl_kernel vp8_filter_block2d_bil_first_pass_kernel;
size_t vp8_filter_block2d_bil_first_pass_kernel_size;
cl_kernel vp8_filter_block2d_bil_second_pass_kernel;
size_t vp8_filter_block2d_bil_second_pass_kernel_size;
cl_kernel vp8_memcpy_kernel;
size_t vp8_memcpy_kernel_size;
cl_kernel vp8_memset_short_kernel;
cl_program idct_program;
cl_kernel vp8_short_inv_walsh4x4_1_kernel;
cl_kernel vp8_short_inv_walsh4x4_1st_pass_kernel;
cl_kernel vp8_short_inv_walsh4x4_2nd_pass_kernel;
cl_kernel vp8_dc_only_idct_add_kernel;
//Note that the following 2 kernels are encoder-only. Not used in decoder.
cl_kernel vp8_short_idct4x4llm_1_kernel;
cl_kernel vp8_short_idct4x4llm_kernel;
cl_program loop_filter_program;
cl_kernel vp8_loop_filter_horizontal_edge_kernel;
cl_kernel vp8_loop_filter_vertical_edge_kernel;
cl_kernel vp8_mbloop_filter_horizontal_edge_kernel;
cl_kernel vp8_mbloop_filter_vertical_edge_kernel;
cl_kernel vp8_loop_filter_simple_horizontal_edge_kernel;
cl_kernel vp8_loop_filter_simple_vertical_edge_kernel;
cl_program dequant_program;
cl_kernel vp8_dequant_dc_idct_add_kernel;
cl_kernel vp8_dequant_idct_add_kernel;
cl_kernel vp8_dequantize_b_kernel;
cl_int cl_decode_initialized;
cl_int cl_encode_initialized;
} VP8_COMMON_CL;
extern VP8_COMMON_CL cl_data;
#ifdef __cplusplus
}
#endif
#endif /* VP8_OPENCL_H */
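Several macros in this removed header expand to more than one statement (for example `VP8_CL_RELEASE_KERNEL` is an `if` followed by a separate assignment), so they do not behave as a single statement inside an unbraced `if`/`else`. The usual C idiom is to wrap the body in `do { ... } while (0)`. The sketch below illustrates the idiom with stand-in types. `fake_kernel`, `fake_release`, `RELEASE_KERNEL` and `release_if` are all hypothetical names, and no OpenCL call is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an OpenCL handle and its release call. */
typedef struct { int released; } fake_kernel;

static void fake_release(fake_kernel *k)
{
    assert(k != NULL);
    k->released = 1;
}

/* do { ... } while (0) makes the whole macro act as one statement,
 * so it can be followed by a semicolon and an else branch. */
#define RELEASE_KERNEL(k)        \
    do {                         \
        if ((k) != NULL)         \
            fake_release(k);     \
        (k) = NULL;              \
    } while (0)

/* Returns 1 if the handle was released and cleared, 0 if left alone.
 * With a bare two-statement macro this if/else would not compile. */
int release_if(int cond, fake_kernel **slot)
{
    if (cond)
        RELEASE_KERNEL(*slot);
    else
        return 0;
    return 1;
}
```

The same wrapping would apply to the other multi-statement macros in the header, at the cost of requiring a trailing semicolon at every call site.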


@@ -804,11 +804,14 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
for (j = 0; j < mb_cols; j++)
{
char zz[4];
int dc_diff = !(mi[mb_index].mbmi.mode != B_PRED &&
mi[mb_index].mbmi.mode != SPLITMV &&
mi[mb_index].mbmi.mb_skip_coeff);
if (oci->frame_type == KEY_FRAME)
sprintf(zz, "a");
else
sprintf(zz, "%c", mi[mb_index].mbmi.dc_diff + '0');
sprintf(zz, "%c", dc_diff + '0');
vp8_blit_text(zz, y_ptr, post->y_stride);
mb_index ++;
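The hunk above replaces a stored `dc_diff` field with a value derived from the macroblock mode and the skip flag. A minimal sketch of that derivation, outside the codec, is shown here. The enum `sketch_mode` and the function name are hypothetical stand-ins for the real `mbmi.mode` values and are not taken from the libvpx headers.

```c
#include <assert.h>

/* Hypothetical subset of prediction modes, for illustration only. */
enum sketch_mode { SK_DC_PRED, SK_B_PRED, SK_NEWMV, SK_SPLITMV };

/* Mirrors the expression in the hunk: the value is 0 only when the
 * mode is neither B_PRED nor SPLITMV and the coefficients are skipped. */
int sketch_dc_diff(enum sketch_mode mode, int mb_skip_coeff)
{
    int v = !(mode != SK_B_PRED && mode != SK_SPLITMV && mb_skip_coeff);

    /* De Morgan: the same value written without the outer negation. */
    assert(v == (mode == SK_B_PRED || mode == SK_SPLITMV || !mb_skip_coeff));
    return v;
}
```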
@@ -834,7 +837,6 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
YV12_BUFFER_CONFIG *post = &oci->post_proc_buffer;
int width = post->y_width;
int height = post->y_height;
int mb_cols = width >> 4;
unsigned char *y_buffer = oci->post_proc_buffer.y_buffer;
int y_stride = oci->post_proc_buffer.y_stride;
MODE_INFO *mi = oci->mi;
@@ -858,7 +860,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
{
case 0 : /* mv_top_bottom */
{
B_MODE_INFO *bmi = &mi->bmi[0];
union b_mode_info *bmi = &mi->bmi[0];
MV *mv = &bmi->mv.as_mv;
x1 = x0 + 8 + (mv->col >> 3);
@@ -879,7 +881,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
}
case 1 : /* mv_left_right */
{
B_MODE_INFO *bmi = &mi->bmi[0];
union b_mode_info *bmi = &mi->bmi[0];
MV *mv = &bmi->mv.as_mv;
x1 = x0 + 4 + (mv->col >> 3);
@@ -900,7 +902,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
}
case 2 : /* mv_quarters */
{
B_MODE_INFO *bmi = &mi->bmi[0];
union b_mode_info *bmi = &mi->bmi[0];
MV *mv = &bmi->mv.as_mv;
x1 = x0 + 4 + (mv->col >> 3);
@@ -936,7 +938,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
}
default :
{
B_MODE_INFO *bmi = mi->bmi;
union b_mode_info *bmi = mi->bmi;
int bx0, by0;
for (by0 = y0; by0 < (y0+16); by0 += 4)
@@ -1009,7 +1011,7 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
{
int by, bx;
unsigned char *yl, *ul, *vl;
B_MODE_INFO *bmi = mi->bmi;
union b_mode_info *bmi = mi->bmi;
yl = y_ptr + x;
ul = u_ptr + (x>>1);
@@ -1022,9 +1024,9 @@ int vp8_post_proc_frame(VP8_COMMON *oci, YV12_BUFFER_CONFIG *dest, vp8_ppflags_t
if ((ppflags->display_b_modes_flag & (1<<mi->mbmi.mode))
|| (ppflags->display_mb_modes_flag & B_PRED))
{
Y = B_PREDICTION_MODE_colors[bmi->mode][0];
U = B_PREDICTION_MODE_colors[bmi->mode][1];
V = B_PREDICTION_MODE_colors[bmi->mode][2];
Y = B_PREDICTION_MODE_colors[bmi->as_mode][0];
U = B_PREDICTION_MODE_colors[bmi->as_mode][1];
V = B_PREDICTION_MODE_colors[bmi->as_mode][2];
POSTPROC_INVOKE(RTCD_VTABLE(oci), blend_b)
(yl+bx, ul+(bx>>1), vl+(bx>>1), Y, U, V, 0xc000, y_stride);


@@ -53,9 +53,8 @@ loop_filter_function_s_ppc loop_filter_simple_vertical_edge_ppc;
// Horizontal MB filtering
void loop_filter_mbh_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
mbloop_filter_horizontal_edge_y_ppc(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr);
if (u_ptr)
@@ -63,9 +62,8 @@ void loop_filter_mbh_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned ch
}
void loop_filter_mbhs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
@@ -74,9 +72,8 @@ void loop_filter_mbhs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned c
// Vertical MB Filtering
void loop_filter_mbv_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
mbloop_filter_vertical_edge_y_ppc(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->thr);
if (u_ptr)
@@ -84,9 +81,8 @@ void loop_filter_mbv_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned ch
}
void loop_filter_mbvs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
@@ -95,9 +91,8 @@ void loop_filter_mbvs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned c
// Horizontal B Filtering
void loop_filter_bh_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
// These should all be done at once with one call, instead of 3
loop_filter_horizontal_edge_y_ppc(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr);
loop_filter_horizontal_edge_y_ppc(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr);
@@ -108,9 +103,8 @@ void loop_filter_bh_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned cha
}
void loop_filter_bhs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;
@@ -121,9 +115,8 @@ void loop_filter_bhs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned ch
// Vertical B Filtering
void loop_filter_bv_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
loop_filter_vertical_edge_y_ppc(y_ptr, y_stride, lfi->flim, lfi->lim, lfi->thr);
if (u_ptr)
@@ -131,9 +124,8 @@ void loop_filter_bv_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned cha
}
void loop_filter_bvs_ppc(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void)simpler_lpf;
(void)u_ptr;
(void)v_ptr;
(void)uv_stride;


@@ -66,7 +66,6 @@ int vp8_dc2quant(int QIndex, int Delta)
return retval;
}
int vp8_dc_uv_quant(int QIndex, int Delta)
{
int retval;
@@ -117,7 +116,6 @@ int vp8_ac2quant(int QIndex, int Delta)
return retval;
}
int vp8_ac_uv_quant(int QIndex, int Delta)
{
int retval;


@@ -110,19 +110,19 @@ void vp8_recon_mby_c(const vp8_recon_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
#if ARCH_ARM
BLOCKD *b = &x->block[0];
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
/*b = &x->block[4];*/
b += 4;
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
/*b = &x->block[8];*/
b += 4;
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
/*b = &x->block[12];*/
b += 4;
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
#else
int i;
@@ -130,7 +130,7 @@ void vp8_recon_mby_c(const vp8_recon_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
BLOCKD *b = &x->block[i];
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
}
#endif
}
@@ -140,27 +140,27 @@ void vp8_recon_mb_c(const vp8_recon_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
#if ARCH_ARM
BLOCKD *b = &x->block[0];
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b += 4;
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b += 4;
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b += 4;
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b += 4;
/*b = &x->block[16];*/
RECON_INVOKE(rtcd, recon2)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon2)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b++;
b++;
RECON_INVOKE(rtcd, recon2)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon2)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b++;
b++;
RECON_INVOKE(rtcd, recon2)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon2)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
b++;
b++;
RECON_INVOKE(rtcd, recon2)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon2)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
#else
int i;
@@ -168,14 +168,14 @@ void vp8_recon_mb_c(const vp8_recon_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
{
BLOCKD *b = &x->block[i];
RECON_INVOKE(rtcd, recon4)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon4)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
}
for (i = 16; i < 24; i += 2)
{
BLOCKD *b = &x->block[i];
RECON_INVOKE(rtcd, recon2)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon2)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
}
#endif
}


@@ -26,6 +26,9 @@
#define prototype_build_intra_predictors(sym) \
void sym(MACROBLOCKD *x)
#define prototype_intra4x4_predict(sym) \
void sym(BLOCKD *x, int b_mode, unsigned char *predictor)
struct vp8_recon_rtcd_vtable;
#if ARCH_X86 || ARCH_X86_64
@@ -88,11 +91,30 @@ extern prototype_build_intra_predictors\
extern prototype_build_intra_predictors\
(vp8_recon_build_intra_predictors_mby_s);
#ifndef vp8_recon_build_intra_predictors_mbuv
#define vp8_recon_build_intra_predictors_mbuv vp8_build_intra_predictors_mbuv
#endif
extern prototype_build_intra_predictors\
(vp8_recon_build_intra_predictors_mbuv);
#ifndef vp8_recon_build_intra_predictors_mbuv_s
#define vp8_recon_build_intra_predictors_mbuv_s vp8_build_intra_predictors_mbuv_s
#endif
extern prototype_build_intra_predictors\
(vp8_recon_build_intra_predictors_mbuv_s);
#ifndef vp8_recon_intra4x4_predict
#define vp8_recon_intra4x4_predict vp8_intra4x4_predict
#endif
extern prototype_intra4x4_predict\
(vp8_recon_intra4x4_predict);
typedef prototype_copy_block((*vp8_copy_block_fn_t));
typedef prototype_recon_block((*vp8_recon_fn_t));
typedef prototype_recon_macroblock((*vp8_recon_mb_fn_t));
typedef prototype_build_intra_predictors((*vp8_build_intra_pred_fn_t));
typedef prototype_intra4x4_predict((*vp8_intra4x4_pred_fn_t));
typedef struct vp8_recon_rtcd_vtable
{
vp8_copy_block_fn_t copy16x16;
@@ -105,6 +127,9 @@ typedef struct vp8_recon_rtcd_vtable
vp8_recon_mb_fn_t recon_mby;
vp8_build_intra_pred_fn_t build_intra_predictors_mby_s;
vp8_build_intra_pred_fn_t build_intra_predictors_mby;
vp8_build_intra_pred_fn_t build_intra_predictors_mbuv_s;
vp8_build_intra_pred_fn_t build_intra_predictors_mbuv;
vp8_intra4x4_pred_fn_t intra4x4_predict;
} vp8_recon_rtcd_vtable_t;
#if CONFIG_RUNTIME_CPU_DETECT


@@ -8,7 +8,9 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_ports/config.h"
#include "vpx/vpx_integer.h"
#include "recon.h"
#include "subpixel.h"
#include "blockd.h"
@@ -17,22 +19,10 @@
#include "onyxc_int.h"
#endif
#if CONFIG_OPENCL
#include "opencl/vp8_opencl.h"
#include "opencl/filter_cl.h"
#include "opencl/reconinter_cl.h"
#endif
/* use this define on systems where unaligned int reads and writes are
* not allowed, i.e. ARM architectures
*/
/*#define MUST_BE_ALIGNED*/
static const int bbb[4] = {0, 2, 8, 10};
//Copy 16 x 16-bytes from src to dst.
void vp8_copy_mem16x16_c(
unsigned char *src,
int src_stride,
@@ -42,12 +32,9 @@ void vp8_copy_mem16x16_c(
int r;
//Set this up as a 2D kernel. Each loop iteration is X, each byte/int within
//is the Y address.
for (r = 0; r < 16; r++)
{
#ifdef MUST_BE_ALIGNED
#if !(CONFIG_FAST_UNALIGNED)
dst[0] = src[0];
dst[1] = src[1];
dst[2] = src[2];
@@ -66,10 +53,10 @@ void vp8_copy_mem16x16_c(
dst[15] = src[15];
#else
((int *)dst)[0] = ((int *)src)[0] ;
((int *)dst)[1] = ((int *)src)[1] ;
((int *)dst)[2] = ((int *)src)[2] ;
((int *)dst)[3] = ((int *)src)[3] ;
((uint32_t *)dst)[0] = ((uint32_t *)src)[0] ;
((uint32_t *)dst)[1] = ((uint32_t *)src)[1] ;
((uint32_t *)dst)[2] = ((uint32_t *)src)[2] ;
((uint32_t *)dst)[3] = ((uint32_t *)src)[3] ;
#endif
src += src_stride;
@@ -79,7 +66,6 @@ void vp8_copy_mem16x16_c(
}
//Copy 8 x 8-bytes
void vp8_copy_mem8x8_c(
unsigned char *src,
int src_stride,
@@ -90,7 +76,7 @@ void vp8_copy_mem8x8_c(
for (r = 0; r < 8; r++)
{
#ifdef MUST_BE_ALIGNED
#if !(CONFIG_FAST_UNALIGNED)
dst[0] = src[0];
dst[1] = src[1];
dst[2] = src[2];
@@ -100,8 +86,8 @@ void vp8_copy_mem8x8_c(
dst[6] = src[6];
dst[7] = src[7];
#else
((int *)dst)[0] = ((int *)src)[0] ;
((int *)dst)[1] = ((int *)src)[1] ;
((uint32_t *)dst)[0] = ((uint32_t *)src)[0] ;
((uint32_t *)dst)[1] = ((uint32_t *)src)[1] ;
#endif
src += src_stride;
dst += dst_stride;
@@ -120,7 +106,7 @@ void vp8_copy_mem8x4_c(
for (r = 0; r < 4; r++)
{
#ifdef MUST_BE_ALIGNED
#if !(CONFIG_FAST_UNALIGNED)
dst[0] = src[0];
dst[1] = src[1];
dst[2] = src[2];
@@ -130,8 +116,8 @@ void vp8_copy_mem8x4_c(
dst[6] = src[6];
dst[7] = src[7];
#else
((int *)dst)[0] = ((int *)src)[0] ;
((int *)dst)[1] = ((int *)src)[1] ;
((uint32_t *)dst)[0] = ((uint32_t *)src)[0] ;
((uint32_t *)dst)[1] = ((uint32_t *)src)[1] ;
#endif
src += src_stride;
dst += dst_stride;
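The copy routines above switch between byte-by-byte stores and word stores through `uint32_t *` casts, depending on `CONFIG_FAST_UNALIGNED`. A third option, sketched here as an assumption rather than as anything this change does, is to move each word with `memcpy`, which has no alignment requirement and which optimizing compilers commonly lower to plain loads and stores. The names `copy_row8` and `sketch_copy_mem8x8` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one 8-byte row as two 32-bit words without casting the byte
 * pointers to uint32_t *, so pointer alignment does not matter. */
static void copy_row8(unsigned char *dst, const unsigned char *src)
{
    uint32_t w0, w1;

    memcpy(&w0, src, sizeof(w0));
    memcpy(&w1, src + 4, sizeof(w1));
    memcpy(dst, &w0, sizeof(w0));
    memcpy(dst + 4, &w1, sizeof(w1));
}

/* Sketch of an 8x8 block copy with independent source and
 * destination strides, in the shape of vp8_copy_mem8x8_c. */
void sketch_copy_mem8x8(const unsigned char *src, int src_stride,
                        unsigned char *dst, int dst_stride)
{
    int r;

    assert(src != NULL && dst != NULL);
    for (r = 0; r < 8; r++) {
        copy_row8(dst, src);
        src += src_stride;
        dst += dst_stride;
    }
}
```

Whether this matches the speed of the cast-based path depends on the compiler and target, so it would need measuring before replacing either branch.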
@@ -145,32 +131,34 @@ void vp8_copy_mem8x4_c(
void vp8_build_inter_predictors_b(BLOCKD *d, int pitch, vp8_subpix_fn_t sppf)
{
int r;
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *pred_ptr = d->predictor;
//d->base_pre is the start of the previous frame's y_buffer, u_buffer, or v_buffer
unsigned char *ptr_base = *(d->base_pre);
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
unsigned char *pred_ptr = d->predictor_base + d->predictor_offset;
ptr_base = *(d->base_pre);
if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
{
sppf(ptr_base+ptr_offset, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, pred_ptr, pitch);
ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
sppf(ptr, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, pred_ptr, pitch);
}
else
{
ptr_base += d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
ptr = ptr_base;
for (r = 0; r < 4; r++)
{
#ifdef MUST_BE_ALIGNED
pred_ptr[0] = ptr_base[ptr_offset];
pred_ptr[1] = ptr_base[ptr_offset+1];
pred_ptr[2] = ptr_base[ptr_offset+2];
pred_ptr[3] = ptr_base[ptr_offset+3];
#if !(CONFIG_FAST_UNALIGNED)
pred_ptr[0] = ptr[0];
pred_ptr[1] = ptr[1];
pred_ptr[2] = ptr[2];
pred_ptr[3] = ptr[3];
#else
*(int *)pred_ptr = *(int *)(ptr_base+ptr_offset) ;
*(uint32_t *)pred_ptr = *(uint32_t *)ptr ;
#endif
pred_ptr += pitch;
ptr_offset += d->pre_stride;
ptr += d->pre_stride;
}
}
}
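The predictor builders in this file all split a motion vector component the same way: `mv >> 3` selects the whole-sample position in the reference plane and `mv & 7` is the fractional part handed to the subpixel filter, with a plain copy used when both fractions are zero. The sketch below isolates that arithmetic. The struct and function names are hypothetical, and the right shift of a negative value is implementation-defined in C, so the sketch assumes an arithmetic shift as the surrounding code does.

```c
#include <assert.h>

/* Hypothetical result type: whole-sample offset plus the two
 * fractional parts taken from the low three bits of each component. */
typedef struct {
    int offset; /* offset into the reference plane, in samples */
    int xfrac;  /* mv_col & 7 */
    int yfrac;  /* mv_row & 7 */
} sketch_mv_split;

sketch_mv_split sketch_split_mv(int mv_row, int mv_col, int stride)
{
    sketch_mv_split s;

    assert(stride > 0);
    s.offset = (mv_row >> 3) * stride + (mv_col >> 3);
    s.xfrac = mv_col & 7;
    s.yfrac = mv_row & 7;
    return s;
}

/* Nonzero when either component has a fractional part, i.e. when the
 * code above calls the subpixel predictor instead of a block copy. */
int sketch_needs_subpixel(int mv_row, int mv_col)
{
    return ((mv_row | mv_col) & 7) != 0;
}
```

For example, with a stride of 32, a vector of row 19 and column 8 lands two rows down and one column across, with a vertical fraction of 3 and no horizontal fraction.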
@@ -179,7 +167,7 @@ static void build_inter_predictors4b(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *pred_ptr = d->predictor_base + d->predictor_offset;
unsigned char *pred_ptr = d->predictor;
ptr_base = *(d->base_pre);
ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
@@ -198,7 +186,7 @@ static void build_inter_predictors2b(MACROBLOCKD *x, BLOCKD *d, int pitch)
{
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *pred_ptr = d->predictor_base + d->predictor_offset;
unsigned char *pred_ptr = d->predictor;
ptr_base = *(d->base_pre);
ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
@@ -213,24 +201,13 @@ static void build_inter_predictors2b(MACROBLOCKD *x, BLOCKD *d, int pitch)
}
}
/* Encoder only */
/*encoder only*/
void vp8_build_inter_predictors_mbuv(MACROBLOCKD *x)
{
int i;
#if CONFIG_OPENCL
if ( 0 && cl_initialized == CL_SUCCESS ){
vp8_build_inter_predictors_mbuv_cl(x);
VP8_CL_FINISH(x->cl_commands);
VP8_CL_FINISH(x->block[0].cl_commands);
VP8_CL_FINISH(x->block[16].cl_commands);
VP8_CL_FINISH(x->block[20].cl_commands);
return;
}
#endif
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
if (x->mode_info_context->mbmi.mode != SPLITMV)
{
unsigned char *uptr, *vptr;
unsigned char *upred_ptr = &x->predictor[256];
@@ -247,8 +224,8 @@ void vp8_build_inter_predictors_mbuv(MACROBLOCKD *x)
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict8x8(uptr, pre_stride, mv_col & 7, mv_row & 7, upred_ptr, 8);
x->subpixel_predict8x8(vptr, pre_stride, mv_col & 7, mv_row & 7, vpred_ptr, 8);
x->subpixel_predict8x8(uptr, pre_stride, mv_col & 7, mv_row & 7, upred_ptr, 8);
x->subpixel_predict8x8(vptr, pre_stride, mv_col & 7, mv_row & 7, vpred_ptr, 8);
}
else
{
@@ -275,158 +252,132 @@ void vp8_build_inter_predictors_mbuv(MACROBLOCKD *x)
}
/*encoder only*/
void vp8_build_inter_predictors_mby(MACROBLOCKD *x)
void vp8_build_inter16x16_predictors_mby(MACROBLOCKD *x)
{
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *pred_ptr = x->predictor;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->block[0].pre_stride;
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
ptr_base = x->pre.y_buffer;
ptr = ptr_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *pred_ptr = x->predictor;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->block[0].pre_stride;
x->subpixel_predict16x16(ptr, pre_stride, mv_col & 7, mv_row & 7, pred_ptr, 16);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, pred_ptr, 16);
}
}
ptr_base = x->pre.y_buffer;
ptr = ptr_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
void vp8_build_inter16x16_predictors_mb(MACROBLOCKD *x,
unsigned char *dst_y,
unsigned char *dst_u,
unsigned char *dst_v,
int dst_ystride,
int dst_uvstride)
{
int offset;
unsigned char *ptr;
unsigned char *uptr, *vptr;
if ((mv_row | mv_col) & 7)
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
unsigned char *ptr_base = x->pre.y_buffer;
int pre_stride = x->block[0].pre_stride;
ptr = ptr_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict16x16(ptr, pre_stride, mv_col & 7, mv_row & 7, dst_y, dst_ystride);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, dst_y, dst_ystride);
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
uptr = x->pre.u_buffer + offset;
vptr = x->pre.v_buffer + offset;
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict8x8(uptr, pre_stride, mv_col & 7, mv_row & 7, dst_u, dst_uvstride);
x->subpixel_predict8x8(vptr, pre_stride, mv_col & 7, mv_row & 7, dst_v, dst_uvstride);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy8x8)(uptr, pre_stride, dst_u, dst_uvstride);
RECON_INVOKE(&x->rtcd->recon, copy8x8)(vptr, pre_stride, dst_v, dst_uvstride);
}
}
void vp8_build_inter4x4_predictors_mb(MACROBLOCKD *x)
{
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
x->subpixel_predict16x16(ptr, pre_stride, mv_col & 7, mv_row & 7, pred_ptr, 16);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, pred_ptr, 16);
BLOCKD *d = &x->block[bbb[i]];
build_inter_predictors4b(x, d, 16);
}
}
else
{
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
for (i = 0; i < 16; i += 2)
{
for (i = 0; i < 4; i++)
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
build_inter_predictors2b(x, d0, 16);
else
{
BLOCKD *d = &x->block[bbb[i]];
build_inter_predictors4b(x, d, 16);
vp8_build_inter_predictors_b(d0, 16, x->subpixel_predict);
vp8_build_inter_predictors_b(d1, 16, x->subpixel_predict);
}
}
}
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
build_inter_predictors2b(x, d0, 8);
else
{
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
build_inter_predictors2b(x, d0, 16);
else
{
vp8_build_inter_predictors_b(d0, 16, x->subpixel_predict);
vp8_build_inter_predictors_b(d1, 16, x->subpixel_predict);
}
}
vp8_build_inter_predictors_b(d0, 8, x->subpixel_predict);
vp8_build_inter_predictors_b(d1, 8, x->subpixel_predict);
}
}
}
void vp8_build_inter_predictors_mb(MACROBLOCKD *x)
{
if (x->mode_info_context->mbmi.ref_frame != INTRA_FRAME &&
x->mode_info_context->mbmi.mode != SPLITMV)
if (x->mode_info_context->mbmi.mode != SPLITMV)
{
int offset;
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *uptr, *vptr;
unsigned char *pred_ptr = x->predictor;
unsigned char *upred_ptr = &x->predictor[256];
unsigned char *vpred_ptr = &x->predictor[320];
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->block[0].pre_stride;
ptr_base = x->pre.y_buffer;
ptr = ptr_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict16x16(ptr, pre_stride, mv_col & 7, mv_row & 7, pred_ptr, 16);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, pred_ptr, 16);
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
uptr = x->pre.u_buffer + offset;
vptr = x->pre.v_buffer + offset;
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict8x8(uptr, pre_stride, mv_col & 7, mv_row & 7, upred_ptr, 8);
x->subpixel_predict8x8(vptr, pre_stride, mv_col & 7, mv_row & 7, vpred_ptr, 8);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy8x8)(uptr, pre_stride, upred_ptr, 8);
RECON_INVOKE(&x->rtcd->recon, copy8x8)(vptr, pre_stride, vpred_ptr, 8);
}
vp8_build_inter16x16_predictors_mb(x, x->predictor, &x->predictor[256],
&x->predictor[320], 16, 8);
}
else
{
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
BLOCKD *d = &x->block[bbb[i]];
build_inter_predictors4b(x, d, 16);
}
}
else
{
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
build_inter_predictors2b(x, d0, 16);
else
{
vp8_build_inter_predictors_b(d0, 16, x->subpixel_predict);
vp8_build_inter_predictors_b(d1, 16, x->subpixel_predict);
}
}
}
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
build_inter_predictors2b(x, d0, 8);
else
{
vp8_build_inter_predictors_b(d0, 8, x->subpixel_predict);
vp8_build_inter_predictors_b(d1, 8, x->subpixel_predict);
}
}
vp8_build_inter4x4_predictors_mb(x);
}
}
@@ -510,202 +461,5 @@ void vp8_build_uvmvs(MACROBLOCKD *x, int fullpixel)
}
/* The following functions are written for skip_recon_mb() to call. Since there is no recon in this
* situation, we can write the result directly to dst buffer instead of writing it to predictor
* buffer and then copying it to dst buffer.
*/
static void vp8_build_inter_predictors_b_s(BLOCKD *d, unsigned char *dst_ptr, vp8_subpix_fn_t sppf)
{
int r;
unsigned char *ptr_base;
unsigned char *ptr;
/*unsigned char *pred_ptr = d->predictor_base + d->predictor_offset;*/
int dst_stride = d->dst_stride;
int pre_stride = d->pre_stride;
int ptr_offset = d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
ptr_base = *(d->base_pre);
ptr = ptr_base + ptr_offset;
if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
{
sppf(ptr, pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_ptr, dst_stride);
}
else
{
for (r = 0; r < 4; r++)
{
#ifdef MUST_BE_ALIGNED
dst_ptr[0] = ptr[0];
dst_ptr[1] = ptr[1];
dst_ptr[2] = ptr[2];
dst_ptr[3] = ptr[3];
#else
*(int *)dst_ptr = *(int *)ptr ;
#endif
dst_ptr += dst_stride;
ptr += pre_stride;
}
}
}
void vp8_build_inter_predictors_mb_s(MACROBLOCKD *x)
{
unsigned char *dst_ptr = x->dst.y_buffer;
#if CONFIG_OPENCL && ENABLE_CL_SUBPIXEL
if (cl_initialized == CL_SUCCESS){
vp8_build_inter_predictors_mb_s_cl(x);
return;
}
#endif
if (x->mode_info_context->mbmi.mode != SPLITMV)
{
int offset;
unsigned char *ptr_base;
unsigned char *ptr;
unsigned char *uptr, *vptr;
/*unsigned char *pred_ptr = x->predictor;
unsigned char *upred_ptr = &x->predictor[256];
unsigned char *vpred_ptr = &x->predictor[320];*/
unsigned char *udst_ptr = x->dst.u_buffer;
unsigned char *vdst_ptr = x->dst.v_buffer;
int mv_row = x->mode_info_context->mbmi.mv.as_mv.row;
int mv_col = x->mode_info_context->mbmi.mv.as_mv.col;
int pre_stride = x->dst.y_stride; /*x->block[0].pre_stride;*/
ptr_base = x->pre.y_buffer;
ptr = ptr_base + (mv_row >> 3) * pre_stride + (mv_col >> 3);
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict16x16(ptr, pre_stride, mv_col & 7, mv_row & 7, dst_ptr, x->dst.y_stride); /*x->block[0].dst_stride);*/
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy16x16)(ptr, pre_stride, dst_ptr, x->dst.y_stride); /*x->block[0].dst_stride);*/
}
mv_row = x->block[16].bmi.mv.as_mv.row;
mv_col = x->block[16].bmi.mv.as_mv.col;
pre_stride >>= 1;
offset = (mv_row >> 3) * pre_stride + (mv_col >> 3);
uptr = x->pre.u_buffer + offset;
vptr = x->pre.v_buffer + offset;
if ((mv_row | mv_col) & 7)
{
x->subpixel_predict8x8(uptr, pre_stride, mv_col & 7, mv_row & 7, udst_ptr, x->dst.uv_stride);
x->subpixel_predict8x8(vptr, pre_stride, mv_col & 7, mv_row & 7, vdst_ptr, x->dst.uv_stride);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy8x8)(uptr, pre_stride, udst_ptr, x->dst.uv_stride);
RECON_INVOKE(&x->rtcd->recon, copy8x8)(vptr, pre_stride, vdst_ptr, x->dst.uv_stride);
}
}
else
{
/* note: this whole ELSE part is not executed at all. So, no way to test the correctness of my modification. Later,
* if sth is wrong, go back to what it is in build_inter_predictors_mb.
*
* ACW: note: Not sure who the above comment belongs to.
*/
int i;
if (x->mode_info_context->mbmi.partitioning < 3)
{
for (i = 0; i < 4; i++)
{
BLOCKD *d = &x->block[bbb[i]];
/*build_inter_predictors4b(x, d, 16);*/
{
unsigned char *ptr_base;
unsigned char *ptr;
ptr_base = *(d->base_pre);
ptr = ptr_base + d->pre + (d->bmi.mv.as_mv.row >> 3) * d->pre_stride + (d->bmi.mv.as_mv.col >> 3);
if (d->bmi.mv.as_mv.row & 7 || d->bmi.mv.as_mv.col & 7)
{
x->subpixel_predict8x8(ptr, d->pre_stride, d->bmi.mv.as_mv.col & 7, d->bmi.mv.as_mv.row & 7, dst_ptr, x->dst.y_stride); /*x->block[0].dst_stride);*/
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy8x8)(ptr, d->pre_stride, dst_ptr, x->dst.y_stride); /*x->block[0].dst_stride);*/
}
}
}
}
else
{
for (i = 0; i < 16; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
{
/*build_inter_predictors2b(x, d0, 16);*/
unsigned char *ptr_base;
unsigned char *ptr;
ptr_base = *(d0->base_pre);
ptr = ptr_base + d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
if (d0->bmi.mv.as_mv.row & 7 || d0->bmi.mv.as_mv.col & 7)
{
x->subpixel_predict8x4(ptr, d0->pre_stride, d0->bmi.mv.as_mv.col & 7, d0->bmi.mv.as_mv.row & 7, dst_ptr, x->dst.y_stride);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy8x4)(ptr, d0->pre_stride, dst_ptr, x->dst.y_stride);
}
}
else
{
vp8_build_inter_predictors_b_s(d0, dst_ptr, x->subpixel_predict);
vp8_build_inter_predictors_b_s(d1, dst_ptr, x->subpixel_predict);
}
}
}
for (i = 16; i < 24; i += 2)
{
BLOCKD *d0 = &x->block[i];
BLOCKD *d1 = &x->block[i+1];
if (d0->bmi.mv.as_int == d1->bmi.mv.as_int)
{
/*build_inter_predictors2b(x, d0, 8);*/
unsigned char *ptr_base;
unsigned char *ptr;
ptr_base = *(d0->base_pre);
ptr = ptr_base + d0->pre + (d0->bmi.mv.as_mv.row >> 3) * d0->pre_stride + (d0->bmi.mv.as_mv.col >> 3);
if (d0->bmi.mv.as_mv.row & 7 || d0->bmi.mv.as_mv.col & 7)
{
x->subpixel_predict8x4(ptr, d0->pre_stride,
d0->bmi.mv.as_mv.col & 7,
d0->bmi.mv.as_mv.row & 7,
dst_ptr, x->dst.uv_stride);
}
else
{
RECON_INVOKE(&x->rtcd->recon, copy8x4)(ptr,
d0->pre_stride, dst_ptr, x->dst.uv_stride);
}
}
else
{
vp8_build_inter_predictors_b_s(d0, dst_ptr, x->subpixel_predict);
vp8_build_inter_predictors_b_s(d1, dst_ptr, x->subpixel_predict);
}
}
}
}
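
Reviewer note: every predictor path in this file does the same two things with a motion vector. It uses `>> 3` to find the whole-pixel position in the reference plane, then tests `& 7` to choose between the sub-pixel filter and a plain block copy. The sketch below isolates that arithmetic. The helper names `mv_fullpel_offset` and `mv_needs_subpixel` are hypothetical and do not exist in the tree. The sketch assumes only what the code above shows: motion vector components carry three fractional bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the codebase: offset (in bytes) of the
 * whole-pixel position inside a reference plane, for a motion vector whose
 * components carry three fractional bits. */
static ptrdiff_t mv_fullpel_offset(int mv_row, int mv_col, int stride)
{
    assert(stride > 0);
    /* >> 3 drops the fractional bits. For negative components this relies
     * on arithmetic right shift, the same assumption the code above makes. */
    return (ptrdiff_t)(mv_row >> 3) * stride + (mv_col >> 3);
}

/* Hypothetical helper: non-zero when either component has a fractional
 * part, i.e. when the sub-pixel filter is needed instead of a block copy. */
static int mv_needs_subpixel(int mv_row, int mv_col)
{
    return ((mv_row | mv_col) & 7) != 0;
}
```

With a stride of 32, a vector of (row 16, col 8) lands exactly on a pixel, so the copy path would be taken. A vector of (row 19, col 13) resolves to the same whole-pixel offset but leaves fractions of 3 and 5, so the filter path would be taken with `mv_col & 7` and `mv_row & 7` as its arguments.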

@@ -13,9 +13,15 @@
#define __INC_RECONINTER_H
extern void vp8_build_inter_predictors_mb(MACROBLOCKD *x);
extern void vp8_build_inter_predictors_mb_s(MACROBLOCKD *x);
extern void vp8_build_inter16x16_predictors_mb(MACROBLOCKD *x,
unsigned char *dst_y,
unsigned char *dst_u,
unsigned char *dst_v,
int dst_ystride,
int dst_uvstride);
extern void vp8_build_inter_predictors_mby(MACROBLOCKD *x);
extern void vp8_build_inter16x16_predictors_mby(MACROBLOCKD *x);
extern void vp8_build_uvmvs(MACROBLOCKD *x, int fullpixel);
extern void vp8_build_inter_predictors_b(BLOCKD *d, int pitch, vp8_subpix_fn_t sppf);
extern void vp8_build_inter_predictors_mbuv(MACROBLOCKD *x);

@@ -24,7 +24,7 @@ void vp8_recon_intra_mbuv(const vp8_recon_rtcd_vtable_t *rtcd, MACROBLOCKD *x)
for (i = 16; i < 24; i += 2)
{
BLOCKD *b = &x->block[i];
RECON_INVOKE(rtcd, recon2)(b->predictor_base + b->predictor_offset, &b->diff_base[b->diff_offset], *(b->base_dst) + b->dst, b->dst_stride);
RECON_INVOKE(rtcd, recon2)(b->predictor, b->diff, *(b->base_dst) + b->dst, b->dst_stride);
}
}

Some files were not shown because too many files have changed in this diff.