Compare commits

...

838 Commits

Author SHA1 Message Date
John Koleszar
20307c70ae Update CHANGELOG for v0.9.7-p1
Change-Id: I5490a9cad2d6752832b6bf4ec1835c06a45eeb9b
2011-08-15 17:02:45 -04:00
Stefan Holmer
99d870a472 Don't set the bmi mode when doing error concealment
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.

Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
2011-08-15 11:46:04 -04:00
John Koleszar
5e562c77db Generate libvpx_srcs.txt from current configuration
To get a list of files that the libvpx library depends on in the current
configuration, run:

  $ make target=libs libvpx_srcs.txt

Change-Id: I68a69648ecf212f0fe29c325297728ac2a9393d9
2011-08-12 14:59:22 -04:00
John Koleszar
e96131705a Revert "Improved 1-pass CBR rate control"
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.

Conflicts:

	vp8/encoder/onyx_if.c
	vp8/encoder/ratectrl.c

Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
2011-08-12 14:51:36 -04:00
John Koleszar
a4c2211ea3 Propagate macroblock MV to subblocks for error concealment
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.

Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
2011-08-12 14:49:35 -04:00
Stefan Holmer
a609be5633 Disable error concealment until first key frame is decoded
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.

Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.

Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
2011-08-12 14:49:34 -04:00
John Koleszar
cdae03a4eb Fix potential OOB read with Error Concealment
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.

Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
2011-08-12 14:49:34 -04:00
John Koleszar
e28e08146e Merge "Update CHANGELOG for Cayuga release" into cayuga 2011-08-04 10:30:15 -07:00
John Koleszar
f3538f2b81 Merge changes Ic7725e27,Ib3d54bfa into cayuga
* changes:
  Update AUTHORS
  Update .mailmap entry for Ralph Giles
2011-08-03 13:45:24 -07:00
John Koleszar
a49b9e0014 Merge changes I585167e1,Ia07602bd into cayuga
* changes:
  Fix building of static libs on universal-darwin
  Fix asm offsets generation for universal-darwin builds
2011-08-03 13:44:32 -07:00
John Koleszar
238dae8604 Fix source buffer selection
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.

Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
2011-08-03 16:13:15 -04:00
John Koleszar
06f58c0df7 Fix building of static libs on universal-darwin
The static libs should not be built from sources during the top level
of a universal build. This regression was introduced in commit
495b241fa6, which made the static
libs selectable under CONFIG_STATIC.

Change-Id: I585167e17459877e0fa7fa19e1046c3703d91c97
2011-08-03 10:38:45 -04:00
John Koleszar
c1bf6ca6cc Fix asm offsets generation for universal-darwin builds
Added BUILD_PFX to correct dependencies.

Change-Id: Ia07602bd98ef2253242b1bd66ef05e3b1e64ba7d
2011-08-03 10:38:33 -04:00
Johann
30e5deae5d update extend frame borders
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676

update the code with new assumptions and guard them with a compile time
assert

Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
2011-08-02 19:26:46 -04:00
John Koleszar
ea8d436f30 Update CHANGELOG for Cayuga release
Change-Id: If6f20553159105c05f9a684cb7c8f3778c7894a1
2011-08-02 14:43:05 -04:00
James Berry
27ee521753 include asm_com/dec_offsets for make dist
Change-Id: Ia1ad66066a24c01915cd9e3ff75c7e070cc984c8
2011-08-02 13:42:03 -04:00
John Koleszar
b956f2ceb2 Update AUTHORS
Change-Id: Ic7725e279d2263515e5312c152c58e1644eb2495
2011-08-02 10:09:59 -04:00
John Koleszar
e6847aa0f0 Update .mailmap entry for Ralph Giles
Change-Id: Ib3d54bfa81720a0b2877837d7149cd12d26e75e4
2011-08-02 10:09:36 -04:00
Lou Quillio
edfed938ba Sync vpxenc --timebase usage wording with docs change.
Change-Id: Ia406272a97806c0194435bb7f24e24d353ef5cc6
2011-08-02 09:57:50 -04:00
John Koleszar
f475f0c1bb Merge "include the arm header files in make dist" into cayuga 2011-08-02 05:21:10 -07:00
John Koleszar
81da41732c Merge "Fix building with --disable-postproc" into cayuga 2011-08-02 05:19:12 -07:00
John Koleszar
06c3d5bb9a Fix building with --disable-postproc
Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9
2011-08-01 17:50:23 -04:00
Johann
3e8c6d3d35 include the arm header files in make dist
Change-Id: Ibcf5b4b14153f65ce1b53c3bfba87ad2feb17bbd
2011-08-01 17:20:21 -04:00
John Koleszar
b8791980b4 Merge "build error fix - obj_int_extract.bat" into cayuga 2011-08-01 13:56:32 -07:00
James Berry
61046b8d7a build error fix - obj_int_extract.bat
obj_int_extract.bat was not being copied
correctly for make dist. It now is.

Change-Id: I976479f90bbfa4798f241db1055e1e3b04ca2830
2011-08-01 16:55:06 -04:00
John Koleszar
7d984d8c38 Disable FORTIFY_SOURCE on glibc targets
Improve binary distributions by defeating longjmp interception. See
http://code.google.com/p/webm/issues/detail?id=166 for more information.

Change-Id: I5ac731ec3f3570088597201d0f411473e2dffa4f
2011-08-01 10:10:43 -04:00
John Koleszar
8ef25de377 install asm_offsets.h
Ensure vpx_ports/asm_offsets.h is installed with make dist

Change-Id: If9f32273fff975d60de1583b039dbbce8a7ccd27
2011-07-29 16:56:43 -04:00
John Koleszar
6f080f9cec Merge "Convert rc_max_intra_bitrate_pct to control" 2011-07-29 11:57:48 -07:00
John Koleszar
1f71d2e2c8 Correctly track sharpness in vp8cx_pick_filter_level_fast
Make sure to update last_sharpness_level from the current
sharpness_level whenever it changes.

Change-Id: I0258d2f5b11a407abf6176a8d4c4994d925943f0
2011-07-29 12:27:03 -04:00
John Koleszar
56b06aef6d Merge "configure: add --enable-static option" 2011-07-28 07:08:35 -07:00
John Koleszar
1654ae9a2a Convert rc_max_intra_bitrate_pct to control
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.

More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.

Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
2011-07-28 09:17:35 -04:00
Yunqing Wang
2f2302f8d5 Preload reference area in sub-pixel motion search (real-time mode)
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):

1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain

2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.

Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
2011-07-27 14:19:10 -04:00
Yunqing Wang
f11613b620 Merge "Fix range checks in motion search" 2011-07-27 09:34:13 -07:00
Yunqing Wang
bde2afbe23 Fix range checks in motion search
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.

Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
2011-07-27 10:37:33 -04:00
James Zern
3a975d9489 vpxenc: cosmetics: timebase help update / spelling
The timebase update fixes Issue #61.

Change-Id: I425158da7ea639464f61e6dd604ac9e6c72b7266
2011-07-26 17:27:01 -07:00
John Koleszar
db8f0d2ca9 Merge "cosmetics: consistently use [u]int64_t" 2011-07-26 12:57:43 -07:00
James Zern
b45065d38b cosmetics: consistently use [u]int64_t
Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.

Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a
2011-07-26 11:34:36 -07:00
Johann
ca7e346669 Merge ""Eliminated TOKENEXTRABITS" broke the windows build." 2011-07-26 06:34:31 -07:00
Scott LaVarnway
a11624497c "Eliminated TOKENEXTRABITS" broke the windows build.
Fixed.

Change-Id: I3348e8dbcaee6ace263af413701101d77636e5df
2011-07-26 09:33:16 -04:00
James Zern
495b241fa6 configure: add --enable-static option
Fixes issue #62.

Change-Id: I0567cf7897c0942666c19b3231c8c3b8e9c3e7cc
2011-07-25 15:40:36 -07:00
Scott LaVarnway
4894b45ced Merge "Eliminated TOKENEXTRABITS" 2011-07-25 14:35:58 -07:00
Scott LaVarnway
76eb402668 Eliminated TOKENEXTRABITS
Noticed small performance gains, depending on material.

Change-Id: I334369f6312bc19aa73481fc3f790ab181e11867
2011-07-25 17:11:24 -04:00
Yunqing Wang
5b0de48ddd Merge "Use CONFIG_FAST_UNALIGNED consistently in codec" 2011-07-25 12:40:50 -07:00
Yunqing Wang
fe270dd527 Specify size for argument pushed to stack
The change fixes building error on Win64.

Change-Id: I63d25b26220c4da8a98ca2e36530cbb802468e6b
2011-07-25 11:30:45 -04:00
Yunqing Wang
65dfcf4696 Use CONFIG_FAST_UNALIGNED consistently in codec
CONFIG_FAST_UNALIGNED is enabled by default. Disable it if it is
not supported by hardware.

Change-Id: I7d6905ed79fed918bca074bd62820b0c929d81ab
2011-07-25 10:11:24 -04:00
Johann
773bcc300d Merge "fix sharpness bug and clean up" 2011-07-22 09:34:55 -07:00
Johann
a04ed0e8f3 fix sharpness bug and clean up
sharpness was not recalculated in vp8cx_pick_filter_level_fast

remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.

always use cm->sharpness_level. the extra indirection was annoying.

don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init

move function declarations to their proper header

Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
2011-07-22 12:33:57 -04:00
Yunqing Wang
829179e888 Merge "Preload reference area to an intermediate buffer in sub-pixel motion search" 2011-07-22 06:56:15 -07:00
Yunqing Wang
20bd1446c0 Preload reference area to an intermediate buffer in sub-pixel motion search
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
  3.4%   at --rt --cpu-used =-4
  2.8%   at --rt --cpu-used =-3
  2.3%   at --rt --cpu-used =-2
  2.2%   at --rt --cpu-used =-1

Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.

Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.

Make this change exclusively for x86 platforms.

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
2011-07-22 09:28:06 -04:00
Johann
52d13777da Merge "Add .size directive to ARM asm functions." 2011-07-21 12:56:59 -07:00
Johann
ddcdbfd71e Merge "Mark ARM asm objects as allowing a non-executable stack." 2011-07-21 12:20:00 -07:00
Timothy B. Terriberry
1647f00c29 Add .size directive to ARM asm functions.
This makes them show up properly in debugging tools like gdb and
 valgrind.

Change-Id: I0c72548a1090de88ba226314e5efe63360b7e07f
2011-07-21 11:46:14 -07:00
Timothy B. Terriberry
0453aca5af Mark ARM asm objects as allowing a non-executable stack.
This adds the magic .note.GNU-stack section at the end of each ARM
 asm file (when built with gas), indicating that a non-executable
 stack is allowed.
Without this section, the linker will assume the object requires an
 executable stack by default, forcing an executable stack for the
 entire program.

Change-Id: Ie86de6a449b52d392b9e5e0479833ed8c508ee65
2011-07-21 11:45:00 -07:00
John Koleszar
2bdda84e37 Merge "Increase chrow row alignment to 16 bytes." 2011-07-21 07:32:39 -07:00
Yunqing Wang
c5fe641179 Merge "Add improvements made in good-quality mode to real-time mode" 2011-07-21 07:27:09 -07:00
Timothy B. Terriberry
7d1b37cdac Increase chrow row alignment to 16 bytes.
This is done by expanding luma row to 32-byte alignment, since
 there is currently a bunch of code that assumes that
 uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
 common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
 encoder/temporal_filter.c, and possibly others; I haven't done a
 full audit).
It also uses replaces the hardcoded border of 16 in a number of
 encoder buffers with VP8BORDERINPIXELS (currently 32), as the
 chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
 dumping the frame memory as a contiguous blob produces a valid,
 if padded, image.

Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
2011-07-20 10:20:31 -07:00
Attila Nagy
0afcc76971 encoder: don't set the fragment bit for the last partition
Change-Id: Icb4e4f0d7c3074a8507852178be87541a1cb5bac
2011-07-20 14:09:42 +03:00
Scott LaVarnway
b2d9700f53 Merge "Moved vp8_encode_bool into boolhuff.h" 2011-07-19 08:15:14 -07:00
John Koleszar
d98a5ed4dd Revert "Disable __longjmp_chk protection"
This reverts commit b73a3693e5.

This version of the check doesn't work with generic-gnu, and figuring
out the correct symbol version at configure time is probably more work
than this is worth. May revisit in the future.

Change-Id: I6c75e88bd3bd82a4b21e09a25780fe53aacb7d70
2011-07-19 10:00:27 -04:00
Johann
6afafc313c remove old armv5 code
armv5 dequantizer is not referenced

Change-Id: Id1cc617dcee35ebd6a406816ec6aaa26e8bbc8ad
2011-07-19 09:20:38 -04:00
Scott LaVarnway
a25f6a9c88 Moved vp8_encode_bool into boolhuff.h
allowing the compiler to inline this function.  For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.

Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
2011-07-19 09:17:25 -04:00
John Koleszar
b5ea2fbc2c Improved 1-pass CBR rate control
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.

Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.

Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
2011-07-18 11:48:05 -04:00
John Koleszar
74ad25a4c6 Merge "Disable __longjmp_chk protection" 2011-07-18 08:43:59 -07:00
John Koleszar
da39e505dd Merge "Fixed rate histogram calculation" 2011-07-18 06:07:51 -07:00
Tero Rintaluoma
fd41cb8491 Fixed rate histogram calculation
Using small values for --buf-sz= in command line causes
floating point exception due to division by zero.

Change-Id: Ibfe2d44db922993a78ebc9a4a1087d9625de48ae
2011-07-18 10:35:05 +03:00
Scott LaVarnway
e68894fa03 Merge "Tokenize MB optimized" 2011-07-15 07:54:14 -07:00
Yunqing Wang
f676171e52 Merge "Fix vpxenc encoding incorrect webm file header on big endian machines(Issue 331)" 2011-07-15 05:21:35 -07:00
Tero Rintaluoma
4e82f01547 Tokenize MB optimized
Optimized C-code of the following functions:
 - vp8_tokenize_mb
 - tokenize1st_order_b
 - tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.

Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
2011-07-15 11:26:54 +03:00
James Berry
6b6f367c3d bug fix vpx_copy_and_extend_frame size issue
vpx_copy_and_extend_frame could incorrectly
resize uv frames which could result in a crash.

Change-Id: Ie96f7078b1e328b3907a06eebeee44ca39a2e898
2011-07-14 15:58:15 -04:00
John Koleszar
04dce631a2 Remove unused speed features
min_fs_radius, max_fs_radius, full_freq were set but never read.

Change-Id: I82657f4e7f2ba2acc3cbc3faa5ec0de5b9c6ec74
2011-07-14 14:20:25 -04:00
Fritz Koenig
4ab3175b12 Merge "Better allocate yuv buffers." 2011-07-13 14:18:11 -07:00
Yunqing Wang
f1f28535c3 Merge "Fix unnecessary casting of B_PREDICTION_MODE (issue 349)" 2011-07-13 13:32:57 -07:00
John Koleszar
b73a3693e5 Disable __longjmp_chk protection
glibc implements some checking on longjmp() calls by replacing it with
an internal function __longjmp_chk(), when FORTIFY_SOURCE is defined.
This can be problematic when compiling the library under one version of
glibc and running it under another. Work around this issue for the one
symbol affected for now, before taking out the undef hammer.

Fixes http://code.google.com/p/webm/issues/detail?id=166

Change-Id: Ifc5e25cdec17915e394711f2185b3e9214572d10
2011-07-13 16:07:00 -04:00
Yunqing Wang
139577f937 Fix unnecessary casting of B_PREDICTION_MODE (issue 349)
Minor fix.

Change-Id: Iaf93f6e47e882a33c479e57c7a0d0bf321e291c0
2011-07-13 15:52:07 -04:00
Yunqing Wang
0e9a6ed72a Add improvements made in good-quality mode to real-time mode
Several improvements we made in good-quality mode can be added
into real-time mode to speed up encoding in speed 1, 2, and 3
with small quality loss. Tests using tulip clip showed:

--rt --cpu-used=-1
(before change)
PSNR: 38.028
time: 1m33.195s
(after change)
PSNR: 38.014
time: 1m20.851s

--rt --cpu-used=-2
(before change)
PSNR: 37.773
time: 0m57.650s
(after change)
PSNR: 37.759
time: 0m54.594s

--rt --cpu-used=-3
(before change)
PSNR: 37.392
time: 0m42.865s
(after change)
PSNR: 37.375
time: 0m41.949s

Change-Id: I76ab2a38d72bc5efc91f6fe20d332c472f6510c9
2011-07-13 14:51:02 -04:00
Fritz Koenig
e9751d4b74 Better allocate yuv buffers.
Previously allocated more memory than necessary for yuv buffers.
This makes it harder to track bugs with reading uninitialized
data.

Change-Id: I510f7b298d3c647c869be6e5d51608becc63cce9
2011-07-13 10:37:15 -07:00
Fritz Koenig
84c3cd79d1 Merge "Reduce motion vector search on alt-ref frame." 2011-07-13 10:07:30 -07:00
John Koleszar
7f0b11c0ae Merge "Remove rotting NDS_NITRO code." 2011-07-13 05:46:30 -07:00
Johann
211694f67e Merge "update x86 asm for loopfilter" 2011-07-13 04:10:03 -07:00
Johann
8f910594bd Merge "Update armv6 loopfilter to new interface" 2011-07-13 04:09:55 -07:00
Johann
1a219c22b1 Merge "Update armv7 loopfilter to new interface" 2011-07-13 04:09:42 -07:00
Johann
d9b825cff2 Merge "New loop filter interface" 2011-07-13 04:09:26 -07:00
Fritz Koenig
d89eb6ad5a Remove rotting NDS_NITRO code.
Code has not been used and is no longer relevant.

Change-Id: I38590513da7c7a436804ff8a1a3805d9697f575d
2011-07-12 16:29:15 -07:00
Yunqing Wang
c156a68d06 Fix vpxenc encoding incorrect webm file header on big endian machines(Issue 331)
As reported in issue 331, vpxenc encoded incorrect webm file header
on big endian machines. This change fixed that.

Change-Id: I31924ebd476a87f3e88b9b5424540bf781d2b86f
2011-07-12 14:49:57 -04:00
Attila Nagy
c231b0175d Update armv6 loopfilter to new interface
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
2011-07-12 12:14:51 +03:00
Attila Nagy
283b0e25ac Update armv7 loopfilter to new interface
Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346
2011-07-12 12:12:25 +03:00
Fritz Koenig
ede0b15c9d Reduce motion vector search on alt-ref frame.
Clamp mv search to accomodate subpixel filtering
of UV mv.

Change-Id: Iab3ed405993ef6bf779ad7cf60863153068fb7d1
2011-07-11 09:05:43 -07:00
Yunqing Wang
587ca06da9 Minor change in pick_inter_mode()
Scott suggested to move vp8_mv_pred() under "case NEWMV" to save
extra checks.

Change-Id: I09e69892f34a08dd425a4d81cfcc83674e344a20
2011-07-08 14:08:45 -04:00
Yunqing Wang
e83d36c053 Merge "Adjust full-pixel clamping and motion vector limit calculation" 2011-07-08 08:39:32 -07:00
Yunqing Wang
40991faeae Adjust full-pixel clamping and motion vector limit calculation
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b748045470

Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
2011-07-08 11:34:28 -04:00
Johann
01433c5043 update x86 asm for loopfilter
Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
2011-07-08 09:23:38 -04:00
Johann
6ae12c415e Merge "clean up warnings when building arm with rtcd" 2011-07-08 05:16:09 -07:00
Attila Nagy
622958449b New loop filter interface
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshhold is constant.
ARM targets use scalars and others vectors.

Change works only with --target=generic-gnu
All other targets have to be updated!

Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
2011-07-08 09:31:41 +03:00
John Koleszar
973a9c075d Merge "Set VPX_FRAME_IS_DROPPABLE" 2011-07-07 08:11:05 -07:00
John Koleszar
37de0b8bdf Set VPX_FRAME_IS_DROPPABLE
Allow the encoder to inform the application that the encoded frame will not
be used as a reference.

Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
2011-07-07 10:38:45 -04:00
John Koleszar
b4f70084cc Merge "Properly use GET_GOT/RESTORE_GOT when using GLOBAL()." 2011-07-01 07:14:34 -07:00
Ronald S. Bultje
c8a23ad3f4 Properly use GET_GOT/RESTORE_GOT when using GLOBAL().
This should fix binaries using PIC on x86-32. Also should
fix issue 343.

Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02
2011-06-30 14:04:27 -07:00
Yunqing Wang
ae8aa836d5 Merge "Copy macroblock data to a buffer before encoding it" 2011-06-30 11:14:24 -07:00
Yunqing Wang
80c3bbf657 Merge "Bug fix in motion vector limit calculation" 2011-06-30 09:52:03 -07:00
Yunqing Wang
b748045470 Bug fix in motion vector limit calculation
Motion vector limits are calculated using right shifts, which
could give wrong results for negative numbers. James Berry's
test on one clip showed encoder produced some artifacts. This
change fixed that.

Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
2011-06-30 11:20:13 -04:00
Johann
3e4a80cc35 Merge "remove incorrect initialization" 2011-06-30 07:59:08 -07:00
John Koleszar
034cea5e72 Merge "guard against space/time distortion" 2011-06-29 11:36:51 -07:00
Johann
bb0ca87a0d guard against space/time distortion
and divide by 0 errors

Change-Id: I8af5ca3d0913cb6f278fff754f8772bcb62e674a
2011-06-29 14:34:25 -04:00
Paul Wilkins
eacaabc592 Merge "Change to arf boost calculation." 2011-06-29 10:03:57 -07:00
Paul Wilkins
11694aab66 Change to arf boost calculation.
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.

The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.

The new code can be deactivated using #define NEW_BOOST 0

In its current default state the code searches forwards and backwards
from the proposed  position of the next alt ref.

The the old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.

I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.

Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
2011-06-29 18:01:25 +01:00
Johann
fe53107fda remove incorrect initialization
Values were set, then reset. Only set them once.

Change-Id: Iaf43c8467129f2f261f04fa9188b603aa46216b5
2011-06-29 11:54:27 -04:00
Johann
6611f66978 clean up warnings when building arm with rtcd
Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df
2011-06-29 10:51:41 -04:00
John Koleszar
05239f0c41 vpxenc: prevent wraparound in the --rate-hist ringbuffer
For clips that are near 60fps and have a lot of alt refs, it's possible
that the ring buffer holding the previous frames sizes/pts could wrap
around, leading to a division by zero.

In addition to checking for this condition in the ring buffer loop,
the buffer size is made dependent on the actual frame rate in use,
rather than defaulting to 60, which should improve accuracy at frame
rates >= ~60.

Change-Id: If5a04d6e847316dc5f7504f25c01164cf9332be8
2011-06-29 10:30:19 -04:00
John Koleszar
f3a13cb236 Merge "Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently" 2011-06-29 07:29:59 -07:00
Johann
dc004e8c17 Merge "Avoid text relocations in ARM vp8 decoder" 2011-06-28 16:34:10 -07:00
Johann
02c30cdeef Merge "utilize preload in ARMv6 MC/LPF/Copy routines" 2011-06-28 16:33:45 -07:00
Johann
2d29457c4d Merge "respect alignment in arm asm files" 2011-06-28 16:32:48 -07:00
John Koleszar
b32da7c3da Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.

Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
2011-06-28 17:03:55 -04:00
John Koleszar
9bcf07ae4a Merge "Simplify decode_macroblock." 2011-06-28 12:54:25 -07:00
John Koleszar
14566125ea Merge "use relative include path" 2011-06-28 12:52:48 -07:00
James Zern
db6ee54353 vpxenc: free resources
Free buffers allocated for y4m input and webm cue list.

Change-Id: I02051baae3b45f692cf5c7f520ea9a2d80c7b470
2011-06-28 12:10:24 -07:00
Johann
4e4f835232 use relative include path
Files are already in vpx/

Change-Id: I67dcbb5d5b6cb55e91b4e4927ab842a1e2c9e284
2011-06-28 14:46:24 -04:00
Gaute Strokkenes
81c0546407 Simplify decode_macroblock.
Change-Id: Ieb2f3827ae7896ae594203b702b3e8fa8fb63d37
2011-06-28 17:01:14 +01:00
Stefan Holmer
7296b3f922 New ways of passing encoded data between encoder and decoder.
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.

At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last has
the VPX_FRAME_IS_FRAGMENT flag set.

At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.

Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.

The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.

Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
2011-06-28 11:10:17 -04:00
Stefan Holmer
b433e12a3d Proposing an extension to the encoder and decoder interfaces.
Adding capabilities with which the encoder can output frames
partition by partition, and the decoder can get input data
partition by partition.

Change-Id: Ieae0801480b8de8cd43c3c57dd3bab2e4c346fe0
2011-06-28 11:10:17 -04:00
Stefan Holmer
4cb0ebe5b2 Adding support for independent partitions
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.

Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
2011-06-28 11:10:17 -04:00
Mike Hommey
e3f850ee05 Avoid text relocations in ARM vp8 decoder
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.

Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.

Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
2011-06-28 09:11:40 +02:00
Fritz Koenig
be99868bd1 Fix after removal of B_MODE_INFO
Change Ieb746989: Removed B_MODE_INFO missed this.

Change-Id: I32202555581cc2a5d45e729c6650ada4d2df55d3
2011-06-27 09:43:21 -07:00
Johann
8a9a11e8dc Merge "configuration, support disabling any subset of ARM arch" 2011-06-27 08:55:18 -07:00
Johann
2007c3bb38 respect alignment in arm asm files
Conversion script was discarding alignment. Also, set default alignment
to 4 bytes.

Change-Id: I1e9cbbb8c142bdf93df4e9aaccf967ed43dff906
https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/789198
2011-06-27 11:38:26 -04:00
Stefan Holmer
ba0822ba96 Adding support for error concealment in multi-threaded decoding
Also includes a couple of error concealment bug fixes:
- the segment_id wasn't properly initialized when missing
- when interpolating and no neighbors are found, set to zero
- clear the qcoef buffer when concealing an MB

Change-Id: Id79c876b41d78b559a2241e9cd0fd2cae6198f49
2011-06-27 09:03:06 -04:00
Adrian Grange
deca8cfc44 Fixed initialization of frame buffer ref counters
Only the first frame buffer ref counter was being initialized
because the index was fixed at 0 rather than using i.

Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4
2011-06-24 08:43:40 -07:00
Yunqing Wang
0d87098e08 Copy macroblock data to a buffer before encoding it
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.

Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
2011-06-23 13:54:02 -04:00
John Koleszar
db67dcba6a Revert "Reduce overshoot in 1 pass rate control"
This reverts commit 212f618373.

Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting to damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.

Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5
2011-06-23 11:52:12 -04:00
John Koleszar
259ea23297 Merge "Fix parallel install" 2011-06-23 08:18:10 -07:00
John Koleszar
ac998ec8d8 Merge changes I2807b5a1,I59b020c2
* changes:
  vpxenc: add rate histogram display
  vpxenc: add quantizer histogram display
2011-06-23 08:09:10 -07:00
John Koleszar
c96f8e238d vpxenc: add rate histogram display
Add the --rate-hist=n option, which displays a histogram with n
buckets for the rate over the --buf-sz window.

Change-Id: I2807b5a1525c7972e9ba40839b37e92b23ceffaf
2011-06-23 10:22:44 -04:00
John Koleszar
3fde9964ce vpxenc: add quantizer histogram display
Add the --q-hist=n option, which displays a histogram with n buckets for
the quantizer selected on each frame.

Change-Id: I59b020c26b0acae0b938685081d9932bd98df5c9
2011-06-23 10:22:43 -04:00
James Berry
2bd90c13a0 get/set reference buffer dimension check added
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously gaurenteed.
Using an improperly allocated buffer would
cause a crash before.

Change-Id: I904982313ce9352474f80de842013dcd89f48685
2011-06-22 13:36:24 -04:00
Alexis Ballier
653e69e334 Fix parallel install
Require the destination to be present before trying to create the symlink.
See: http://bugs.gentoo.org/show_bug.cgi?id=323805

Change-Id: I14ed4a9792dedc289885a9a43bc5a86cb792206d
2011-06-22 12:53:07 -04:00
Yaowu Xu
76495617e0 Merge "adjusting the calculation of errorperbit" 2011-06-21 09:47:42 -07:00
Scott LaVarnway
55c3963c88 Merge "Improved vp8dx_decode_bool" 2011-06-21 07:45:51 -07:00
Yunqing Wang
109c20299c Merge "Remove unnecessary bounds checking in motion search" 2011-06-21 07:23:24 -07:00
Attila Nagy
6f23f24afe configuration, support disabling any subset of ARM arch
Useful for leaving out any version specific asm files.

Change-Id: I233514410eb9d7ca88d2d2c839673122c507fa99
2011-06-21 10:39:01 +03:00
Yaowu Xu
10ed60dc71 adjusting the calculation of errorperbit
RDMULT/RDDIV defines a bit worth of distortion in term of sum squared
difference. This has also been used as errorperbit in subpixel motion
search, where the distortions computed as variance of the difference.
The variance of differences is different from sum squared differences
by amount of DC squared. Typically, for inter predicted MBs, this
difference averages around 10% between the two distortion, so this patch
introduces a 110% constant in deriving errorperbit from RDMULT/RDDIV.

Test on CIF set shows small but positive gain on overall PSNR (.03%)
and SSIM (.07%), overall impact on average PSNR is 0.

Change-Id: I95425f922d037b4d96083064a10c7cdd4948ee62
2011-06-20 16:32:30 -07:00
Scott LaVarnway
67a1f98c2c Improved vp8dx_decode_bool
Relocated the vp8dx_bool_decoder_fill() call, allowing
the compiler to produce better assembly code.  Tests
showed a 1 - 2 % performance boost (x86 using gcc)
for the 720p clip used.

Change-Id: Ic5a4eefed8777e6eefa007d4f12dfc7e64482732
2011-06-20 14:44:16 -04:00
Taekhyun Kim
458fb8f491 utilize preload in ARMv6 MC/LPF/Copy routines
About 9~10% decoding perf improvement on non-Neon ARM cpus

Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837
2011-06-17 14:04:53 -07:00
Yunqing Wang
2cd1c2855e Remove unnecessary bounds checking in motion search
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because different treatment
is taken while starting point equals the bounds.

Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
2011-06-17 14:19:51 -04:00
John Koleszar
a60fc419f5 Merge "Use SSE as BPRED distortion metric consistently" 2011-06-17 09:48:32 -07:00
Ronald S. Bultje
87fd66bb0e Assign boost to GF bit allocation if past frame had no ARF.
Modify the second-pass code to provide a full golden-frame (GF) bit
allocation boost if the past GF group (GFG) had no alt-ref frame (ARF),
even if the current GFG does contain and ARF.

This mostly has no effect on clips, since switching ARFs on/off between
GFGs is not very common. Has a positive effect on e.g. cheer (+0.45 SSIM
at 600kbps) and football (+0.25 SSIM at 600kbps), particularly at high
bitrates. Has a negative effect (-0.04 SSIM at 300kbps) at pamphlet,
which appears only marginally related to this patch, and crew (-0.1 SSIM
at 700kbps).

Change-Id: I2e32899638b59f857e26efeac18a82e0c0b77089
2011-06-16 13:01:27 -04:00
John Koleszar
eb645abeac Merge "Disable specialcase for last frames if the sequence contains ARFs." 2011-06-16 09:56:05 -07:00
John Koleszar
d9959e336e Merge "gen_msvs_proj: write boolean for Debug attribute" 2011-06-16 07:19:46 -07:00
James Zern
91b167202d gen_msvs_proj: write boolean for Debug attribute
Replace =1 with =true for yasm tool element. This aids in upgrading
e.g., vs9 project files to vs10.
build/x86-msvs/yasm.xml generated during conversion will require the
Separator attribute to be removed for the build to complete
successfully.

Change-Id: If75c4f9a925529740048882003e9d766c5ac4f0c
2011-06-15 14:56:16 -07:00
John Koleszar
5223016337 Merge "Remove redundant check for KEY_FRAME in multithreaded decoder" 2011-06-15 10:18:06 -07:00
John Koleszar
61599fb59f Use SSE as BPRED distortion metric consistently
The BPRED mode selection uses SSE as a distortion metric, but the early
breakout threshold being used was a variance value.

Change-Id: I42d4602fb9b548bf681a36445701fada5e73aff1
2011-06-15 10:53:37 -04:00
John Koleszar
1ade44b352 Merge "fix --disable-runtime-cpu-detect on x86" 2011-06-15 07:09:09 -07:00
Ronald S. Bultje
299193dd1c Disable specialcase for last frames if the sequence contains ARFs.
firstpass.c contains some rate adjustment code that assures that the
last few frames in a sequence abide by rate limits. If the second-to-
last group of frames contains an alt-ref frame (ARF), the last golden
frame (GF) is zero bytes, and we will thus spend a ridiculously high
number of bits on regular P-frames trying to hit the target rate. This
does slightly enhance the quality of these last few frames, but has
no perceptual value (other than hitting the target rate).

Disabling this code means we consistently (slightly) undershoot the
target rate and consequently do worse on the last few frames of a
clip, which is particularly noticeable for small clips. The quality-
per-bitrate is generally better, ~0.2% better overall on derf-set,
especially on clips such as garden, tennis, foreman at low bitrates.
Has a negative effect on hallmonitor at high bitrates.

Change-Id: I1d63452fef5fee4a0ad2fb2e9af4c9f2e0d86d23
2011-06-15 09:47:00 -04:00
Attila Nagy
e7e5a58d0c Guard vpx_config.h against multiple inclusions
Change-Id: Iabe2be73af2b92c53687755b31b77448fba385d2
2011-06-15 13:36:09 +03:00
Attila Nagy
c7e6aabbca Remove redundant check for KEY_FRAME in multithreaded decoder
For Intra blocks is enough to check ref_frame == INTRA_FRAME.

Change-Id: I3e2d3064c7642658a9e14011a4627de58878e366
2011-06-15 09:01:27 +03:00
Scott LaVarnway
7be5b6dae4 Merge "Populate bmi for B_PRED only" 2011-06-14 12:04:50 -07:00
Johann
92b0e544f3 fix --disable-runtime-cpu-detect on x86
Change-Id: Ib8e429152c9a8b6032be22b5faac802aa8224caa
2011-06-14 11:31:50 -04:00
Paul Wilkins
bf6b314d89 Merge "Fix RT only build" 2011-06-14 06:01:44 -07:00
Tero Rintaluoma
9909047461 Fix RT only build
Moved encode_intra function from firstpass.c to encodeintra.c to
prevent linking problem in real-time only build. Also changed name
of the function to vp8_encode_intra because it is not a static.

Change-Id: Ibf3c6c1de3152567347e5fbef47d1d39564620a5
2011-06-14 13:39:06 +03:00
Tero Rintaluoma
5405bd9793 Update -linux-rvct targets
- Updated -linux-rvct targets to support RVDS 4.0 and later.
- Changed optimization flag to -Otime because -O3 ruined performance
  for RVCT linux targets.
- Added support for --enable-small for RVCT
- RVCT created library should be able to link with GCC
- Supports building shared linux libraries

Change-Id: Ic62589950d86c3420fd4d908b8efb870806d1233
2011-06-14 11:29:35 +03:00
James Zern
532c30c83e fix corrupt frame leak
If setup_token_decoder reported an internal error the memory allocated
there would not be freed in the resulting call to _remove_decompressor.

Change-Id: Ib459de222d76b1910d6f449cdcd01663447dbdf6
2011-06-13 17:32:19 -07:00
Scott LaVarnway
223d1b54cf Populate bmi for B_PRED only
Small decode performance gain (~1%) on keyframes.  No
noticeable gains on encode.  Also changed pick_intra4x4mby_modes()
to read the above and left block modes for keyframes only.

Change-Id: I1f4885252f5b3e9caf04d4e01e643960f910aba5
2011-06-13 17:14:11 -04:00
Scott LaVarnway
e71a010646 Calc ref_frame_cost once per frame
instead of every macro block.

Change-Id: I2604e94c6b89e3a8457777e21c8c38406d55b165
2011-06-13 09:58:03 -04:00
Tero Rintaluoma
66533b1a8d Fix make clean for asm offset files
Automatically created assembly offset files added to CLEAN-OBJS list
for proper cleanup. This will fix following build error:
1) Build for the workstation
./conigure
make
make clean
2) Build for ARM platform
./configure --target=armv7-linux-gcc
make ==> this will fail because it uses old asm_*_offset.asm files

Change-Id: Id5275c470390ca81b8db086a15ad75af39b80703
2011-06-10 14:05:53 +03:00
John Koleszar
f3ba4c6b82 Merge "bug fix mode_info_context not initialized for error-resilient" 2011-06-09 13:39:47 -07:00
Yaowu Xu
361717d2be remove one set of 16x16 variance funcations
call to this set of functions are replaced by var16x16.

Change-Id: I5ff1effc6c1358ea06cda1517b88ec28ef551b0d
2011-06-09 11:23:05 -07:00
James Berry
45feea4cf0 bug fix mode_info_context not initialized for error-resilient
uninitialized xd->mode_info_context would crash
vpxenc for --error-resilient=1.

Change-Id: I31849e40281e3d65ab63257cfec5e93398997f0b
2011-06-09 12:46:31 -04:00
John Koleszar
af49c11250 Update keyframe activity in non-RD mode
Activity update is no longer dependent on being in RD mode, so update
it unconditionally.

Change-Id: Ib617a6fc210dfc045455e3e4467d7ee5e3d1fa0e
2011-06-09 12:05:31 -04:00
Johann
79327be6c7 use GCC inline magic
Better fix for #326. ICC happens to support the inline magic

Change-Id: Ic367eea608c88d89475cb7b05d73500d2a1bc42b
2011-06-08 16:19:37 -04:00
Johann
baa17db184 Merge "Revert "Use shared object files for ELF"" 2011-06-08 11:47:18 -07:00
Johann
abb7c2181e Revert "Use shared object files for ELF"
Broke RVCT. New magic coming for ICC. Stay tuned!

This reverts commit c73eb2ffff
2011-06-08 11:36:04 -07:00
John Koleszar
8767ac3bc7 Merge "vp8_pick_inter_mode: remove best_bmodes" 2011-06-08 10:59:30 -07:00
John Koleszar
9e4df2bcf5 Merge "vp8_pick_intra_mode: correct returned rate" 2011-06-08 10:58:36 -07:00
John Koleszar
254a7483e5 Merge "Move RD intra block mode selection to rdopt.c" 2011-06-08 10:51:50 -07:00
John Koleszar
001bd51ceb vp8_pick_inter_mode: remove best_bmodes
Since BPRED will be tested at most once, and SPLITMV is not enabled,
there's nothing to clobber the subblock modes, so there's no need to
save and restore them.

Change-Id: I7c3615b69190c10bd068a44df5488d6e8b85a364
2011-06-08 13:50:50 -04:00
Scott LaVarnway
dce64343d6 Merge "Removed unused function parameters" 2011-06-08 10:20:28 -07:00
John Koleszar
91907e0bf4 vp8_pick_intra_mode: correct returned rate
The returned rate was always the 4x4 rate, instead of the rate
matching the selected mode.

Change-Id: I51da31f80884f5e37f3bcc77d1047d31e612ded4
2011-06-08 13:19:12 -04:00
Scott LaVarnway
69d8d386ed Removed unused function parameters
Change-Id: Ib641c624faec28ad9eb99e2b5de51ae74bbcb2a2
2011-06-08 13:01:09 -04:00
Yaowu Xu
1fba1e38ea Adjust errorperbit according to RDMULT in activity masking
In activity masking, RDO constant RDMULT is adjusted on a per MB basis
adaptive to activity with the MB. errorperbit, which is defined as
RDMULT/RDDIV, is a constant used in motion estimation. Previously, in
activity masking, errorperbit is not changed even when RDMULT is changed.
This commit changed to adjust errorperbit according to the change in
RDMULT.

Test in cif set showed a very small but consistent gain by all quality
metrics (average, overall psnr and ssim) when activity masking is on.

Change-Id: I07ded3e852919ab76757691939fe435328273823
2011-06-08 09:45:47 -07:00
Yaowu Xu
5fafa2d524 Merge "Further activity masking changes:" 2011-06-08 09:30:31 -07:00
John Koleszar
96a42aaa2d Move RD intra block mode selection to rdopt.c
This change is analogous to I0b67dae1f8a74902378da7bdf565e39ab832dda7,
which made the move for the non-RD path.

Change-Id: If63fc1b0cd1eb7f932e710f83ff24d91454f8ed1
2011-06-08 12:05:05 -04:00
John Koleszar
e90d17d240 Move intra block mode selection to pickinter.c
This commit moves the intra block mode selection from encodeframe.c
to pickinter.c (in the non-RD case). This allowed pick_intra_mbuv_mode
and pick_intra4x4mby_modes to be made static, and is a step towards
refactoring intra mode selection in the main pickinter loop. Gave a
small perf increase (~0.5%).

Change-Id: I0b67dae1f8a74902378da7bdf565e39ab832dda7
2011-06-08 11:44:57 -04:00
Paul Wilkins
4e81a68af7 Further activity masking changes:
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target  rather
than a frame average, for altering rd and zbin.

Overall the SSIM performance is similar  to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%

Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
2011-06-08 16:03:37 +01:00
Yaowu Xu
7368dd4f8f Merge "remove redundant functions" 2011-06-07 16:36:37 -07:00
Yaowu Xu
59129afc05 Merge "adjust sad per bit constants" 2011-06-07 12:37:04 -07:00
Yaowu Xu
221e00eaa9 adjust sad per bit constants
While investigating the effect of DC values on SAD and SSE in motion
estimation, a side finding indicates the two table of constants need
be adjusted. The adjustment was done by multiplying old constants by
90% with rounding. Also absorb the 1/2 scaling constant into the two
tables. Refer to change Ifa285c3e for background of the 1/2 factor.

Cif set test showed a very small gain on all metric.

Change-Id: I04333527a823371175dd46cb04a817e5b9a8b752
2011-06-07 12:35:03 -07:00
John Koleszar
5c166470a5 Merge "Reduce overshoot in 1 pass rate control" 2011-06-07 12:30:37 -07:00
Scott LaVarnway
346358a5b7 Merge "Wrapped asserts in critical code with CONFIG_DEBUG" 2011-06-07 06:53:51 -07:00
Scott LaVarnway
afb84bb1cc Merge "Removed unused function vp8_treed_read_num" 2011-06-07 06:51:24 -07:00
John Koleszar
8c2ee4273c Merge "Use shared object files for ELF" 2011-06-07 06:44:32 -07:00
Scott LaVarnway
0e3bcc6f32 Wrapped asserts in critical code with CONFIG_DEBUG
Change-Id: I5b0aaca06f2e0f40588cb24fb0642b6865da8970
2011-06-07 09:34:47 -04:00
Scott LaVarnway
1374a4db3b Removed unused function vp8_treed_read_num
Change-Id: Id66e70540ee7345876f099139887c1843093907f
2011-06-07 09:32:51 -04:00
Yaowu Xu
d4700731ca remove redundant functions
The encoder defined about 4 set of similar functions to calculate sum,
variance or sse or a combination of them. This commit removed one set
of these functions, get8x8var and get16x16var, where calls to the later
function are replaced with var16x16 by using the fact on a 16x16 MB:
    variance == sse - sum*sum/256

Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
2011-06-06 16:44:05 -07:00
Yunqing Wang
03973017a7 Remove hex search's variance calculation while in real-time mode
In real-time mode motion search, there is no need to calculate
variance. This change improved encoding speed by 1% ~ 2%(speed=-5).

Change-Id: I65b874901eb599ac38fe8cf9cad898c14138d431
2011-06-06 19:11:05 -04:00
Johann
04edde2b11 Merge "neon fast quantize block pair" 2011-06-06 13:42:58 -07:00
Johann
da8eb716e8 Merge "adds preload for armv6 encoder asm" 2011-06-06 13:32:13 -07:00
Johann
c73eb2ffff Use shared object files for ELF
Fixes #326

Change-Id: I5f2a4257430ef62f674190acefd43a0474821288
2011-06-06 16:10:31 -04:00
Scott LaVarnway
d1c0ba8f7a Merge "Removed unnecessary bmi motion vector stores." 2011-06-06 07:57:39 -07:00
John Koleszar
824e9410c6 Merge "Don't allow very short GF groups even when the GF is predicted from an ARF." 2011-06-06 07:02:29 -07:00
John Koleszar
212f618373 Reduce overshoot in 1 pass rate control
This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.

Tested on the CIF set over 200-500kbps using these settings:

  --buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
  --undershoot-pct=100

Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:

  --rt --cpu-used=-4

The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:

  One pass:
    baseline:   1.29715
    this patch: 1.23664

  Two pass:
    baseline:   1.32702
    this patch: 1.37824

This change had a positive effect on our quality metrics as well:

  One pass CBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -0.42 / 2.86 / 27.32
    Overall PSNR    -0.90 / 2.00 / 17.27
    SSIM            -0.05 / 3.95 / 37.46

  Two pass CBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -4.47 / 4.35 / 35.99
    Overall PSNR    -3.40 / 4.18 / 36.46
    SSIM            -4.56 / 6.98 / 53.67

  One pass VBR:
                    Min  / Mean / Max (pct)
    Average PSNR    -5.21 /  0.01 / 3.30
    Overall PSNR    -8.10 / -0.38 / 1.21
    SSIM            -7.38 / -0.11 / 3.17
    (note: most values here were close to the mean, there were a few
     outliers on files that were very sensitive to golden frame size)

  Two pass VBR:
                    Min  / Mean / Max (pct)
    Average PSNR    0.00 / 0.00 / 0.00
    Overall PSNR    0.00 / 0.00 / 0.00
    SSIM            0.00 / 0.00 / 0.00

Neither one pass or two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.

Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.

Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716
2011-06-03 16:38:11 -04:00
Scott LaVarnway
f1d6cc79e4 Removed unnecessary bmi motion vector stores.
left_block_mv and above_block_mv will return the MB
motion vector for non SPLITMV macro blocks.

Change-Id: I58dbd7833b4fdcd44b6b72e98ec732c93c2ce4f4
2011-06-03 13:09:46 -04:00
Scott LaVarnway
8c5b73de2a Merge "Removed B_MODE_INFO" 2011-06-03 08:32:30 -07:00
Yunqing Wang
e5c236c210 Adjust bounds checking for hex search in real-time mode
Currently, hex search couldn't guarantee the motion vector(MV)
found is within the limit of maximum MV. Therefore, very large
motion vectors resulted from big motion in the video could cause
encoding artifacts. This change adjusted hex search bounds
checking to make sure the resulted motion vector won't go out
of the range. James Berry, thank you for finding the bug.

Change-Id: If2c55edd9019e72444ad9b4b8688969eef610c55
2011-06-03 08:53:42 -04:00
Scott LaVarnway
773768ae27 Removed B_MODE_INFO
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.

Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
2011-06-02 13:46:41 -04:00
Ronald S. Bultje
9f002bee53 Don't allow very short GF groups even when the GF is predicted from an ARF.
This is basically a slightly modified version of the previous patch,
and it has a moderately positive effect (SSIM/PSNR both +0.08% avg
on derf-set). Most clips show no change, except waterfall/coastguard,
each ~ +0.8% SSIM/PSNR. You can see similar effects in other clips
by shortening their length to terminate at a very short last group
of frames.

Change-Id: I7a70de99ca1f9fe6a8b6ca7a6e30e8a4b64383e4
2011-06-02 09:14:51 -07:00
Yaowu Xu
4ce6928d5b Merge "further clean up of errorperbit and sadperbit" 2011-06-02 08:58:03 -07:00
Yaowu Xu
5b2fb32961 further clean up of errorperbit and sadperbit
this commit makes the usage errorperbit and sadperbit consistent for
encoding modes and passes. Removed all different magic weight factors
associated with errorperbit. Now 1/2 is used for both sadperbit16 and
sadperbit4, the /2 operation is merged into initializations of the 2
variables.

Tests on cif set show .23%, 0.18% and 0.19% gain by avg psnr, overall
psnr and ssim respectively.

Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
2011-06-01 14:44:06 -07:00
John Koleszar
4101b5c5ed Merge "Bugfix in vp8dx_set_reference" 2011-06-01 13:57:23 -07:00
Henrik Lundin
69ba6bd142 Bugfix in vp8dx_set_reference
The fb_idx_ref_cnt book-keeping was in error. Added an assert to
prevent future errors in the reference count vector. Also fixed a
pointer syntax error.

Change-Id: I563081090c78702d82199e407df4ecc93da6f349
2011-06-01 21:41:12 +02:00
John Koleszar
5610970fe9 Merge "Fix code under #if CONFIG_INTERNAL_STATS." 2011-06-01 11:14:17 -07:00
Ronald S. Bultje
34ba18760f Fix code under #if CONFIG_INTERNAL_STATS.
Change-Id: Iccbd78d91c3071b16fb3b2911523a22092652ecd
2011-06-01 11:10:13 -07:00
Yaowu Xu
50916c6a7d remove some magic weights associated with sad_per_bit
sad_per_bit has been used for a number of motion vector search routines
with different magic weights: 1, 1/2 and 1/4. This commit remove these
magic numbers and use 1/2 for all motion search routines, also reformat
a number of source code lines to within 80 column limit.

Test on cif set shows overall effect is neutral on all metrics. <=0.01%

Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
2011-06-01 10:10:44 -07:00
Tero Rintaluoma
61f0c090df neon fast quantize block pair
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
 - Additional 3-6% speedup compared to neon optimized fast
   quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)

Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
2011-06-01 10:48:05 +03:00
Scott LaVarnway
9e4f76c154 Merge "vp8_pick_inter_mode code cleanup" 2011-05-31 12:31:46 -07:00
Scott LaVarnway
1a5a1903ea vp8_pick_inter_mode code cleanup
Small code cleanups before attempting to reduce the size
of bmi found in BLOCKD.

Change-Id: Ie9c14adb53afd847716a75bcce067d0e6c04f225
2011-05-31 14:24:42 -04:00
John Koleszar
0a72f568ec Initialize first_time_stamp_ever
Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.

Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621
2011-05-31 12:37:45 -04:00
Tero Rintaluoma
5305e79eae adds preload for armv6 encoder asm
Added preload instructions to armv6 encoder optimizations.
About 5% average speed-up on Tegra2 for VGA@30fps sequence.

Change-Id: I41d74737720fb71ce7a316f07555357822f3347e
2011-05-30 11:10:03 +03:00
John Koleszar
4a4ade6dc8 Merge "bug fix check frame buffer index before copy" 2011-05-27 12:35:06 -07:00
James Berry
8795b52512 bug fix check frame buffer index before copy
in onyx_if.c update_reference_frames() make
sure that frame buffer indexes are not equal
before preforming a buffer copy.  If two frames
share the same buffer the flags will already be
set correctly.

Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e
2011-05-27 14:59:29 -04:00
Yunqing Wang
4fb5ce6a92 Merge "Use hex search for realtime mode speed>4" 2011-05-27 11:12:50 -07:00
Yunqing Wang
4d052bdd91 Use hex search for realtime mode speed>4
Test showed using hex search in realtime mode largely speed up
encoding process, and still achieves similar quality like the
diamond search we have. Therefore, removed the diamond search
option.

Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c
2011-05-27 14:05:02 -04:00
Scott LaVarnway
ba420f1097 Merge "Broken EC after MODE_INFO size reduction" 2011-05-27 07:52:04 -07:00
Yunqing Wang
5a8cbb8955 Merge "Remove unused code" 2011-05-27 07:25:25 -07:00
Yunqing Wang
2dc24635ec Remove unused code
Hex search is not called in rdopt.c

Change-Id: I67347f03e13684147a7c77fb9e9147e440bb5e8e
2011-05-27 10:20:49 -04:00
Scott LaVarnway
4f586f7bd0 Broken EC after MODE_INFO size reduction
This patch fixes the compiler errors and the seg fault
when running decode_with_partial_drops.

Change-Id: I7c75369e2fef81d53b790d5dabc327218216838b
2011-05-26 15:13:00 -04:00
John Koleszar
1fe5070b76 Merge "Do not copy data between encoder reference buffers." 2011-05-26 09:58:26 -07:00
Yaowu Xu
9a248f1593 Merge "fix the mix use of errorperbit and sadperbit" 2011-05-26 09:39:41 -07:00
Scott LaVarnway
40b850b458 Merge "Use int_mv instead of MV in vp8_mv_cont" 2011-05-26 07:01:38 -07:00
Yaowu Xu
d8c525b8b1 fix the mix use of errorperbit and sadperbit
error_per_bit and sad_per_bit were designed as estimates of a bit worth
of sum squared error and sum absolute difference respectively. Under
this assumption, error_per_bit should be used in combination with 2nd
order errors (variance or sum squared error) while sad_per_bit should
be used in combination with 1st order SADs in motion estimation. There
were a few places where sad_per_bit has been misused with variances,
this commit changes to use error_per_bit for those places, also changes
parameter names to properly indicate which constant is being used.

On cif set, the change has a universal gain by all metrics: 0.13% by
average/overall psnr and 0.1% by ssim.

Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
2011-05-25 16:48:10 -07:00
Yunqing Wang
13b56eeb7a Merge " Use var8x8 instead of get8x8var in VP8_UVSSE" 2011-05-25 11:35:42 -07:00
Yunqing Wang
f299d628f3 Merge "Return sse value in vp8_variance SSE2 functions" 2011-05-25 11:31:07 -07:00
Yaowu Xu
22c05c0575 remove code not in use
Change-Id: I6e5e86235d341cce3b02abda26dbeb71940ed955
2011-05-25 09:46:37 -07:00
Yunqing Wang
b6679879b8 Return sse value in vp8_variance SSE2 functions
Minor modification.

Change-Id: I09511d38fd1451d5c4106a48acdb3f766ce59cb7
2011-05-25 11:55:41 -04:00
Attila Nagy
a615c40499 Use var8x8 instead of get8x8var in VP8_UVSSE
'sum' returned by get8x8var is not used and var8x8 has optimizations
  for more platforms.

Change-Id: I4a907fb1a05f285669fb0b95dc71d42182c980f6
2011-05-25 12:54:34 +03:00
Yunqing Wang
d75eb73653 Fix a bug happening while encoding at profile=3
While profile=3, there is no sub-pixel search. Distortion and SSE
have to calculated using get_inter_mbpred_error().

Change-Id: Ifb36e17eef7750af93efa7d0e2870142ef540184
2011-05-24 16:28:23 -04:00
Scott LaVarnway
a39321f37e Use int_mv instead of MV in vp8_mv_cont
Less operations.

Change-Id: Ibb9cd5ae66b8c7c681c9a654d551c8729c31c3ae
2011-05-24 16:01:12 -04:00
Scott LaVarnway
cfab2caee1 Removed unused variable warnings
Change-Id: I6e5e921f03dc15a72da89a457848d519647677a3
2011-05-24 15:17:03 -04:00
Scott LaVarnway
b5278f38b0 Merge "MODE_INFO size reduction" 2011-05-24 12:08:24 -07:00
Scott LaVarnway
e11f21af9a MODE_INFO size reduction
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions.  The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.

Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
2011-05-24 13:24:52 -04:00
John Koleszar
fbea372817 Merge "Fixing bug in VP8_SET_REFERENCE decoder control command" 2011-05-24 05:57:44 -07:00
Yunqing Wang
69aad3a720 Merge "Rewrite hex search function" 2011-05-24 05:26:16 -07:00
Henrik Lundin
a126cd1760 Fixing bug in VP8_SET_REFERENCE decoder control command
In vp8dx_set_reference, the new reference image is written to an
unused reference frame buffer.

Change-Id: I9e4f2cef5a011094bb7ce7b2719cbfe096a773e8
2011-05-24 09:03:43 +02:00
Yaowu Xu
99fb568e67 Merge "use get8x8var directly for non-subpixel motion case in VP8_UVSSE" 2011-05-23 14:49:56 -07:00
Yunqing Wang
7838f4cfff Rewrite hex search function
Reduced some bound checks in hex search function.

Change-Id: Ie5f73a6c227590341c960a74dc508cff80f8aa06
2011-05-23 16:18:52 -04:00
Yaowu Xu
ab2dfd22f3 use get8x8var directly for non-subpixel motion case in VP8_UVSSE
VP8_UVSSE mistakenly used subpixvar8x8 to calculate SSE for non-subpixl
motion cases.

Change-Id: I4a5398bb9ef39c211039f6af4540546d4972e6a9
2011-05-23 09:11:28 -07:00
John Koleszar
ad6fe4a88c Merge "bug fix active_worst_quality set below active_best_quality" 2011-05-20 11:23:10 -07:00
John Koleszar
8196cc85f8 Merge "cleanup: collect twopass variables" 2011-05-20 11:20:44 -07:00
Johann
6d82d2d22e Merge "Fixed iwalsh_neon build problems with RVDS4.1" 2011-05-20 07:51:11 -07:00
Yaowu Xu
1fbc81a970 Merge "revise two function definitions with less parameters" 2011-05-20 07:45:42 -07:00
John Koleszar
a0c11928db Merge "Remove unused members of VP8_COMP" 2011-05-20 07:39:03 -07:00
Yaowu Xu
a4c69e9a0f revise two function definitions with less parameters
Change-Id: Ia96e5bf915e4d3c0ac9c1795114bd9e5dd07327a
2011-05-19 19:06:03 -07:00
Yaowu Xu
1f3f18443d Merge "disable trellis optimization for first pass" 2011-05-19 17:25:31 -07:00
Yaowu Xu
d5b8f7860f disable trellis optimization for first pass
also remove 2 #defines and 1 function declaration that are not in use.

Change-Id: I8f743d0e3dd9ebf1de24a8b0c30ff09f29b00c53
2011-05-19 17:22:14 -07:00
James Berry
caa1b28be3 bug fix active_worst_quality set below active_best_quality
fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.

Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039
2011-05-19 18:10:31 -04:00
John Koleszar
63cb1a7ce0 cleanup: collect twopass variables
This patch collects the twopass specific memebers of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.

This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.

Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d
2011-05-19 17:26:09 -04:00
Scott LaVarnway
dba79821f0 Merge "Using partition_info instead of blockd info for splitmv" 2011-05-19 13:22:59 -07:00
John Koleszar
048497720c Remove unused members of VP8_COMP
Various members that were either completely unreferenced or written
and not read.

Change-Id: Ie41ebac0ff0364a76f287586e4fe09a68907806e
2011-05-19 15:49:09 -04:00
Scott LaVarnway
99b9757685 Using partition_info instead of blockd info for splitmv
The partition_info struct contains info just for SPLITMV,
so it should be used instead of BLOCKD.  Eventually, I want
to reduce the size of B_MODE_INFO struct found in BLOCKD, so
this is the first step toward that goal.
Also, since SPLITMV is not supported in vp8_pick_inter_mode(),
the unnecessary mem copies and checks were removed.  For rt
encodes, this gave a slight performance improvement.

Change-Id: I5585c98fa9d5acbde1c7e0f452a01d9ecc080574
2011-05-19 15:03:36 -04:00
Scott LaVarnway
914f7c36d7 Merge "Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3." 2011-05-19 11:22:01 -07:00
John Koleszar
c684d5e5f2 Merge "changed configure option name to reduce confusion" 2011-05-19 11:17:08 -07:00
John Koleszar
ff39958cee Merge "Make activity masking functions static" 2011-05-19 11:12:18 -07:00
John Koleszar
21ca4c4d5d Merge "Fix segv without --enable-error-concealment" 2011-05-19 10:58:24 -07:00
John Koleszar
7def902261 Fix segv without --enable-error-concealment
Missed wrapping one function call in #if CONFIG_ERROR_CONCEALMENT.

Change-Id: I5746b1e6e4531670dbed1130467331fe309bdcae
2011-05-19 13:57:45 -04:00
John Koleszar
e3081b2502 Merge "Adding error-concealment to the decoder." 2011-05-19 10:48:58 -07:00
Stefan Holmer
d04f852368 Adding error-concealment to the decoder.
The error-concealer is plugged in after any motion vectors have been
decoded. It tries to estimate any missing motion vectors from the
motion vectors of the previous frame. Intra blocks with missing
residual are replaced with inter blocks with estimated motion vectors.

This feature was developed in a separate sandbox
(sandbox/holmer/error-concealment).

Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412
2011-05-19 13:46:33 -04:00
John Koleszar
a84177b432 Make activity masking functions static
These don't need extern linkage.

Change-Id: I21220ada926380a75ff654f24df84376ccc49323
2011-05-19 11:14:13 -04:00
John Koleszar
87254e0b7b Move quantizer init functions to quantize.c
Group related functions together.

Change-Id: I92fd779225b75a7204650f1decb713142c655d71
2011-05-19 11:07:41 -04:00
Attila Nagy
f96d56c4aa Fixed iwalsh_neon build problems with RVDS4.1
rvct 4.1 was complaining about vstmia.16, store multiple expects 64 data type.
optimized the implementation.

Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
2011-05-19 10:27:26 +03:00
Yunqing Wang
00a1e2f8e4 Merge "Modify MVcount in pick_inter_mode to eliminate calling of vp8_find_near_mvs" 2011-05-18 12:53:27 -07:00
Yunqing Wang
9c62f94129 Fix a bug in vp8_clamp_mv function
Scott fixed the bug in MV clamping function in encoder, which
could cause artifacts.

Change-Id: Id05f2794c43c31cdd45e66179c8811f3ee452cb9
2011-05-18 09:52:56 -04:00
Yunqing Wang
f62b33f140 Modify MVcount in pick_inter_mode to eliminate calling of vp8_find_near_mvs
Moved MVcount modification in pick_inter_mode, and eliminated
calling of vp8_find_near_mvs.

Change-Id: Icd47448a1dfc8fdf526f86757d0e5a7f218cb5e8
2011-05-17 10:59:42 -04:00
John Koleszar
eafdc5e10a Merge "Improve framerate adaptation" 2011-05-13 11:18:42 -07:00
Yaowu Xu
5608c14020 Merge "adjusting rd constant slightly by ~10%" 2011-05-13 09:28:26 -07:00
Paul Wilkins
0e86235265 Merge "Restructure of activity masking code." 2011-05-13 09:23:50 -07:00
Paul Wilkins
ff52bf3691 Restructure of activity masking code.
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages

It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.

Change-Id: Id39b19eace37574dc429f25aae810c203709629b
2011-05-13 10:37:50 +01:00
John Koleszar
5ed116e220 Improve framerate adaptation
This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also more quickly adapts to step changes
in the input frame rate (ie 30fps to 15fps)

Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e
2011-05-12 15:07:50 -04:00
Scott LaVarnway
71a7501bcf Removed mv_bits_sadcost
This sad cost is being generated but never used.

Change-Id: I562eebdcb792b743770954feca365b5b37491ecd
2011-05-12 11:20:41 -04:00
Scott LaVarnway
6b25501bf1 Using int_mv instead of MV
The compiler produces better assembly when using int_mv
for assignments.  The compiler shifts and ors the two 16bit
values when assigning MV.

Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
2011-05-12 11:08:16 -04:00
Yunqing Wang
6ed81fa5b3 Merge "Modification and issue fix in full-pixel refining search" 2011-05-12 07:20:44 -07:00
Yunqing Wang
b4da1f83e6 Modification and issue fix in full-pixel refining search
Further modification and wrong implementation fix which caused
refining_search and refining_searchx4 result mismatching.

Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
2011-05-12 10:18:40 -04:00
Yaowu Xu
bd9d890605 adjusting rd constant slightly by ~10%
This is to reflect the RD improvement in the encoder. The change has a
small positive impact on quality (0.25% by VPXSSIM and 0.05% by PSNR)

Change-Id: Ic66ffc19b10870645088c0624c85556f009fd210
2011-05-11 23:32:06 -07:00
Yaowu Xu
ba6f60dba7 Merge "remove a variable no longer in use" 2011-05-10 20:20:59 -07:00
Yaowu Xu
1bcf4e66bb Merge "fix a bug related to gf_active_flags in multi-threaded encoder" 2011-05-10 19:59:52 -07:00
Yaowu Xu
f7cf439b34 remove a variable no longer in use
The variable is introduced in commit 2e53e9e53 to make more use of
trellis quantization, but this is no longer necessary after RDMULT
was made adaptive in a number of later commits.

Change-Id: I7420522ec7723f38cf77033466c25afb405d52ae
2011-05-10 19:57:51 -07:00
John Koleszar
814532a33c Merge "Use stdint.h for VS2010" 2011-05-10 18:53:04 -07:00
Johann
df2023a6cb set up Global Offset Table in recon
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.

Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
2011-05-10 15:58:56 -04:00
Yunqing Wang
c7a56f677d Merge "Use diamond search to replace full search in full-pixel refining search" 2011-05-10 06:59:38 -07:00
Yunqing Wang
cb7b1fb144 Use diamond search to replace full search in full-pixel refining search
In NEWMV mode, currently, full search is used as the refining search
after n-step search. By replacing it with an iterative diamond search
of radius 1 largely reduced the computation complexity, but still
maintained the same encoding quality since the refining search is
done for every macroblock instead of only a small precentage of
macroblocks while using full search.

Tests on the test set showed a 3.4% encoding speed increase with none
psnr & ssim loss.

Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
2011-05-09 14:07:06 -04:00
Johann
a7d4d3c550 clean up unused variable warnings
Change-Id: I9467d7a50eac32d8e8f3a2f26db818e47c93c94b
2011-05-09 12:56:20 -04:00
Yaowu Xu
89c6017cc0 fix a bug related to gf_active_flags in multi-threaded encoder
Paul pointed out that the pointer to the gf_active_flags is not being
properly incremented in multithreaded encoder. This commit fixes the
issue by making sure the gf_active_ptr points to the starting of next
group of mb rows.

Change-Id: I3246e657d23beabb614dfb880733a68a5fd7e34c
2011-05-06 09:00:44 -07:00
John Koleszar
5c756005aa Merge "Don't override active_worst_quality in 2 pass" 2011-05-06 08:59:05 -07:00
Johann
52490354f3 Merge "neon fast quantizer updated" 2011-05-06 08:54:14 -07:00
John Koleszar
abc9958c52 Don't override active_worst_quality in 2 pass
Commit db5057c introduced a bug in that the active_worst_quality
selected by the 2 pass rate controller was being overridden for key
frames, causing a severe quality loss.

Change-Id: I4865a6fbe3e94e9b4fb9271c7dd68b455d7b371d
2011-05-06 11:48:53 -04:00
John Koleszar
4ead98fa84 Use stdint.h for VS2010
VS2010 has included stdint.h, but not inttypes.h. Prefer the compiler's
version of these types. Fixes issue 327.

Change-Id: Ica71600e06b8e94e3bbb4f12988b4a9817d5e5e4
2011-05-06 08:02:39 -04:00
Tero Rintaluoma
33fa7c4ebe neon fast quantizer updated
vp8_fast_quantize_b_neon function updated and further optimized.
 - match current C implementation of fast quantizer
 - updated to use asm_enc_offsets for structure members
 - updated ads2gas scripts to handle alignment issues

Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
2011-05-06 08:59:52 +03:00
Aron Rosenberg
eeb8117303 Fix semaphore emulation on Windows
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.

This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.

Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
2011-05-06 00:13:59 -04:00
Yunqing Wang
eb16f00cf2 Fix rare hang in multi-thread encoder on Windows
This patch is to fix a rare hang in multi-thread encoder that was
only seen on Windows. Thanks for John's help in debugging the
problem. More test is needed.

Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
2011-05-05 10:42:29 -04:00
Johann
ca5c1b17a2 Merge "Loopfilter NEON: Use VMOV for constant vectors instead of VLD." 2011-05-05 06:16:21 -07:00
Yunqing Wang
aeb86d615c Merge "Runtime detection of available processor cores." 2011-05-05 04:59:54 -07:00
Attila Nagy
a6aa389d2f Loopfilter NEON: Use VMOV for constant vectors instead of VLD.
Change-Id: I562b6e01c32bb51d00f3b95faf757fc7dc29a3a3
2011-05-04 11:29:23 +03:00
Yunqing Wang
3fbade23a2 Merge "Modify HEX search" 2011-05-03 11:59:32 -07:00
Yunqing Wang
04ec930abc Modify HEX search
Changed 8-neighbor searching to 4-neighour searching, and continued
searching until the center point is the best match.

Test on test set showed 1.3% encoding speed improvement as well as
0.1% PSNR and SSIM improvement at speed=-5 (rt mode).

Will continue to improve it.

Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
2011-05-03 14:26:33 -04:00
Yaowu Xu
e9465daee3 Merge "change to use fast ssim code for internal ssim calculations" 2011-05-03 11:20:52 -07:00
Yaowu Xu
6c565fada0 change to use fast ssim code for internal ssim calculations
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged

Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
2011-05-03 08:36:17 -07:00
Ronald S. Bultje
bbf890fe27 build: change LDFLAGS/CFLAGS ordering.
Always use CFLAGS/LDFLAGS that point to headers and libvpx.a inside our
build tree before ones from the environment, which could reference
headers or libs outside the build tree.

This fixes issue 307.

Change-Id: I34d176b8c21098f6da5ea71f0147d3c49283cc45
2011-05-02 13:56:41 -04:00
John Koleszar
c09d8c1419 Merge "Fix documentation typos" 2011-05-02 06:50:22 -07:00
John Koleszar
a66d8d33dd Fix compile error with --enable-postproc-visualizer
Typo.

Change-Id: I9cc6a4587c3d93c9f0da5e101d376741fc9622a4
2011-05-02 09:28:37 -04:00
Thijs Vermeir
8942f70cdf Fix documentation typos
Change-Id: I97124670926433bf1593c91660d8b8f8482ea9ce
2011-04-30 09:34:59 +02:00
Ronald S. Bultje
5a23352c03 Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Change-Id: I658a1df7d825f820573cb2d11ad402f9d2791035
2011-04-29 11:52:09 -07:00
Yaowu Xu
57ad189129 changed configure option name to reduce confusion
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35

Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
2011-04-29 09:39:05 -07:00
Yunqing Wang
dfa9e2c5ea Merge "Use insertion sort instead of quick sort" 2011-04-29 08:27:58 -07:00
Scott LaVarnway
1b2abc5f49 Merge "Consolidated build inter predictors" 2011-04-29 07:13:49 -07:00
James Berry
f10732554b bug fix removed inline from recon_wrapper_sse2.c
removed inline from recon_wrapper_sse2.c to build
for visual stuido

Change-Id: I74a3482950448e2cdb30e9cd7087145b440d8a22
2011-04-28 15:12:00 -04:00
James Berry
5db296dd70 bug fix 32 bit matches 64 bit
included vpx_config.h in vpx_encoder.c
to properly define FLOATING_POINT_INIT()

Change-Id: Ie518bf5c087622658e37fca90aa4ddfe79d053f6
2011-04-28 14:11:32 -04:00
Scott LaVarnway
219ba87a93 Merge "Use psadbw to get the sum of bytes in a line." 2011-04-28 07:58:20 -07:00
Scott LaVarnway
ccd6f7ed77 Consolidated build inter predictors
Code cleanup.

Change-Id: Ic8b0167851116c64ddf08e8a3d302fb09ab61146
2011-04-28 10:53:59 -04:00
Ronald S. Bultje
1e7ded69cf Use psadbw to get the sum of bytes in a line.
Thanks Jason for pointing that out on #vp8. ;-).

Change-Id: I5330a753e752a8704b78a409597472628e0b26a5
2011-04-27 13:49:21 -07:00
Scott LaVarnway
2e102855f4 Removed unused code in reconinter
The skip flag is never set by the encoder for SPLITMV.

Change-Id: I5ae6457edb3a1193cb5b05a6d61772c13b1dc506
2011-04-27 15:25:32 -04:00
John Koleszar
085fb4b737 Merge "SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}()." 2011-04-27 12:02:55 -07:00
Ronald S. Bultje
1083fe4999 SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
decoding

before
10.425
10.432
10.423
=10.426

after:
10.405
10.416
10.398
=10.406, 0.2% faster

encoding

before
14.252
14.331
14.250
14.223
14.241
14.220
14.221
=14.248

after
14.095
14.090
14.085
14.095
14.064
14.081
14.089
=14.086, 1.1% faster

Change-Id: I483d3d8f0deda8ad434cea76e16028380722aee2
2011-04-27 11:31:27 -07:00
Fritz Koenig
00fdb135a7 vpxenc: remove duplicate --fps from vpxenc usage message
Fixes issue #323

Change-Id: I41c297df37afe186a8425ed2e2a95032069dcb9a
2011-04-27 11:27:59 -07:00
Yunqing Wang
5abafcc381 Use insertion sort instead of quick sort
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), test on test set showed 1.7% performance
gain with 0% PSNR change in average.

Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
2011-04-27 13:53:28 -04:00
John Koleszar
4226f0ce64 vpxdec: test for frame corruption
This change simply exercises the VP8D_GET_FRAME_CORRUPTED control,
outputting a warning message at the end if the bit was set for any
frames. Should never produce any output for good input.

Change-Id: Idaf6ba8f53660f47763cd563fa1485938580a37d
2011-04-27 12:04:48 -04:00
John Koleszar
64355ecad3 Merge "Speed up VP8DX_BOOL_DECODER_FILL" 2011-04-27 09:03:45 -07:00
John Koleszar
f8ffecb176 Merge "Update VP8DX_BOOL_DECODER_FILL to better detect EOS" 2011-04-27 09:03:24 -07:00
John Koleszar
5e1fd41357 Speed up VP8DX_BOOL_DECODER_FILL
The end-of-buffer check is hoisted out of the inner loop. Gives
about 0.5% improvement on x86_64.

Change-Id: I8e3ed08af7d33468c5c749af36c2dfa19677f971
2011-04-27 10:25:03 -04:00
John Koleszar
9594370e0c Update VP8DX_BOOL_DECODER_FILL to better detect EOS
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.

Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
2011-04-27 10:24:39 -04:00
John Koleszar
db5057c742 Refactor calc_iframe_target_size
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.

Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
2011-04-26 16:55:35 -04:00
John Koleszar
81d2206ff8 Move pick_frame_size() to ratectrl.c
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.

Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
2011-04-26 16:49:54 -04:00
Scott LaVarnway
0da77a840b Merge "Test vector mismatch fix" 2011-04-26 10:12:37 -07:00
Scott LaVarnway
7a2b9c50a3 Test vector mismatch fix
Fixed test vector mismatch that was introduced
in the "Removed dc_diff from MB_MODE_INFO"
(Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931)

Change-Id: I98fa509b418e757b5cdc4baa71202f4168dc14ec
2011-04-26 09:37:19 -04:00
Johann
d5c46bdfc0 Merge "remove simpler_lpf" 2011-04-25 14:51:07 -07:00
Johann
01527e743f remove simpler_lpf
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers

stop tracking the option in two places. use filter_type exclusively

Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
2011-04-25 17:37:41 -04:00
John Koleszar
fd6da3b2e7 Fix duplicate vp8_compute_frame_size_bounds
Likely introduced by a bad automatic merge from gerrit.

Change-Id: I0c6dd6ec18809cf9492f524d283fa4a3a8f4088b
2011-04-25 14:30:57 -04:00
John Koleszar
1f32b1489c Merge "Remove unused functions" 2011-04-25 11:05:00 -07:00
John Koleszar
47bc1c7013 Remove unused functions
Remove estimate_min_frame_size() and calc_low_ss_err(), as they are
never referenced.

Change-Id: I3293363c14ef70b79c4678ca27aa65b345077726
2011-04-25 13:54:23 -04:00
John Koleszar
cfbfd39de8 Merge "Change rc undershoot/overshoot semantics" 2011-04-25 10:49:32 -07:00
John Koleszar
ef86bad0d1 Merge "Stereo 3D format support for vpxenc" 2011-04-25 10:48:44 -07:00
John Koleszar
76557e34d2 Merge "Limit size of initial keyframe in one-pass." 2011-04-25 10:48:13 -07:00
John Koleszar
d9f898ab6d Merge "Add rc_max_intra_bitrate_pct control" 2011-04-25 10:47:57 -07:00
John Koleszar
454cbc96b7 Limit size of initial keyframe in one-pass.
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.

This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.

Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
2011-04-25 13:47:20 -04:00
John Koleszar
aa926fbd27 Add rc_max_intra_bitrate_pct control
Adds a control to limit the maximum size of a keyframe, as a function of
the per-frame bitrate. See this thread[1] for more detailed discussion:

[1]: http://groups.google.com/a/webmproject.org/group/codec-devel/browse_thread/thread/271b944a5e47ca38

Change-Id: I7337707642eb8041d1e593efc2edfdf66db02a94
2011-04-25 13:47:14 -04:00
John Koleszar
2089b2cee5 Merge "bug fix possible keyframe context divide by zero" 2011-04-25 09:35:12 -07:00
James Berry
8d5ce819dd bug fix possible keyframe context divide by zero
vp8_adjust_key_frame_context() divides by
estimate_keyframe_frequency() which can
return 0 in the case where --kf-max-dist=0.

Change-Id: Idfc59653478a0073187cd2aa420e98a321103daa
2011-04-25 12:16:36 -04:00
Alok Ahuja
72c76ca256 Stereo 3D format support for vpxenc
Create a new input parameter to allow specifying
the packed frame stereo 3d format. A default value
of mono will be written in the absence of user
specified input

Change-Id: I576d9952ab5d7e2076fbf1b282016a9a1baaa103
2011-04-25 07:21:51 -07:00
Johann
aeca599087 Merge "keep values in registers during quantization" 2011-04-25 06:52:38 -07:00
Scott LaVarnway
c36b6d4d01 Merge "Removed unnecessary frame type checks" 2011-04-25 06:45:43 -07:00
Scott LaVarnway
5b67329747 Merge "Removed dc_diff from MB_MODE_INFO" 2011-04-25 06:45:32 -07:00
Yaowu Xu
373dcec57a Merge "make two compiler options explicit for Visual Studio projects" 2011-04-22 14:08:08 -07:00
Ronald S. Bultje
496bcbb0de Fix overflow in temporal_filter_apply_sse2().
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.

Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
2011-04-22 10:00:38 -04:00
John Koleszar
73c3d32705 Merge "Remove unused kf rate variables" 2011-04-21 16:54:14 -07:00
Adrian Grange
d2a6eb4b1e Corrected format specifiers in debug print statements
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.

Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
2011-04-21 15:45:57 -07:00
Yaowu Xu
ddb6edd831 make two compiler options explicit for Visual Studio projects
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code(/Ot)".

Change-Id: I0bf8343d9ca195851332b91ec69c69ee4e31ce2a
2011-04-21 13:27:42 -07:00
Johann
508ae1b3d5 keep values in registers during quantization
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.

Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
2011-04-21 15:47:55 -04:00
Scott LaVarnway
6f6cd3abb9 Removed unnecessary frame type checks
ref_frame is set to INTRA_FRAME for keyframes.  The B_PRED
mode is only used in intra frames.

Change-Id: I9bac8bec7c736300d47994f3cb570329edf11ec0
2011-04-21 14:59:42 -04:00
Scott LaVarnway
3698c1f620 Removed dc_diff from MB_MODE_INFO
The dc_diff flag is used to skip loopfiltering.  Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.

Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
2011-04-21 14:38:36 -04:00
Scott LaVarnway
7a49accd0b Removed force_no_skip
force_no_skip is always set to zero.

Change-Id: I89b61c5e0bee34627a9c07c05f3517e1db76af77
2011-04-20 15:45:12 -04:00
Scott LaVarnway
09c933ea80 Removed redundant checks of the mode_info_context flags
Code cleanup.  The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.

Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
2011-04-20 14:06:40 -04:00
Attila Nagy
43464e94ed Do not copy data between encoder reference buffers.
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.

Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
2011-04-20 15:26:55 +03:00
John Koleszar
ad6a8ca58b Remove unused kf rate variables
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.

Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.

Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
2011-04-19 16:14:57 -04:00
Johann
4a2b684ef4 modify SAVE_XMM for potential 64bit use
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.

Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
2011-04-19 10:42:45 -04:00
Johann
a9b465c5c9 Merge "Add save/restore xmm registers in x86 assembly code" 2011-04-19 06:32:10 -07:00
Johann
c7cfde42a9 Add save/restore xmm registers in x86 assembly code
Went through the code and fixed it. Verified on Windows.

Where possible, remove dependencies on xmm[67]

Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.

Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
2011-04-18 16:30:38 -04:00
Yunqing Wang
48438d6016 Merge "Use sub-pixel search's SSE in mode selection" 2011-04-18 13:20:04 -07:00
Yunqing Wang
b8f0b59985 Use sub-pixel search's SSE in mode selection
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.

Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
2011-04-18 16:12:28 -04:00
Yunqing Wang
d5069b5af0 Merge "Handle long delay between video frames in multi-thread decoder(issue 312)" 2011-04-18 10:11:41 -07:00
Johann
cd103a5721 Merge "store quant_shift as an unsigned char" 2011-04-18 10:03:40 -07:00
Yaowu Xu
05d9421e8b Merge "Add spin-wait pause intrinsic for Windows x64 platform." 2011-04-18 09:53:26 -07:00
Yaowu Xu
c619f6cb0f Merge "fixed an overflow in ssim calculation" 2011-04-18 07:44:34 -07:00
Scott LaVarnway
e1a8b6c8d5 Removed unused timers
Change-Id: I209803b9dbed2b2f6d02258fd7a3963a6645f4ab
2011-04-18 09:09:57 -04:00
John Koleszar
8fcb801d15 Merge "added -fomit-frame-pointer flag for gcc builds" 2011-04-18 06:07:57 -07:00
Yunqing Wang
8ba58951e9 Handle long delay between video frames in multi-thread decoder(issue 312)
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.

This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."

Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().

Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
2011-04-15 17:27:26 -04:00
Johann
d889035fe6 Merge "remove dead code, add missing RESTORE_XMM" 2011-04-15 13:32:54 -07:00
Scott LaVarnway
9409e38050 added -fomit-frame-pointer flag for gcc builds
According to the docs, this should have been enabled, but
the disassembled output shows otherwise.  This improved
the encode/decode performance.

Change-Id: I45ad7e6d299b89ac3166d7ef7da75b74994344c6
2011-04-15 15:59:21 -04:00
Johann
f64f425a50 remove executable bit
source files are not executable

Change-Id: Id2c7294695a22217468426423979f68f02d82340
2011-04-15 13:43:24 -04:00
Adrian Grange
0d2abe3084 Merge "Fix usage of value returned by vp8_pick_intra4x4mby_modes" 2011-04-15 08:37:19 -07:00
Yunqing Wang
1312a7a2e2 Merge "Reduce unnecessary distortion computation" 2011-04-15 08:17:03 -07:00
Johann
487c0299c9 remove dead code, add missing RESTORE_XMM
vp8_filter_block1d16_h4_ssse3 was never called

because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.

Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
2011-04-15 10:11:53 -04:00
John Koleszar
a3399291ad Fix off-by-one in copy_and_extend_plane
Should only copy h lines, not h+1.

Change-Id: I802a85686635900459c6dc79596189033e5298d8
2011-04-15 08:44:39 -04:00
Yunqing Wang
918fb5487e Reduce unnecessary distortion computation
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.

Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
2011-04-14 15:53:33 -04:00
John Koleszar
63f15987a5 Merge "Refactor lookahead ring buffer" 2011-04-14 12:35:01 -07:00
Fritz Koenig
e749ae510f Merge "Use consistent delimiters." 2011-04-14 11:56:18 -07:00
Adrian Grange
8608de1c6f Fix usage of value returned by vp8_pick_intra4x4mby_modes
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.

Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
2011-04-14 10:50:00 -07:00
Johann
ab48305fb6 Merge "update configure for ios sdk 4.3" 2011-04-14 08:55:22 -07:00
Joshua Bleecher Snyder
5e7a3bb69a update configure for ios sdk 4.3
update for the latest version of the ios sdk. adding
usr/lib/system fixes a missing libcache.dylib issue

make isysroot path more DRY

Change-Id: Ib748ef3dac3cac2e4848fbffa1e9a0112eac826b
2011-04-14 11:22:33 -04:00
Fritz Koenig
33cefd6f6e Use consistent delimiters.
opsnr.stt file was using \t for delimiters on everything
except between VPXSSIM and Time.

Change-Id: I6284c4e40c05ff642bf4b0170dca062c279a42df
2011-04-13 15:06:17 -07:00
Adrian Grange
8861174624 Fixed use of early breakout in vp8_pick_intra4x4mby_modes
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.

Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
2011-04-13 12:56:46 -07:00
John Koleszar
88841f1059 Refactor lookahead ring buffer
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.

Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
2011-04-13 14:26:45 -04:00
Johann
70f30aa95d store quant_shift as an unsigned char
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant

only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize

Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
2011-04-13 13:50:12 -04:00
John Koleszar
c99f9d7abf Change rc undershoot/overshoot semantics
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.

This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.

Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
2011-04-12 20:49:33 -04:00
John Koleszar
538f110407 Merge "Bugfix for error accumulator stats" 2011-04-12 06:59:00 -07:00
John Koleszar
e689a27d62 Bugfix for error accumulator stats
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.

Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
2011-04-12 08:47:33 -04:00
John Koleszar
fd09009227 Merge "Fix encoder range check for frame width and height" 2011-04-12 05:34:12 -07:00
Attila Nagy
1aadcedcfb Fix encoder range check for frame width and height
14 bits available in the bistream => valid range [1..16383]
Removed unused local vars.

Change-Id: Icf3385e47a9fa13af70053129c2248671f285583
2011-04-12 15:07:37 +03:00
Yunqing Wang
4fd81a99f8 Set cpu_used range to [-16, 16] in real-time mode
Remove encoding speed limitation in real-time mode.

Change-Id: Ib5e35d8bb522b2a25f3e4ad5cfe2788ebebb3617
2011-04-11 15:55:04 -04:00
Yunqing Wang
d1abe62d1c Define RDCOST only once
Clean up the code.

Change-Id: I7db048efa4d972b528d553a7921bc45979621129
2011-04-11 11:53:56 -04:00
John Koleszar
a9ce3e3834 Remove unused files
Change-Id: I36ca3f2f4620358033da34daf764f0b388dacd08
2011-04-11 10:34:40 -04:00
Yunqing Wang
4b43167ad1 Fix input MV for full search
Input MV needs to be modified to full-pixel precision.

Change-Id: Ic5d78e41bf27077e325024332b9fe89f76c44f0c
2011-04-08 16:29:41 -04:00
Johann Koenig
6e156a4cd7 Merge "use asm_offsets with vp8_fast_quantize_b_sse3" 2011-04-08 10:05:47 -07:00
John Koleszar
921a32a306 Merge "Error accumulator stats bug." 2011-04-08 08:20:32 -07:00
Paul Wilkins
de4e9e3b44 Error accumulator stats bug.
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.

These are only "currently" used in a limited way for RT compress
key frame detection.

Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
2011-04-08 14:21:36 +01:00
Jim Bankoski
d4cdb683a4 fixed an overflow in ssim calculation
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.

Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
2011-04-07 14:25:25 -07:00
Johann Koenig
08702002e8 use asm_offsets with vp8_fast_quantize_b_sse3
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.

Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
2011-04-07 16:40:05 -04:00
James Berry
aec5487cdd Use correct 32 bit comparisons for SAD breakout.
Rax updated to eax to avoid uninitialized memory
usage.

Change-Id: Iedb953f104329ede2a786fc648a47f1be2f3798a
2011-04-07 15:08:03 -04:00
Johann
2de858b9fc Merge "use asm_offsets with vp8_fast_quantize_b_sse2" 2011-04-06 10:53:55 -07:00
Yunqing Wang
9e9f61a317 Merge "Minor modification" 2011-04-06 06:12:13 -07:00
Yunqing Wang
02423b2e92 Minor modification
A small change.

Change-Id: I2e7726e58370a95d0319361f4f6ad231138d1328
2011-04-06 09:08:47 -04:00
Johann
c32e0ecc59 use asm_offsets with vp8_fast_quantize_b_sse2
on the same order as the regular quantize change: ~2%

Change-Id: I5c9eec18e89ae7345dd96945cb740e6f349cee86
2011-04-04 16:23:29 -04:00
Scott LaVarnway
f212a98ee7 Fixed unused variable warnings for firstpass.c
Change-Id: I8378a9a541ade2f098359a7b20fa08e6c1596d80
2011-04-04 14:18:31 -04:00
John Koleszar
91036996ac Merge "Slightly simplify vp8_decode_mb_tokens." 2011-04-04 08:58:25 -07:00
Johann
610dd90288 Merge "tweak vp8_regular_quantize_b_sse2" 2011-04-04 08:56:25 -07:00
Gaute Strokkenes
15f03c2f13 Slightly simplify vp8_decode_mb_tokens.
Change-Id: I0058ba7dcfc50a3374b712197639ac337f8726be
2011-04-04 16:47:22 +01:00
Yunqing Wang
f5c0d95e8c Merge "Use full-pixel MV in mvsadcost calculation" 2011-04-04 08:40:51 -07:00
John Koleszar
af1acc851b Merge "support obj_int_extract on cygwin" 2011-04-04 08:29:50 -07:00
Yunqing Wang
3d6815817c Use full-pixel MV in mvsadcost calculation
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.

Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
2011-04-01 16:41:58 -04:00
Johann
fd7040d2b6 support obj_int_extract on cygwin
cygwin doesn't support _sopen. drop down to the lowest common
denominator and merge main for all platforms. this also opens the door
for supporting multiple object formats with a single binary.

Change-Id: I7cd45091639d447434e6d5db2e19cfc9988f8630
2011-04-01 13:23:44 -04:00
John Koleszar
82315be75d Merge "vpxenc: die on realloc failures" 2011-04-01 07:55:55 -07:00
Johann
8520b5c785 tweak vp8_regular_quantize_b_sse2
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi

gains of about the same order as the obj_int_extract implementation: ~2%

Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
2011-04-01 09:58:23 -04:00
Johann
ba11e24d47 Merge "Wrapper function removed from vp8_subtract_b_neon function call" 2011-04-01 05:47:21 -07:00
Tero Rintaluoma
cec76a36d6 Wrapper function removed from vp8_subtract_b_neon function call
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
 - vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
   that contains all needed address calculations
 - unnecessary file encodemb_arm.c removed
 - consistent with ARMv6 optimized version

Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
2011-04-01 10:06:44 +03:00
Johann
9d138379a2 Merge "ARMv6 optimized subtract functions" 2011-03-31 08:40:10 -07:00
John Koleszar
f56b9ee92e Merge changes I4e32a8fb,Ic6a9d4c5
* changes:
  Generate a vpx.pc file for pkg-config.
  Export the version string as a makefile variable.
2011-03-31 06:21:11 -07:00
Attila Nagy
297b27655e Runtime detection of available processor cores.
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partition.

Core detetction works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.

Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
2011-03-31 10:23:01 +03:00
Ralph Giles
607f8420f3 Generate a vpx.pc file for pkg-config.
Rules are added to libs.mk to generate a vpx.pc, which is
installed as pkgconfig/vpx.pc under the target library directory.
This also requires the install path prefix be exported directly
in config.mk.

Some systems use a tool called pkg-config to query information
about intalled libraries or other resources, based on database
files provided by the packages themselves at install time.

Providing such a file for libvpx simplifies integration with
other build systems, and provides an easy avenue for developers
to test against their own builds of the library.

Change-Id: I4e32a8fbb53fc331aa95eb207c63dd70a76d18ed
2011-03-30 20:56:16 -07:00
Ralph Giles
53e9987b4d Export the version string as a makefile variable.
The configure script exports the major/minor/patch version
numbers, but didn't make the full version string available
to Makefile recipes and rules, the way it is available to
C code from vpx_version.h.

Change-Id: Ic6a9d4c574a6ea66a50c928f4eedeb91d7668eb5
2011-03-30 20:53:43 -07:00
Attila Nagy
7d335868df Fix: lpf semaphore was signaled in single threaded run
After picking filter level, post the loopfilter semaphore
just when multiple threads are in use.

Change-Id: If7bfb64601d906adef703f454dafc25e978b93c6
2011-03-30 15:55:29 +03:00
John Koleszar
26b6a3b088 vpxenc: die on realloc failures
Identified as a possible cause of issue #308, the code was silently
ignoring realloc failures, which would lead to corruption, memory
leaks, and likely a crash. The best we can do in this case is die
gracefully.

Change-Id: Ie5f6a853d367015be5b9712bd742778f3baeefd9
2011-03-30 06:37:02 -04:00
Johann
0e43668546 Merge "Half pixel variance further optimized for ARMv6" 2011-03-29 12:14:54 -07:00
Yunqing Wang
534ea700bd Merge "Fix a crash while enabling shared (--enable-shared)" 2011-03-29 09:04:22 -07:00
Yunqing Wang
b843aa4eda Fix a crash while enabling shared (--enable-shared)
Fixed a bug in SSSE3 sub-pixel filter functions.

Change-Id: I2e2126652970eb78307ffcefcace1efd5966fb0a
2011-03-29 11:31:06 -04:00
Johann
f0c22a3f33 use GLOBAL correctly on 32bit shared libraries
http://code.google.com/p/webm/issues/detail?id=309

Change-Id: I6fce9e2f74bc09a9f258df7f91ab599812324e8c
2011-03-29 11:27:03 -04:00
John Koleszar
49c31dc2b4 Merge "configure: enable unused variable warnings" 2011-03-29 07:38:04 -07:00
Tero Rintaluoma
6fdc9aa79f ARMv6 optimized subtract functions
Adds following ARMv6 optimized functions to encoder:
  - vp8_subtract_b_armv6
  - vp8_subtract_mby_armv6
  - vp8_subtract_mbuv_armv6

Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.

Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
2011-03-29 16:52:00 +03:00
Johann
4be062bbc3 add asm_enc_offsets.c for all targets
now that we need asm_enc_offsets.c for x86 and arm and it is
harmless to build it for other targets, add it unconditionally

Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
2011-03-28 10:43:47 -04:00
Tero Rintaluoma
f5e433464b Half pixel variance further optimized for ARMv6
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.

Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
2011-03-28 09:51:51 +03:00
Johann
beaafefcf1 Merge "use asm_offsets with vp8_regular_quantize_b_sse2" 2011-03-24 11:06:36 -07:00
Johann
8edaf6e2f2 use asm_offsets with vp8_regular_quantize_b_sse2
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems

when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9

Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
2011-03-24 13:34:48 -04:00
Johann
4cde2ab765 Merge "ARMv6 optimized fdct4x4" 2011-03-23 07:52:51 -07:00
John Koleszar
edfc93aeba Merge "Allow specifying --end-usage by enum name" 2011-03-21 12:29:11 -07:00
John Koleszar
577910b464 Merge "vpx_codec_dec_init: check that the iface is a decoder" 2011-03-21 09:12:58 -07:00
John Koleszar
2fced87e75 vpx_codec_dec_init: check that the iface is a decoder
Make sure the given interface is actually a decoder interface before
initializing it.

Change-Id: Ie48d737f2956cc2f0891666de5ea87251e96bc49
2011-03-21 12:12:14 -04:00
Yunqing Wang
73065b67e4 Merge "Fix multithreaded encoding for 1 MB wide frame" 2011-03-21 07:41:31 -07:00
John Koleszar
2cbd962088 Remove unused vp8_get4x4sse_cs_mmx declaration
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.

Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
2011-03-21 07:53:53 -04:00
John Koleszar
769c74c0ac Merge "Increase static linkage, remove unused functions" 2011-03-21 04:51:51 -07:00
John Koleszar
500fec2d5f Allow specifying --end-usage by enum name
Map an enum to the --end-usage values, so you can specify
--end-usage=cq instead of --end-usage=2. The numerical values still
work for historical scripts, etc, but this is more user friendly.

Change-Id: I445ecd9638f801f5924a71eabf449bee293cdd34
2011-03-21 07:50:42 -04:00
Tero Rintaluoma
a61785b6a1 ARMv6 optimized fdct4x4
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
  - No interlocks in Cortex-A8 pipeline
  - One interlock cycle in ARM11 pipeline
  - About 2.16 times faster than current C-code compiled with -O3

Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
2011-03-21 13:33:45 +02:00
Attila Nagy
bfe803bda3 Fix multithreaded encoding for 1 MB wide frame
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.

http://code.google.com/p/webm/issues/detail?id=302

Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
2011-03-18 12:35:30 +02:00
John Koleszar
429dc676b1 Increase static linkage, remove unused functions
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.

These symbols were identified by:

  $ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
    | sort | grep '^ *1 '

Change-Id: I59609f58ab65312012c047036ae1e0634f795779
2011-03-17 20:53:47 -04:00
Ralph Giles
185557344a Set bounds from the array when iterating mmaps.
The mmap allocation code in vp8_dx_iface.c was inconsistent.
The static array vp8_mem_req_segs defines two descriptors,
but only the first is real. The second is a sentinel and
isn't actually allocated, so vpx_codec_alg_priv is declared
with mmaps[NELEMENTS(vp8_mem_req_segs)-1]. Some functions
use this reduced upper bound when iterating though the mmap
array, but these two functions did not.

Instead, this commit calls NELEMENTS(...->mmaps) to directly
query the bounds of the dereferenced array.

This fixes an array-bounds warning from gcc 4.6 on
vp8_xma_set_mmap.

Change-Id: I918e2721b401d134c1a9764c978912bdb3188be1
2011-03-17 14:52:05 -07:00
Ralph Giles
de5182eef3 Remove commented-out VP6 code from vp8_finalize_mmaps
Change-Id: I48642c380353043bed96026f56de5908fcee270a
2011-03-17 14:51:31 -07:00
John Koleszar
8431e768c9 Merge "Fix "used uninitialized" warning in vp8_pack_bitstream()" 2011-03-17 14:25:04 -07:00
John Koleszar
de50520a8c apple: include proper mach primatives
Fixes implicit declaration warning for 'mach_task_self'. This change
is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d,
which fixed this issue for the decoder but not the encoder.

Change-Id: I9df033e81f9520c4f975b7a7cf6c643d12e87c96
2011-03-16 13:59:32 -04:00
Attila Nagy
346b3e7ce9 Remove echoing in obj_int_extract rule
Change-Id: I9965170b40e2f32e9d84895c33a529b0d7dacdc1
2011-03-16 12:56:52 +02:00
Attila Nagy
71bcd9f1af Add vp8_variance8x8_armv6 and vp8_sub_pixel_variance8x8_armv6 functions
Change-Id: I08edaffc62514907fa5e90e1689269e467c857f5
2011-03-15 15:50:44 +02:00
Gaute Strokkenes
6795e256c1 Avoid misspelling "dependent".
Change-Id: Ib0c280e1fcfd977e11e4390807b2c8077a87500c
2011-03-15 12:58:29 +00:00
John Koleszar
8c48c943e7 Merge "Fix an unused variable warning." 2011-03-14 14:13:53 -07:00
Ralph Giles
aa4a90c880 Improve grammar in a comment.
Change-Id: I18bfda6d420626f2718e096e338c1d0bf0ba029d
2011-03-14 13:41:50 -07:00
Johann
2ec0cfbe99 obj_int_extract: win64 does not prefix symbols
obj_int_extract was unconditionally skipping the first character in the
symbol. make sure it's actually an '_' first

Change-Id: Icfe527eb8a0028faeabaa1dcedf8cd8f51c92754
2011-03-14 16:11:02 -04:00
Johann
d0ec28b3d3 Merge "Add vp8_mse16x16_armv6 function" 2011-03-14 12:47:42 -07:00
Attila Nagy
e54dcfe88d Add vp8_mse16x16_armv6 function
Change-Id: I77e9f2f521a71089228f96e2db72524189364ffb
2011-03-14 14:38:31 +02:00
Rafael Ávila de Espíndola
52f6e28e9e Fix build with xcode4 and simplify GLOBAL.
Without this change I get link errors in firefox's libxul. It looks
like the linker expect a particular pattern for getting the GOT. This
patch changes webm to use the same pattern used by the compiler.

Change-Id: Iea8c2e134ad45c1dc7d221ff885a8429bfa4e057
2011-03-12 10:45:22 -05:00
Johann
3788b3564c Merge "Move build_intra_predictors_mby to RTCD framework" 2011-03-11 10:23:48 -08:00
John Koleszar
27972d2c1d Move build_intra_predictors_mby to RTCD framework
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.

Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
2011-03-11 13:04:50 -05:00
Johann
5c60a646f3 Merge "ARMv6 optimized quantization" 2011-03-11 08:29:00 -08:00
John Koleszar
75051c8b59 Merge "Only enable ssim_opt.asm on X86_64" 2011-03-11 08:28:05 -08:00
John Koleszar
5db0eeea21 Only enable ssim_opt.asm on X86_64
Fix compiling on 32 bit x86.

Change-Id: I6210573e1d9287ac49acbe3d7e5181e309316107
2011-03-11 11:27:08 -05:00
Paul Wilkins
6e73748492 Clean up of vp8_init_config()
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.

Change-Id: I937c00167cc039b3aa3f645f29c319d58ae8d3ee
2011-03-11 11:06:51 -05:00
John Koleszar
170b87390e Merge "1 Pass CQ and VBR bug fixes" 2011-03-11 08:06:09 -08:00
Paul Wilkins
2ae91fbef0 1 Pass CQ and VBR bug fixes
Issue 291 highlighted  the fact that CQ mode was not working
as expected in 1 pass mode,

This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.

For some clips (particularly short clips), the resulting
improvement is dramatic.

Change-Id: Ieefd6c6e4776eb8f1b0550dbfdfb72f86b33c960
2011-03-11 10:59:34 -05:00
John Koleszar
e34e417d94 Merge "Fix incorrect macroblock counts in twopass rate control" 2011-03-11 06:06:04 -08:00
Yunqing Wang
3c9dd6c3ef Merge "Align SAD output array to be 16-byte aligned" 2011-03-11 05:56:02 -08:00
John Koleszar
c5c5dcd0be Merge "vp8cx - psnr converted to call assemblerized sse" 2011-03-11 05:54:00 -08:00
John Koleszar
29c46b64a2 Merge "vp8cx- alternate ssim function with optimizations" 2011-03-11 05:53:41 -08:00
Jim Bankoski
3dc382294b vp8cx - psnr converted to call assemblerized sse
Change-Id: Ie388d4618c44b131f96b9fe526618b457f020dfa
2011-03-11 08:51:22 -05:00
Jim Bankoski
3f6f7289aa vp8cx- alternate ssim function with optimizations
Change-Id: I91921b0a90dbaddc7010380b038955be347964b3
2011-03-11 08:51:21 -05:00
Yunqing Wang
b2aa401776 Align SAD output array to be 16-byte aligned
Use aligned store.

Change-Id: Icab4c0c53da811d0c52bb7e8134927f249ba2499
2011-03-11 08:24:23 -05:00
Yunqing Wang
76ec21928c Merge "Encoder loopfilter running in its own thread" 2011-03-11 04:55:05 -08:00
Attila Nagy
9c836daf65 Fix "used uninitialized" warning in vp8_pack_bitstream()
Change-Id: Iadcbdba717439f47a2c24e65fd69a3a1464174b5
2011-03-11 12:36:28 +02:00
Attila Nagy
3ae2465788 Encoder loopfilter running in its own thread
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.

Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by application  and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.

Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
2011-03-11 10:52:51 +02:00
Tero Rintaluoma
7ab08e1fee ARMv6 optimized quantization
Adds new ARMv6 optimized function vp8_fast_quantize_b_armv6
to the encoder.

Change-Id: I40277ec8f82e8a6cbc453cf295a0cc9b2504b21e
2011-03-11 10:48:42 +02:00
Johann
128d2c23b3 obj_int_extract for Visual Studio
Enable extraction of assembly offsets from compiled examples in MSVS.
This will allow us to remove some stub functions from x86 assembly since
we will be able to reliably determine structure offsets at compile time.

see ARM code for examples:
vp8/encoder/arm/armv5te/
vpx_scale/arm/neon/

Change-Id: I1852dc6b56ede0bf1dddb5552196222a7c6a902f
2011-03-10 18:49:54 -05:00
Adrian Grange
6daacdb785 Added missing format specifier in print statement
Printout of firstpass stats for frame had one fewer
format specifiers than arguments.

Change-Id: I5a42c85aa79c471e1a70afd75e24a91546b7a1cd
2011-03-10 12:43:49 -08:00
Adrian Grange
ed40ff9e2d Removed firstpass motion map
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.

For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.

This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.

Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.

Change-Id: I53e07d236f5ce09a6f0c54e7c4ffbb490fb870f6
2011-03-10 11:32:48 -08:00
James Berry
f3e9e2a0f8 Fix incorrect macroblock counts in twopass rate control
The previous calculation of macroblock count (w*h)/256
is not correct when the width/height are not multiples of
16. Use the precalculated macroblock count from
cpi->common instead. This manifested itself as a divide
by zero when the number of pixels was less than 256.
num_mbs updated in estimate_max_q, estimate_q,
 estimate_kf_group_q, and estimate_cq

Change-Id: I92ff98587864c801b1ee5485cfead964673a9973
2011-03-10 13:33:06 -05:00
Yunqing Wang
a0306ea660 Merge "Add vp8_sub_pixel_variance16x8_ssse3 function" 2011-03-09 12:26:37 -08:00
John Koleszar
c5a049babd Merge branch 'bali'
Change-Id: Icf18b4981afb12ef255fca431d4ba45860dd22c9
2011-03-09 14:11:54 -05:00
John Koleszar
5c24071504 Add missing filter.h to build system
Missing file causes 'make dist' to not include a complete copy of the
source.

Change-Id: I3f55aeb5a86d0e81234e4e4588cb8086ba4cfc4a
2011-03-09 13:43:31 -05:00
Johann
43baf7ff21 Merge "fix obj_int_extract for MinGW" 2011-03-09 09:44:49 -08:00
Yunqing Wang
7b8e7f0f3a Add vp8_sub_pixel_variance16x8_ssse3 function
Added SSSE3 function

Change-Id: I8c304c92458618d93fda3a2f62bd09ccb63e75ad
2011-03-09 12:33:21 -05:00
Yunqing Wang
4561109a69 Remove unused functions
Removed some unused functions

Change-Id: Ifdfc27453e53cfc75997b38492901d193a16b245
2011-03-09 10:45:03 -05:00
Yunqing Wang
7966dd5287 Merge "Improve SSE2 half-pixel filter funtions" 2011-03-09 07:23:06 -08:00
John Koleszar
fa836faede Merge "Configuration updates:Making a clear distinction between Init and Change" 2011-03-09 05:07:11 -08:00
Ralph Giles
56efffdcd1 Fix an unused variable warning.
Move the update of the loopfilter info to the same block where it
is used. GCC 4.5 is not able trace the initialization of the local
filter_info across the other calls between the two conditionals on
pbi->common and issues an uninitialized variable warning.

Change-Id: Ie4487b3714a096b3fb21608f6b0c74e745e3c6fc
2011-03-08 14:56:15 -08:00
Johann
fb037ec05b fix obj_int_extract for MinGW
failed to find headers in the source directory

output to stdout instead of a hardcoded file

MinGW doesn't support _sopen_s

_fstat catches non-existant files

Change-Id: I24e0aacc6f6f26e6bcfc25f9ee7821aa3c8cc7e7
2011-03-08 17:44:52 -05:00
Yunqing Wang
419f638910 Improve SSE2 half-pixel filter funtions
Rewrote these functions to process 16 pixels once instead of 8.

Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
2011-03-08 16:25:06 -05:00
Johann
95adf3df77 Merge "64bit mach-o support" 2011-03-08 12:29:32 -08:00
Yunqing Wang
859abd6b5d Merge "Add zero offset checking in SSE2 sub-pixel filter function" 2011-03-08 12:26:58 -08:00
Yunqing Wang
8432a1729f Add zero offset checking in SSE2 sub-pixel filter function
Skip filter at zero offset.

Change-Id: I95fc7e211869bc0ab5bcfb7ab2e3259d1c0ccf38
2011-03-08 15:22:07 -05:00
Yunqing Wang
e8f7b0f7f5 Merge "Write SSSE3 sub-pixel filter function" 2011-03-08 10:58:30 -08:00
Yunqing Wang
244e2e1451 Write SSSE3 sub-pixel filter function
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset =0 and yoffset=0, which happens
   during motion search.
This change gave encoder 1%~3% performance gain.

Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
2011-03-08 13:29:01 -05:00
Johann
5091e01ea1 64bit mach-o support
enable parsing 64bit mach-o files (OS X)

also fixes --enable-debug issue!

Change-Id: I250ee69745cd2365e3e63264f9365cd58fbb6678
2011-03-08 11:42:30 -05:00
Johann
ddd260eb62 64bit elf support
enable parsing 64bit elf files

Change-Id: I7981f4769cf1b822f288fe2e32166254e4394bab
2011-03-08 10:49:35 -05:00
Ralph Giles
e6948bf0f9 Fix a multi-line format-string warning.
GCC 4.5 and 4.6 both issue a warning about the multi-line format
string introduced in bc9c30a0, which also changed the whitespace
in the associated stt file by line-wrapping the long format string.

Instead, use multiple string constants, which the compiler will
concatenate. This maintains the original formatting, but remains
legible within the standard line length.

Change-Id: I27c9f92d46be82d408105a3a4091f145f677e00e
2011-03-08 07:14:12 -08:00
Paul Wilkins
de87c420ef Corrected minor typos.
Change-Id: Icc9f12bd1e1bdaf51256dc8a90d08aa9be89ef34
2011-03-08 14:46:22 +00:00
Paul Wilkins
0eccee4378 Merge changes I00c3e823,If8bca004
* changes:
  Improved key frame detection.
  Improved KF insertion after fades to still.
2011-03-08 06:40:11 -08:00
John Koleszar
5d1d9911cb correct zbin boost for splitmv mode
Disable zbin boost in SPLITMV mode as intended. Was incorrectly looking
at vp8_ref_frame_order instead of vp8_mode_order when comparing against
SPLITMV. This condition should have always been false, as SPLITMV is
not in the range of valid reference frames.

Change-Id: I0408cc7595eff68f00efef6d008e79f5b60d14bf
2011-03-07 20:58:37 -05:00
John Koleszar
1016b856d1 Merge "Fix format-string warning" 2011-03-07 13:25:28 -08:00
Ralph Giles
fe9a604b1e Fix format-string warning
Cast size_t to (unsigned long) and print it with the %lu format
string, which is more portable than C99's explict %zu for size_t.

This truncates on Windows x64 but otherwise works on 32 and 64 bit
platforms. In practice the stats file is unlikely to be so large.

Change-Id: I0432b3acf85fc6ba4ad50640942e1ca4614b21cb
2011-03-07 13:21:39 -08:00
Paul Wilkins
bc9c30a003 Improved key frame detection.
In some cases where clips have been encoded with
borders (eg. some wide-screen content where there is a
border top and bottom and slide shows containing portrait
format photographs (border left and right)) key frames were
not being correctly detected.

The new code looks to measure cases where a portion of
the image can be coded equally easily using intra or inter
modes and where the resulting error score is also very low.
These "neutral" areas are then discounted in the key frame
detection code.

Change-Id: I00c3e8230772b8213cdc08020e1990cf83b780d8
2011-03-07 15:58:07 +00:00
Paul Wilkins
9fc8cb39aa Improved KF insertion after fades to still.
This code extends what was previously done for GFs, to pick
cases where insertion of a key frame after a fade (or other
transition or complex motion)  followed by a still section, will
be beneficial and will reduce the number of forced key frames.

Change-Id: If8bca00457f0d5f83dc3318a587f61c17d90f135
2011-03-07 15:11:09 +00:00
Aron Rosenberg
8e87d58712 Add spin-wait pause intrinsic for Windows x64 platform.
Change-Id: I7504370c67a3c551627c6bb7e67c65f83d88b78e
2011-03-04 14:49:50 -08:00
John Koleszar
0491c2cfc8 Update CHANGELOG for v0.9.6 (Bali) release
Change-Id: I7d1e7db1866d829f6d4c6638d1c20e99959cc9a3
2011-03-04 14:32:24 -05:00
John Koleszar
3fae3283e6 Update AUTHORS
Change-Id: I784ea2b9fabbec1e99d02e97209981ff1b18ac82
2011-03-04 11:12:06 -05:00
John Koleszar
d05c4d8841 Update .mailmap
Add mappings for Tom Finegan, Tero Rintaluoma

Change-Id: I014ad5bb7c8eb8261808d98ec0d4f77a8e7c3f35
2011-03-04 11:11:15 -05:00
Johann
e38c1680d6 Merge "examples: use function to get iface pointers" 2011-03-04 06:01:46 -08:00
Johann
77ed11c506 Merge "change CFLAGS for 64 bit icc builds" 2011-03-04 05:59:31 -08:00
John Koleszar
4a742e5c79 cosmetic: clean up comments for new vp8dx controls
Rename the common control id enum vp8_{dec,com}_control_id,
move VP8_DECODER_CTRL_ID_START to common, wrap long lines.

Change-Id: I659abc62f10aa389d496f7f43950775db0ef2f9f
2011-03-04 08:51:39 -05:00
John Koleszar
27c04aaa67 Merge "clean up msvs project generation" 2011-03-04 05:44:54 -08:00
John Koleszar
0bc31f1887 Merge "Fixing divide by zero" 2011-03-04 05:40:33 -08:00
John Koleszar
fb37eda3e2 Merge "Fix drastic undershoot in long form content" 2011-03-04 05:39:40 -08:00
John Koleszar
05d75b4353 Merge "documentation: minor updates to vp8 (en|de)coder" 2011-03-04 05:38:26 -08:00
John Koleszar
eed2ce58e3 Merge "Fix counter of fixed keyframe distance" 2011-03-04 05:28:38 -08:00
Mikhal Shemer
84f7f20985 Configuration updates:Making a clear distinction between Init and Change
Change-Id: I7b2fb326e1aabc08b032177a7b914a5b8bb7376f
2011-03-03 10:35:09 -08:00
Mikhal Shemer
1de99a2a81 Fixing divide by zero
Change-Id: I9d8a98a2f7ed1e3116d0bae35164618c41998bac
2011-03-03 10:33:36 -08:00
John Koleszar
36be4f7f06 Fix drastic undershoot in long form content
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.

This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.

This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from reoccuring.

This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.

Change-Id: Ie5cc1a7b516c578a76c3a50c892a6f04a11621fe
2011-03-02 22:52:27 -05:00
Johann
a1cfcb413d clean up msvs project generation
add visual studio 9 to --help

remove cpp, cxx, hpp, hxx files from filter

add the ability to target project names. this will be necessary to
enable obj_int_extract

Change-Id: I407583320d8b67a0df40c07221838c42678792f7
2011-03-02 09:34:34 -05:00
Johann
6f5189c044 Merge "ARMv6 optimized half pixel variance calculations" 2011-03-02 05:48:46 -08:00
John Koleszar
06ce0d8830 change CFLAGS for 64 bit icc builds
AMD64 only implies SSE2, not SSE3. There aren't any known cases where
icc was generating SSE3 instructions since all the vectorizable code
is already in handwritten asm, so this fix is included mostly for
correctness. Fixes issue #259.

Change-Id: I993335a4740b68b559035305fb52ca725a6beaff
2011-02-28 20:16:14 -05:00
John Koleszar
987ac89403 examples: use function to get iface pointers
MSVC can't pass the address of global variables in a DLL correctly
across DLL boundaries. This patch allows linking the examples to
a libvpx dll build. Fixes issue #268.

Change-Id: I1c52d076cfc68efb3efdfba019f12d53c5019f58
2011-02-28 20:06:59 -05:00
Yunqing Wang
cfaee9f7c6 Merge "Add prefetch before variance calculation" 2011-02-28 11:42:28 -08:00
Scott LaVarnway
3e6d476ac3 Merge "Avoid double copying of key frames into alt and golden buffer" 2011-02-28 10:16:33 -08:00
Yunqing Wang
d96ba65a23 Add prefetch before variance calculation
This improved encoding performance by 0.5% (good, speed 1) to
1.5% (good, speed 5).

Change-Id: I843d72a0d68a90b5f694adf770943e4a4618f50e
2011-02-28 11:25:55 -05:00
John Koleszar
4decd27947 Merge "Remove examples.doxy dep w/--disable-examples" 2011-02-28 07:18:41 -08:00
Johann
31dab574cc Merge "Remove a second check for invalid ptr in vp8_get_compressed_data" 2011-02-25 11:44:18 -08:00
Aaron Watry
da761c9a22 Fix crash on Sparc Solaris.
Sparc on Solaris requires memory copies in reconinter.c to be aligned.

Change-Id: I6c5b75fb80d6fd501ae4b41b533c3109c2f32be2
2011-02-25 10:14:10 -05:00
Johann
e4fa638653 Merge "Remove temporal alt ref from realtime only build" 2011-02-25 06:55:17 -08:00
Johann
1fae7018a8 Merge "Handle mem allocation failure in vp8e_init" 2011-02-25 06:55:10 -08:00
Attila Nagy
d8fc974ac0 Avoid double copying of key frames into alt and golden buffer
Change-Id: I726976a297a593a35ed6cba3c660e372562f7b27
2011-02-25 09:03:16 +02:00
Attila Nagy
6da2018789 Remove a second check for invalid ptr in vp8_get_compressed_data
Check is done first when function si entered.

Change-Id: Ief0d0cbd4860aaf492b78728f8d22f24029b1174
2011-02-25 08:41:13 +02:00
James Zern
1771722b2f Remove examples.doxy dep w/--disable-examples
This allows the base documentation to be built without the need for php
which is required to produce the example documentation

Change-Id: Id1861723c672fa8da132a074a4657e2cb94c1e79
2011-02-24 15:11:05 -08:00
James Zern
8e17e82d9e documentation: minor updates to vp8 (en|de)coder
Group algorithm interfaces to avoid undocumented warning from doxygen
and provide basic documentation for CQ level & cpuused.

Change-Id: I11095061be962cbc998741de9c8c3019d415e137
2011-02-24 14:12:57 -08:00
Scott LaVarnway
861175ef00 Removed vp8_block2type
and used defines instead.

Change-Id: Idb56e0295d004793f406dfd2d8d8c546aad62e03
2011-02-24 14:35:18 -05:00
Scott LaVarnway
d53492bba4 Merge "Revisited rd_pick_intra4x4block" 2011-02-24 11:25:21 -08:00
Scott LaVarnway
658454a04c Revisited rd_pick_intra4x4block
Removed unnecessary copies.  No noticeable speed gains.


Change-Id: I996c50c23fedd06d54ee7a3e762cbf559cc4a9d1
2011-02-24 13:31:47 -05:00
Paul Wilkins
b862c108dd Overflow of frame error accumulators.
This fixes an overflow problem in the frame error accumulators.

The overflow condition is extreme but did trigger when Frank B.
coded some high motion interlaced HD content.

The observed effect was a catastrophic  breakdown of the rate
control leading to massive undershoot and poor bit allocation.

All the error values should really be unsigned but I will look at this
separately.

Change-Id: I9745f5c5ca2783620426b66b568b2088b579151f
2011-02-24 15:49:41 +00:00
Johann
aee120afb9 Merge "documentation: minor cosmetics" 2011-02-24 07:01:25 -08:00
Tero Rintaluoma
8ae92aef66 ARMv6 optimized half pixel variance calculations
Adds following ARMv6 optimized functions to the encoder:
 - vp8_variance_halfpixvar16x16_h_armv6
 - vp8_variance_halfpixvar16x16_v_armv6
 - vp8_variance_halfpixvar16x16_hv_armv6

Change-Id: I1e9c2af7acd2a51b72b3845beecd990db4bebd29
2011-02-23 13:27:27 +02:00
Attila Nagy
e6db21ecc4 Handle mem allocation failure in vp8e_init
Change-Id: I0d0445c57eb0889082f83de1948852d57b38fefb
2011-02-23 12:36:03 +02:00
Johann
418f4219fa purge wince configuration
this has been broken since the initial release

Change-Id: If0d4deb2de9f7d0c4c05641e2bbf9cc1bf11e171
2011-02-22 14:42:00 -05:00
Attila Nagy
7af0d906e3 Remove temporal alt ref from realtime only build
It is not used in realtime mode. Reduces memory footprint.

Change-Id: I7f163225762368df5457cfd413050161d3704a3f
2011-02-22 12:53:32 +02:00
Johann
945dad277d Revert "use unaligned load"
This reverts commit f50f2fd2a7.

Change Ib7506e3e aligns the buffer

Change-Id: Ie0f8bd3e57cfdfef81d39638a1451458ebbae2e0
2011-02-18 10:23:02 -05:00
John Koleszar
c764c2a20f Merge "clean up unused files" 2011-02-18 06:33:05 -08:00
John Koleszar
3ed8fe8778 remove unused vp8_predict_dc function
Change-Id: I64fa47889c54cfed094a674c49ef0996d49bdd42
2011-02-18 09:12:20 -05:00
John Koleszar
cbf923b12c clean up unused files
Removed a number of files that were unused or little-used.

Change-Id: If9ae5e5b11390077581a9a879e8a0defe709f5da
2011-02-18 09:09:49 -05:00
John Koleszar
d371ca93e5 cosmetic: remove unnecessary scope
Clean up some unnecessary scoping around pick_filter_level.

Change-Id: Ic57fa33e3fcae37fe6beae977e5743783399d5af
2011-02-18 08:46:07 -05:00
John Koleszar
597d02b508 Merge "Dont pick encoder filter level when loopfilter is disabled." 2011-02-18 05:26:23 -08:00
Attila Nagy
fb5a692d27 Reinitialize quantizer only when any delta is changing
No need to reinitialize for base Q changes.

Change-Id: Ie76ec21dd3c5582d5183dbed75ed73a1eed3e291
2011-02-18 14:23:37 +02:00
Attila Nagy
c6ef75690f Dont pick encoder filter level when loopfilter is disabled.
Change-Id: I58154faf4f3ece24f9927a5c3ab7e830e0887fb6
2011-02-18 08:53:00 +02:00
John Koleszar
b2ae57f1b6 Merge "Use endian-neutral bitstream packing/unpacking" 2011-02-17 12:34:16 -08:00
John Koleszar
562f1470ce Use endian-neutral bitstream packing/unpacking
Eliminate unnecessary checks on target endianness and associated
macros.

Change-Id: I1d4e6a9dcee9bfc8940c8196838d31ed31b0e4aa
2011-02-17 15:20:53 -05:00
John Koleszar
ac10665ad8 Merge "Removed unused vp8_recon_intra4x4mb function" 2011-02-17 11:30:13 -08:00
Scott LaVarnway
07f7b66fae Removed unused vp8_recon_intra4x4mb function
Change-Id: I4a328ce152d9dbe6b0d1606d1b523e8e7bfb468e
2011-02-17 13:34:38 -05:00
John Koleszar
c351aa7f1b Merge "Fix relative include paths" 2011-02-17 04:13:44 -08:00
James Zern
f42d52e6bd documentation: minor cosmetics
- correct spelling
- remove explicit file name w/\file (unnecessary when contained in the
  same file and prone to desync)

Change-Id: I68a3960ac5ab84d0f2e5c9b2e29799f26dfccf23
2011-02-16 17:59:33 -08:00
Yunqing Wang
da9402fbf6 Merge "Allocate source buffers to be multiples of 16" 2011-02-16 11:35:06 -08:00
Yunqing Wang
da227b901d Allocate source buffers to be multiples of 16
Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.

Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
2011-02-16 12:57:17 -05:00
Johann
0c2cfff9b0 Merge "ARMv6 optimized sad16x16" 2011-02-16 05:22:38 -08:00
James Zern
0030303b69 Remove redundant ptr checks in calls to vpx_free
vpx_free if used contains this check. If replaced, well behaved free
will behave similarly.

Change-Id: I25483aaa8b39255b9a8cf388d6e5eaa20a908ae1
2011-02-15 12:43:35 -08:00
Yunqing Wang
7725a7eb56 Merge "Improve vp8_sad16x16_sse3 function" 2011-02-14 14:09:25 -08:00
Yaowu Xu
27dad21548 Merge "Improved vp8_rd_pick_intra_mbuv_mode" 2011-02-14 13:58:12 -08:00
Scott LaVarnway
94d4fee08f Improved vp8_rd_pick_intra_mbuv_mode
Eliminated unnecessary calculations. Very small change
to performance.

Change-Id: Ib7213d43c64e36955177c4d47950ff472266f822
2011-02-14 16:34:33 -05:00
Yunqing Wang
2debd5b5f7 Improve vp8_sad16x16_sse3 function
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).

Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
2011-02-14 16:23:49 -05:00
Yaowu Xu
404e998eb7 Merge "mem leak fix for cpi->tplist" 2011-02-14 11:29:22 -08:00
James Berry
d3dfcde0f7 mem leak fix for cpi->tplist
checks added to make sure that cpi->tplist
is freed correctly in vp8_dealloc_compressor_data
and vp8_alloc_compressor_data.

Change-Id: I66149dbbd25c958800ad94f4379d723191d9680d
2011-02-14 14:02:52 -05:00
Scott LaVarnway
d419b93e3e Improved rd_pick_intra4x4block
Eliminated unnecessary calculations.  Improved performance
by 10% on keyframes and 1.6% overall for the test clip used.

Change-Id: I87671b26af5e2cc439e81d0fee3b15c7cd2a3309
2011-02-14 13:32:58 -05:00
Johann
0ff10bb1f7 Merge "remove assembly detokenizer" 2011-02-14 05:10:16 -08:00
Johann
bb6bcbccda remove assembly detokenizer
hasn't been kept up to date. remove it to avoid confusion.

Change-Id: I52ffde19b59fec5c7a381299ca2e85cb38330be7
2011-02-11 11:09:00 -05:00
Yunqing Wang
353246bd60 Merge "Add improved_mv_pred flag in real-time mode" 2011-02-11 07:20:17 -08:00
Yunqing Wang
9d0b2cbbce Add improved_mv_pred flag in real-time mode
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.

Change-Id: I9f4ae5a8bf449222d1784b54e7d315fc8347b2d1
2011-02-11 09:59:41 -05:00
Tero Rintaluoma
1ef86980b9 ARMv6 optimized sad16x16
Adds a new ARMv6 optimized function vp8_sad16x16_armv6 to encoder.

Change-Id: Ibbd7edb8b25cb7a5b522d391b1e9a690fe150e57
2011-02-11 11:14:07 +02:00
Yaowu Xu
4f8a166058 Merge "Redefining good quality speed settings" 2011-02-10 21:38:19 -08:00
Yunqing Wang
6f53e59641 Merge "Improve motion search in real-time mode" 2011-02-10 12:42:44 -08:00
John Koleszar
02321de0f2 Fix relative include paths
Allow compiling without adding vp8/{common,encoder,decoder} to the
include paths.

Change-Id: Ifeb5dac351cdfadcd659736f5158b315a0030b6c
2011-02-10 15:09:44 -05:00
Yunqing Wang
41e6eceb28 Improve motion search in real-time mode
Applied better MV prediction in real-time mode, which improves
the encoding quality.

Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.

Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.

Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
2011-02-10 13:40:24 -05:00
Johann
7d8199f0c3 Merge "Adds armv6 optimized variance calculation" 2011-02-10 06:06:46 -08:00
Scott LaVarnway
19054ab6da Redefining good quality speed settings
Created a new speed 1 which is in the middle of the old
speed 0 and speed 1. (for both quality and performance)

Change-Id: I4802133cdb43f359ca787646c090899679dd5d84
2011-02-09 17:18:28 -05:00
James Berry
fffa2a61d7 fixed stride in vp8_temporal_filter_predictors_mb_c
stride would not be calculated correctly for material
with odd sized frame widths.

Change-Id: I1710f6aef9ebb93d36249c9239c68c5baa9791f8
2011-02-09 16:55:39 -05:00
John Koleszar
c2b43164bd Merge "correct cost for implicit bit in mvs" 2011-02-09 11:20:12 -08:00
John Koleszar
9954d05ca6 correct cost for implicit bit in mvs
Use 0xFFF0 vice 240 (0xF0) for determining whether the sometimes
implicit bit 3 will be transmitted. This is consistent with the decoder
and encode_mvcomponent().

Change-Id: Ic1304d0ab56844bed8236edd1c5243a6767fc6b1
2011-02-09 12:50:17 -05:00
John Koleszar
a39b5af10b Merge "Put more code under #if CONFIG_MULTITHREAD." 2011-02-09 08:31:36 -08:00
Gaute Strokkenes
315e3c2518 Put more code under #if CONFIG_MULTITHREAD.
Change-Id: Icf4b692099d7d249fe3553852b1022b027b28e4b
2011-02-09 11:21:18 -05:00
Scott LaVarnway
85e79ce288 Merge "Added early breakout for vp8_rd_pick_intra4x4mby_modes" 2011-02-09 07:55:04 -08:00
John Koleszar
c96031da69 Merge "vp8e_get_preview fixed for resized frames" 2011-02-09 07:41:40 -08:00
Tero Rintaluoma
cb14764fab Adds armv6 optimized variance calculation
Adds vp8_sub_pixel_variance16x16_armv6 function to encoder. Integrates
ARMv6 optimized bilinear interpolations from vp8/common/arm/armv6
and adds new assembly file for variance16x16 calculation.
 - vp8_filter_block2d_bil_first_pass_armv6   (integrated)
 - vp8_filter_block2d_bil_second_pass_armv6  (integrated)
 - vp8_variance16x16_armv6 (new)
 - bilinearfilter_arm.h (new)
Change-Id: I18a8331ce7d031ceedd6cd415ecacb0c8f3392db
2011-02-09 10:23:43 -05:00
Johann
e5aaac24bb clean up bilinear filter
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.

recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.

change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).

recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C

Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
2011-02-08 17:42:54 -05:00
Fritz Koenig
cc17629f30 Merge "build: Change to iOS SDK 4.2" 2011-02-08 13:59:12 -08:00
Scott LaVarnway
13db80c282 Added early breakout for vp8_rd_pick_intra4x4mby_modes
Improved performance of good quality, speed 0 (3% average)
with no average quality loss.

Change-Id: Ica34473f99bd74260eaebde6b132185e09e3c09d
2011-02-08 16:50:43 -05:00
Johann
40dcae9c2e clarify *_offsets.asm differences
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.

Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
2011-02-08 16:35:43 -05:00
Fritz Koenig
615c90c948 build: Change to iOS SDK 4.2
Brings configure/build system inline with current iOS SDK.

Change-Id: If391693a80cab371f75708214f3882424ead9e96
2011-02-08 14:50:33 -05:00
James Berry
ddacf1cf69 vp8e_get_preview fixed for resized frames
preview_img d_w and d_h along with w and h
would not be updated for resized frames.

now uses sd.y_width and sd.y_height

Change-Id: I52241de4cc1de5e73f865e668bd70a7cbd954390
2011-02-08 14:27:00 -05:00
Andoni Morales Alastruey
48140167cd Fix counter of fixed keyframe distance
When the keyframe distance is fixed the first interval has the right
distance but, the next ones have kf_distance + 1.

Change-Id: I44f1190fe7146124bd07660a5e0ef08829e3ae07
2011-02-07 18:30:04 +01:00
Johann
3273c7b679 move one of the offset files
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets

Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
2011-02-07 11:35:30 -05:00
John Koleszar
eaadfb5869 Merge "Translates -g from LDFLAGS as --debug in armlink_adapter.sh" 2011-02-07 06:21:56 -08:00
John Koleszar
adaf2b697c Merge "remove unused dboolhuff code" 2011-02-07 05:36:26 -08:00
Yunqing Wang
58d2e70fc5 Fix link error in real-time mode
make vp8_mv_pred() and vp8_cal_sad() available in real-time mode.

Change-Id: I71dbae241b486ba943458dcbae552ec4a51689d3
2011-02-07 08:21:14 -05:00
Attila Nagy
0905af38fc Translates -g from LDFLAGS as --debug in armlink_adapter.sh
Change-Id: I23ad88db2149ab788ff39aed8624a7ef0e97da2e
2011-02-07 08:58:19 +02:00
Johann
bb9c95ea53 remove unused dboolhuff code
we were holding on to this "just in case." purge it instead

Change-Id: I77a367b36d0821d731019f2566ecfffdae1d4b8a
2011-02-04 16:00:00 -05:00
Yunqing Wang
350ffe8dae Merge "Improve MV prediction in vp8_pick_inter_mode() for speed>3" 2011-02-04 10:10:15 -08:00
John Koleszar
b601eb8cda configure: enable unused variable warnings
Only suppress unused function warnings, rather than supprressing all
unused-* warnings. Unused functions can still be seen with
--enable-extra-warnings.

Change-Id: Ibca20d859dbffedd76bd082ffe0fa685c3ac198e
2011-02-04 11:53:11 -05:00
John Koleszar
63fc44dfa5 correct quantizer initialization
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to a encode-decode mismatch.

This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.

Change-Id: Ia6733464a55ee4ff2edbb82c0873980d345446f5
2011-02-04 11:37:47 -05:00
John Koleszar
6bf7e2cc37 Merge "Remove duplicate loopfilter parameters." 2011-02-04 07:07:45 -08:00
Gaute Strokkenes
ffc6aeef14 Remove duplicate loopfilter parameters.
Change-Id: I0d41415e3961c2c9492d342290c1999f9d02e6d8
2011-02-04 14:55:02 +00:00
John Koleszar
c0a9cbebe1 Merge "Delay auto key frame insertion in realtime configuration" 2011-02-04 05:16:15 -08:00
Gaute Strokkenes
bf5f585b0d Make vp8_adjust_mb_lf_value return the updated value rather than
manipulating it in situ via a pointer.

Change-Id: If4a87a4eccd84f39577c0e91e171245f4954c5cf
2011-02-03 19:24:16 +00:00
John Koleszar
209def2d72 Merge "Avoid using an anonymous union." 2011-02-03 09:08:50 -08:00
Scott LaVarnway
4aa12b6c5f Merge "Zero out block mv when an intra mode is selected" 2011-02-03 07:16:52 -08:00
Yunqing Wang
a870315629 Merge "Improved encoder threading" 2011-02-03 05:44:57 -08:00
Gaute Strokkenes
72ebafff51 Avoid using an anonymous union.
Change-Id: I5744269a35e2d696ecf40c1665efd572bfc9b6cb
2011-02-02 15:22:51 +00:00
Attila Nagy
e5904f2d5e Delay auto key frame insertion in realtime configuration
Whe auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.

Change-Id: I15c175fa845ac4c1a1f18bea3676e154669522a7
2011-02-02 13:54:40 +02:00
Scott LaVarnway
07a7c08aef Zero out block mv when an intra mode is selected
instead of each time mode is tested.

Change-Id: Ief0f5586dafde54cc14d348dcecdacb182e7c1d5
2011-02-01 12:55:51 -05:00
Scott LaVarnway
a5ecaca6a7 Removed unnecessary B_MODE_INFO memset.
Change-Id: I2bcef6a8e47f88542861fd1356631ca934e2a0e7
2011-02-01 11:35:08 -05:00
Scott LaVarnway
b18df82e1d Moved rd calculation into vp8_pick_intra4x4mby_modes
Then removed unnecessary code.

Change-Id: I142658815d843c9396b07881dbdd8d387c43c90e
2011-02-01 11:26:04 -05:00
Scott LaVarnway
4e7e79f770 Removed intra_modes from vp8cx_encode_intra_macro_block
Restructured function in order to eliminate the prediction
modes save/restore.  Code cleanup also.

Change-Id: I816e3b910de64d0f0f0ddc2398805c63263191e8
2011-02-01 10:05:35 -05:00
Attila Nagy
385c2a76d1 Improved encoder threading
Reduce the number of sync points by letting each thread
continue imediatly with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.

Change-Id: Ic97e4d1c4886a842c85dd3539a93cb217188ed1b
2011-02-01 12:17:58 +02:00
Scott LaVarnway
9e7fec216e Removed prediction_error accumulation
from vp8cx_encode_intra_macro_block.  prediction_error is used when
deciding if a frame should be a keyframe.  After reviewing this with
Yaowu, it was pointed out that vp8cx_encode_intra_macro_block
is only called for keyframes, so the accumulation is unnecessary.

Change-Id: Id79dc81b80d4f5d124f3a0dba1b923887e2e1ec8
2011-01-31 19:53:02 -05:00
Scott LaVarnway
317f0da91e Removed last_auto_filter_prediction_error
last_auto_filter_prediction_error is not really used.

Change-Id: Ic6e56c4076bbd250ef783ee1be46964c85f62864
2011-01-31 19:41:09 -05:00
Scott LaVarnway
4a15e55793 Possible bug in vp8cx_encode_intra_macro_block
vp8_pick_intra4x4mby_modes uses the passed in distortion
for an early breakout.  The best distortion was never saved
and the distortion for TM_PRED was always used.

Change-Id: Idbaf73027408a4bba26601713725191a5d7b325e
2011-01-31 17:43:18 -05:00
Scott LaVarnway
60fde4d342 Merge "Performance improvement of first pass" 2011-01-31 13:02:23 -08:00
Yaowu Xu
6d19d40718 Merge "change the threshold of DC check for encode breakout" 2011-01-31 11:00:46 -08:00
John Koleszar
f6214d1db8 Merge "validate min_q against max_q" 2011-01-31 07:33:55 -08:00
John Koleszar
2d03f073a7 validate min_q against max_q
min_q is required to be <= max_q.

Change-Id: I28eccf96df3b52a94913762b54c4fbe0d021ce5e
2011-01-31 10:33:00 -05:00
Adrian Grange
408a8adc15 Merge "Changed condition for using RD in Intra Mode" 2011-01-31 02:18:40 -08:00
Yaowu Xu
8f279596cb change the threshold of DC check for encode breakout
Previously, the DC check is to make sure there is no code-able
DC shift for quantizer Q0, which has been verified rather
conservative. This commit changes the criteria to have two
components, DC and AC, to address the conservativeness. First,
it checks if all AC energy is enough to contribute a single
non-zero quantized AC coefficient. Second, for DC, the decision
to skip further considers two possible scenarios: 1. There is
no code-able 2nd order DC coefficient at all; 2 The residue is
relatively flat, but the uniform DC change is very small, i.e.
less than 1/2 gray level per pixel.

Comparing to previous criteria, the new criteria is about 10%
to 15% faster in encoding time with a very small quality loss.
(threshold ~1000 and quality range 33db-45db)

It should be noted that this commit enables "automatic" static
threshold for encodebreakout if a non-zero small value is passed
in to encoder.

Change-Id: I0f77719a1ac2c2dfddbd950d84920df374515ce3
2011-01-28 09:43:23 -08:00
Johann
f3cb9ae459 Merge "Adds "armvX-none-rvct" targets" 2011-01-28 09:03:58 -08:00
Yunqing Wang
7cbe684ef5 Improve MV prediction in vp8_pick_inter_mode() for speed>3
Applied same method used in vp8_rd_pick_inter_mode() to improve
the accuracy of MV prediction.

Change-Id: Ia50ae26208b18482695601f32febd99fe89fbc17
2011-01-28 10:00:20 -05:00
Adrian Grange
e9f513d74a Changed condition for using RD in Intra Mode
The condition for using RD when selecting the intra coding mode
for a MB is that the RD flag is set AND we're not in real-time
mode.

Previously the code used RD if either the RD flag was set OR
we were not using real-time mode.

Change-Id: Ic711151298468a3f99babad39ba8375f66d55a08
2011-01-28 14:47:36 +00:00
Paul Wilkins
dcb23e2aaa Inconsistent distortion metric in vp8_rd_pick_intra_mbuv_mode
This function was using a variance metric compared to and SSE metric in
other places (eg. vp8_rd_inter_uv)

Change-Id: I9109fcc5a13bca9db1d7ead500fe14999ab233eb
2011-01-28 13:13:30 +00:00
Tero Rintaluoma
11a222f5d9 Adds "armvX-none-rvct" targets
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
 - armv5te-none-rvct
 - armv6-none-rvct
 - armv7-none-rvct

To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.

Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.

Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
2011-01-28 12:47:39 +02:00
Johann
73207a1d8b warning: pointer targets differ in signedness
vp8/encoder/rdopt.c:728: warning: pointer targets in passing argument 3
of 'macro_block_yrd' differ in signedness
vp8/encoder/rdopt.c:541: note: expected 'int *' but argument is of type
'unsigned int *'

distortion is signed when calling macro_block_yrd is both other cases,
as well as for RDCOST

Change-Id: I5e22358b7da76a116f498793253aac8099cb3461
2011-01-27 11:53:26 -05:00
Johann
27000ed6d9 clean up implicit declaration warnings for neon
Change-Id: I6ca2d89f355839c4c770773c09fc69dcea7c1406
warning: implicit declaration of function
  'vp8_variance_halfpixvar16x16_[h|v|hv]_neon'
  'vp8_sub_pixel_variance16x16_neon_func'
2011-01-27 11:31:59 -05:00
Scott LaVarnway
8a5c255b3d Merge "Removed unused members from VP8_COMP" 2011-01-27 08:12:22 -08:00
Yunqing Wang
bb30ffc4dc Merge "Remove copies of same functions" 2011-01-27 08:11:26 -08:00
Yunqing Wang
3ee4e1e79f Merge "Refine motion vector prediction for NEWMV mode" 2011-01-27 08:10:53 -08:00
Scott LaVarnway
3c18a2bb2e Performance improvement of first pass
Improved the performance of the first pass only
(~6% on 720p test clip) by making use of LUT instead of the
float calculations.  Might try a SIMD version later.
Also started to make use of int_mv instead of
MV.

Change-Id: If2a217c7d6b59cd2c25c5553e0ca7e0502403af8
2011-01-26 16:42:56 -05:00
Yunqing Wang
cac54404b9 Remove copies of same functions
Reduce the code size.

Change-Id: I2e1998557a3c8776e262c442fd758c25e17aff7a
2011-01-26 15:37:00 -05:00
Scott LaVarnway
c4887da39c Removed unused members from VP8_COMP
Change-Id: I8f3f2642b02975fbdb14982984a29821f80d30d3
2011-01-26 15:07:17 -05:00
Paul Wilkins
35bb74a6bd Rationalize vp8_rd_pick_intra16x16mby_mode()
Use the function macro_block_yrd() to calculate error and distortion
in keeping with what is done for inter frames.

The old code was using a variance metric for once case and an
SSE function for measuring distortion in the other case.

The function vp8_encode_intra16x16mbyrd() is no longer used.

Change-Id: Ic228cb00a78ff637f4365b43f58fbe5a9273d36f
2011-01-26 18:46:34 +00:00
Paul Wilkins
e8e09d33df Merge "Correction to buffer update for non-viewable frames." 2011-01-26 09:33:48 -08:00
Yaowu Xu
82266a1ac9 Merge "cap the best quantizer for 2nd order DC" 2011-01-26 09:27:11 -08:00
John Koleszar
be3e0ff7c3 Merge "Adds vpx_vp8_enc_asm_offsets.c.o to OBJS-yes list" 2011-01-26 07:29:19 -08:00
Attila Nagy
0def48b60f Adds vpx_vp8_enc_asm_offsets.c.o to OBJS-yes list
Change-Id: Ibd6e3bc82471839904b1086b499efc55f7c5cbaf
2011-01-26 17:06:09 +02:00
Paul Wilkins
a3f71ccff6 Correction to buffer update for non-viewable frames.
The code previously tested cpi->common.refresh_alt_ref_frame
but there are situations where this flag may be set for viewable frames.

The correct test should be !cm->show_frame.

Change-Id: Ia1a600622992a4a68fe1d38ac23bf6b34b133688
2011-01-26 12:52:31 +00:00
Paul Wilkins
2caa36aa4f Merge "Fix for incorrect variable declaration." 2011-01-26 01:53:53 -08:00
Yaowu Xu
999e155f55 cap the best quantizer for 2nd order DC
This commit also removes artificial RDMULT cap for low quantizers.
The intention is to address some abnormal behavior of mode selections
at the low quantizer end, where many macroblocks were coded with
SPLITMV with all partitions using same motion vector including (0,0).
This change improves the compression quality substantially for high
quality encodings in both PSNR and SSIM terms. Overall effect on
mid/low rate range is also positive for all metrics, but smaller
in magnitude.

Change-Id: I864b29c4bd9ff610d2545fa94a19cc7e80c02667
2011-01-25 22:26:18 -08:00
Fritz Koenig
53d8e9dc97 Fix for incorrect variable declaration.
Commit 336aa0b7da incorrectly
declared current_pos as and int, when it should have been
a FIRSTPASS_STATS pointer.

Change-Id: I0a51c7a86ebba8546c95dd5d9d1c1143d4613e40
2011-01-25 15:41:41 -08:00
Johann
907e98fbb5 Merge "update sse2 regular quantizer" 2011-01-25 13:40:28 -08:00
Johann
58f19cc697 Merge "move new neon subpixel function" 2011-01-25 13:09:05 -08:00
Yunqing Wang
dcaaadd8ed Refine motion vector prediction for NEWMV mode
Adjust checking points in motion vector prediction to better cover
possible movements, and get a better prediction. Tests on test
clips showed a 0.1% improvement in SSIM, and no change in PSNR
and performance.

Change-Id: Ifdab05d35e10faea1445c61bb73debf888c9d2f8
2011-01-25 15:54:34 -05:00
Johann
af7d23c9b4 Merge "Fix issue 262, vp8cx_pack_tokens_into_partitions_armv5" 2011-01-25 12:49:52 -08:00
Johann
2168a94495 move new neon subpixel function
previously wasn't guarded with ifdef ARMV7, causing a link error with
ARMV6

Change-Id: I0526858be0b5f49b2bf11e9090180b2a6c48926d
2011-01-25 15:48:37 -05:00
Yunqing Wang
4e149bb447 Merge "Modify calling of NEON code in sub-pixel search" 2011-01-25 09:54:23 -08:00
Attila Nagy
3bf235a4c9 Fix issue 262, vp8cx_pack_tokens_into_partitions_armv5
http://code.google.com/p/webm/issues/detail?id=262
Function was asuming that partitions have equal amount of mb_rows,
which is not always true.

Change-Id: I59ed40117fd408392a85c633beeb5340ed2f4b25
2011-01-25 15:55:02 +02:00
Paul Wilkins
a69c18980f Merge "Incorrect bit allocation in forced KF groups." 2011-01-25 05:32:26 -08:00
Paul Wilkins
336aa0b7da Incorrect bit allocation in forced KF groups.
The old 2 pass code estimated error distribution when coding a
forced (by interval) key frame. The result of this was that in some
cases, when allocating bits at the GF group level within a KF
group there was either a glut of bits or starvation of bits at the end
of the KF group.

Added code to rescan and get the correct data once the position of
a forced key frame has been determined.

Change-Id: I0c811675ef3f9e4109d14bd049d7641682ffcf11
2011-01-25 12:29:06 +00:00
James Berry
eb8b4d9a99 configure.sh fix for visual studio
-For targets with external build systems like visual
studio CC is not set so check_add_cflags will fail.
Only call this function if extra_cflags is set.

Change-Id: I3531bad69e9b6a59c5be1b0e8b6053ccccbc332c
2011-01-24 16:57:20 -05:00
Scott LaVarnway
0ee525d6de Added vp8_update_zbin_extra
vp8cx_mb_init_quantizer was being called for every mode checked
in vp8_rd_pick_inter_mode.  zbin_extra is the only value that
really needs to be recalculated.  This calculation is disabled
when using the fast quantizer for mode selection.
This gave a small performance boost (~.5% to 1%).
Note: This needs to be verified with segmentation_enabled.

Change-Id: I62716a870b3c82b4a998bdf95130ff0b02106f1e
2011-01-24 11:00:56 -05:00
Yunqing Wang
d3e9409bb0 Merge "Modify sub-pixel filters to eliminate unnecessary calculations" 2011-01-21 11:07:17 -08:00
Yunqing Wang
0822a62f40 Modify sub-pixel filters to eliminate unnecessary calculations
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.

Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
2011-01-21 13:59:27 -05:00
Paul Wilkins
0cdfef1e22 Modified static scene check.
Added code to scan ahead a few frames when we see what
we think is a static scene in the two pass GF loop to see if the
conditions persist.

Moved calculation of decay rate out into a fuunction.

Change-Id: I6e9c67e01ec9f555144deafc8ae67ef25bffb449
2011-01-21 17:52:00 +00:00
Paul Wilkins
8064583d26 Further work to reduce pulsing.
These changes are specifically targeted at fade transitions to
static scenes. Here we want to place a GF/ARF immediately
after the fade and prevent an ARF just  before the fade.

Also some code lines and comment lines shortened to 80 chars
while I was there.

Change-Id: Iefdc09a4fa7b265048fc017246b73e138693950f
2011-01-20 18:01:20 +00:00
Attila Nagy
419553258d Update configure scripts
Add --extra-cflags as config parameter for user defined extra CFLAGS.
Add -g to asflags when debug enabled for arm targets.

Change-Id: Ibdde7cfdda6736c1c1db45e6466bd08504a51f15
2011-01-20 17:59:27 +02:00
Adrian Grange
815e1e9fe4 Fixed use of motion percentage in KF/GF group calc
In both vp8_find_next_key_frame and define_gf_group,
motion_pct was initialised at the top of the loop before
next_frame stats had been read in.

This fix sets motion_pct after next_frame stats have
been read.

Change-Id: I8c0bebf372ef8aa97b97fd35b42973d1d831ee73
2011-01-20 13:13:33 +00:00
Paul Wilkins
06e7320c3e Merge "First pass loop bug." 2011-01-19 08:33:34 -08:00
Paul Wilkins
e867516843 First pass loop bug.
Incorrect value loop_decay_rate used in GF loop.

The intent was to test the  cumulative value decay_accumulator.

Change-Id: I62928c63eb09f4f6936a45ebd1c23784d1c9681b
2011-01-19 15:50:22 +00:00
John Koleszar
2f0331c90c Merge "Implement error tracking in the decoder" 2011-01-19 05:51:00 -08:00
Henrik Lundin
67fb3a5155 Implement error tracking in the decoder
A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output
from the function is non-zero if the last decoded frame contains
corruption due to packet losses.

The decoder is also modified to accept encoded frames of zero length.
A zero length frame indicates to the decoder that one or more frames
have been completely lost. This will mark the last decoded reference
buffer as corrupted. The data pointer can be NULL if the length is
zero.

Change-Id: Ic5902c785a281c6e05329deea958554b7a6c75ce
2011-01-19 09:53:21 +01:00
John Koleszar
f97f2b1bb6 Merge "fix last frame buffer copy logic regression" 2011-01-18 12:54:57 -08:00
Yunqing Wang
ce6c954d2e Modify calling of NEON code in sub-pixel search
In vp8_find_best_sub_pixel_step_iteratively(), many times xoffset
and yoffset are specific values - (4,0) (0,4) and (4,4). Modified
code to call simplified NEON version at these specific offsets to
help with the performance.

Change-Id: Iaf896a0f7aae4697bd36a49e182525dd1ef1ab4d
2011-01-18 14:19:52 -05:00
Jim Bankoski
edcf74c6ad vp8e -removed undefined max call
Change-Id: I42a86b0488f44115f09551fc5ad6d711fd470f0d
2011-01-18 11:21:32 -05:00
Paul Wilkins
d6d5d43708 Merge "Further CQ, Key frame and ARF changes" 2011-01-18 08:04:46 -08:00
Paul Wilkins
57136a268a Further CQ, Key frame and ARF changes
This code fixes a bug in the calculation of
the minimum Q for alt ref frames.

It also allows an extended gf/arf interval for sections
of clips that completely static (or nearly so).

Change-Id: I1a21aaa16d4f0578e5f99b13bebd78d59403c73b
2011-01-18 15:19:05 +00:00
Attila Nagy
cb791aaa2f Fix encoder real-time only configuration.
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in  validate_config.

Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
2011-01-18 08:19:21 -05:00
Paul Wilkins
339c512762 Fix CQ range and experimental KF sizing changes.
The CQ level was not using the q_trans[] array to convert
to a 0-127 range as per min and maxq

Experimental change to try and match the reconstruction
error for forced key frames approximately to that of the
previous frame by means of the recode loop. Though this
may cause extra recodes and the recode behavior has not
been optimized, it can only happen on forced key frames.

Change-Id: I1f7e42d526f1b1cb556dd461eff1a692bd1b5b2f
2011-01-17 17:24:45 +00:00
Johann
15f9bea73b update sse2 regular quantizer
about ~5% gain on 32bit. disabled for 64bit

unset executable bit on ssse3 version (cosmetic)

Change-Id: I1a5860839eb294ce4261f819caea2dcfa78e57ca
2011-01-14 14:26:10 -05:00
Paul Wilkins
a1a4d23797 Merge "KF/GF Pulsing" 2011-01-14 09:20:37 -08:00
Paul Wilkins
3aafb47729 Merge "Testing of modes with Alt Ref frame" 2011-01-14 07:26:37 -08:00
Paul Wilkins
8f711db4e8 Merge "Experimental change to help with ARNR problem." 2011-01-14 07:26:01 -08:00
Paul Wilkins
415371c9d9 Testing of modes with Alt Ref frame
Previously when a frame was being overlaid on a previously coded
alt ref frame we only checked the alt ref 0,0 mode. Where there is
a possibility that the alt ref buffer is a filtered frame we should allow
the other prediction modes as normal or at the least allow use of
the last frame buffer.

Change-Id: I4d6227223d125c96b4f3066ec6ec9484fee7768c
2011-01-14 15:20:45 +00:00
Adrian Grange
2c1b06e672 ARNR filter pointer update bug fix
In cases where the frame width is not a multiple of 16 the
ARNR filter would go wrong.

In vp8_temporal_filter_iterate_c when updating pointers
at the end of a row of MBs,  the image size was
incorrectly used rather than using Num_MBs_In_Row
times 16 (Y) or 8 (U,V).

This worked when width is multiple of 16 but failed
otherwise.

Change-Id: I008919062715bd3d17c7aa2562ab58d1cb37053a
2011-01-14 15:04:39 +00:00
Paul Wilkins
72e22b0bb8 Experimental change to help with ARNR problem.
Allow use of other reference frames for the ARF overlay frame
when ARNR filtering is enabled

Change-Id: Icd6a9fb38977a88fbe7cc9b9c18198eb454c0273
2011-01-14 12:07:12 +00:00
Paul Wilkins
c8338ebf7a KF/GF Pulsing
This change is designed to try and reduce pulsing effects when moving
with a complex transition like a fade, into an easy or static section in
an otherwise difficult clip in CQ mode.

The active CQ level is relaxed down to the user entered level for frames that
are generating less than the passed in minimum bandwidth.

Change-Id: Id6d8b551daad4f489c087bd742bc95418a95f3f0
2011-01-14 11:37:26 +00:00
Scott LaVarnway
b082790c7d Merge "Moved ref frame calculations" 2011-01-13 06:59:28 -08:00
Paul Wilkins
eda7d538bf One pass rate control correction.
Fixed discrepancy cpi->ni_frames vs cm->current_video_frame > 150.

Make one pass path explicit.

There is still scope for some odd behaviour around the transition
point at cpi->ni_frames > 150.

Change-Id: Icdee130fe6e2a832206d30e45bf65963edd7a74d
2011-01-13 12:51:41 +00:00
Paul Wilkins
55acda98f7 Limit key frame quantizer for forced key frames.
Where a key frame occurs because of a minimum interval
selected by the user, then these forced key frames ideally need
to be more closely matched in quality to the surrounding frame.

Change-Id: Ia55b1f047e77dc7fbd78379c45869554f25b3df7
2011-01-12 17:43:59 +00:00
Scott LaVarnway
96fd758ea9 Moved ref frame calculations
Moved ref frame calculations to outside of the
mode_index loop.

Change-Id: I06103fc7e8af88b54b84443acf6691d29b1272ac
2011-01-11 15:00:00 -05:00
Yunqing Wang
6ff2b0883a Merge "Add no_skip_block4x4_search flag in SPLITMV mode" 2011-01-11 08:34:24 -08:00
Johann
e88d7ab245 Merge "use unaligned load" 2011-01-11 08:25:22 -08:00
Johann
f50f2fd2a7 use unaligned load
source buffer is not guaranteed to be aligned for odd size buffers

Change-Id: Id0b1fd40ba3bd6c994bcfada788feccd2b53c5a9
2011-01-11 11:22:29 -05:00
Yunqing Wang
1546e6a8c9 Add no_skip_block4x4_search flag in SPLITMV mode
Add a flag to always enable block4x4 search for speed=0 (good
quality) to guarantee no quality loss for speed0.

Change-Id: Ie04bbc25f7e6a33a7bfa30e05775d33148731c81
2011-01-11 09:50:13 -05:00
Henrik Lundin
48c28fc42c Remove unused local variables
Removing unused local variables causing compiler warnings in
Visual Studio.

Change-Id: I0e2096303be1fdbc01428a6e57cca9796bb32c8a
2011-01-11 15:22:19 +01:00
Yunqing Wang
3675b2291c Fix bug in motion search
The maximum possible MV in 1/8 pel units is (1<<11), which could
cause mvcost out of its range that is 1023. Change maximum
possible MV in 1/8 pel units to (1<<11)-8 will fix this problem.

Change-Id: I5788ed1de773f66658c14f225fb4ab5b1679b74b
2011-01-10 16:16:59 -05:00
Paul Wilkins
cf7c4732e5 Two Pass VBR change
Further experiment with restriction of the Q range.

This uses the average non KF/GF/ARF quantizer,  instead
of just relying on the initial value. It is not such a strong constraint
but there may be a reduced risk of rate misses.

Change-Id: I424fe782a37a2f4e18c70805e240db55bfaa25ec
2011-01-10 16:41:53 +00:00
Paul Wilkins
405499d835 Revert BASE_ERRPERMB
Constant value reverted pending more tests
on different video formats.

Change-Id: I07d11a0e0185e60724698c835416caf2e0774e61
2011-01-10 16:02:51 +00:00
Paul Wilkins
c28b10adeb Merge "CQ Mode" 2011-01-07 11:05:56 -08:00
Paul Wilkins
e0846c9c8c CQ Mode
The merge includes hooks to for CQ mode and other code
changes merged from the test branch.

CQ mode attempts to maintain a more stable quantizer within a clip
whilst also trying to adhere to a guidline maximum bitrate.

The existing target data rate parameter is used to specify the
guideline maximum bitrate.

A new parameter allows the user to specify a target CQ level.

For normal (non kf/gf/arf) frames, the quantizer will not drop BELOW the
user specified value (0-63). However, in some cases the encoder may
choose to impose a target CQ that is above that specified by the user,
if it estimates that consistent use of the target value is not compatible
with guideline maximum bitrate.

Change-Id: I2221f9eecae8cc3c431d36caf83503941b25e4c1
2011-01-07 18:46:29 +00:00
Paul Wilkins
ba976eaa9b Merge "Limit Q variability in two pass." 2011-01-07 09:32:29 -08:00
Paul Wilkins
3af3593c8e Limit Q variability in two pass.
In two pass encoding each frame is given an active
Q range to work with. This change limits how much this
Q range can be altered over time from the initial estimate
made for the clip as a whole.

There is some danger this could lead to overshoot or undershoot
in some corner cases but it helps considerably in regard to
clips where either there is a glut or famine of bits in some sections,
particularly near the end of a clip.

Change-Id: I34fcd1af31d2ee3d5444f93e334645254043026e
2011-01-07 17:23:50 +00:00
Paul Wilkins
f7e2f1fedf Merge "Disable some features for first pass." 2011-01-07 08:34:27 -08:00
Scott LaVarnway
dd314351e6 Merge "Removed cpi->target_bits_per_mb" 2011-01-07 06:46:45 -08:00
Scott LaVarnway
6dbdfe3422 Removed cpi->target_bits_per_mb
cpi->target_bits_per_mb is currently not being used,
so delete it.  Also removed other unused code in rdopt.c.

Change-Id: I98449f9030bcd2f15451d9b7a3b9b93dd1409923
2011-01-07 09:41:13 -05:00
Johann
8b0cf5f79d x86 sse2 temporal_filter_apply
count can be reduced to short because the max number of filtered frames
is set to 15. the max value for any frame is 32 (modifier = 16,
filter_weight = 2). 15*32 = 480 which requires 9 bits

this function goes from about 7000 us / 1000 iterations for the C code
to < 275 us / 1000 iterations for sse2 for block_size = 16 and from
about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8

Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
2011-01-06 14:00:30 -05:00
John Koleszar
1942eeb886 fix last frame buffer copy logic regression
Commit 0ce3901 introduced a change in the frame buffer copy logic where
the NEW frame could be copied to the ARF or GF buffer through the
copy_buffer_to_{arf,gf}==1 flags, if the LAST frame was not being
refreshed. This is not correct. The intent of the
copy_buffer_to_{arf,gf}==1 flag is to copy the LAST buffer. To copy the
NEW buffer, the refresh_{alt_ref,golden}_frame flag should be used.

The original buffer copy logic is fairly convoluted. For example:

    if (cm->refresh_last_frame)
    {
        vp8_swap_yv12_buffer(&cm->last_frame, &cm->new_frame);

        cm->frame_to_show = &cm->last_frame;
    }
    else
    {
        cm->frame_to_show = &cm->new_frame;
    }
    ...
    if (cm->copy_buffer_to_arf)
    {
        if (cm->copy_buffer_to_arf == 1)
        {
            if (cm->refresh_last_frame)
                vp8_yv12_copy_frame_ptr(&cm->new_frame, &cm->alt_ref_frame);
            else
                vp8_yv12_copy_frame_ptr(&cm->last_frame, &cm->alt_ref_frame);
        }
        else if (cm->copy_buffer_to_arf == 2)
            vp8_yv12_copy_frame_ptr(&cm->golden_frame, &cm->alt_ref_frame);
    }

Effectively, if refresh_last_frame, then new and last are swapped, so
when "new" is copied to ARF, it's equivalent to copying LAST to ARF. If
not refresh_last_frame, then LAST is copied to ARF. So LAST is copied to
ARF in both cases.

Commit 0ce3901 removed the first buffer swap but kept the
refresh_last_frame?new:last behavior, changing the sense since the first
swap wasn't done to the more readable refresh_last_frame?last:new, but
this logic is not correct when !refresh_last_frame.

This commit restores the correct behavior from v0.9.1 and prior. This
case is missing from the test vector set.

Change-Id: I8369fc13a37ae882e31a8a104da808a08bc8428f
2011-01-06 13:07:42 -05:00
Paul Wilkins
431dac08d1 Disable some features for first pass.
The following features don't make sense for the first
pass in its current form and have a significant impact on its
speed (up to 50%).

Slow quantizer, slow dct and trellis optimization.

Change-Id: Id9943f6765ffbd71fc0084ec7dfbc9d376fd6fcd
2011-01-06 17:10:07 +00:00
Paul Wilkins
b095d9df3c Adjustment to boost calculation in two pass.
Calculate a minimum intra value to be used in determining the
IIratio scores used in two pass, second pass.

This is to make sure sections that are low complexity" in the
intra domain are still boosted appropriately for KF/GF/ARF.

For now I have commented out the Q based adjustment of
KF boost.

Change-Id: I15deb09c5bd9b53180a2ddd3e5f575b2aba244b3
2011-01-04 18:11:28 +00:00
Scott LaVarnway
de4e8185e9 Fixed encoder crash when mult-threading is enabled.
Happens in real-time mode.  Will happen in good quality, speed 1.

Change-Id: I3e5b68827b1a5798d0431b088a709256d1ce2c95
2010-12-29 16:41:22 -05:00
Yunqing Wang
a864678cdb Always update last_frame_type
Scott pointed out that last_frame_type only gets updated while
loopfilter exists. Since last_frame_type is also needed in
motion search now, it needs to be updated every frame.

Change-Id: I9203532fd67361588d4024628d9ddb8e391ad912
2010-12-29 10:28:35 -05:00
Scott LaVarnway
3fb4abf3d1 Merge "Use the fast quantizer for inter mode selection" 2010-12-28 11:56:11 -08:00
Scott LaVarnway
516ea8460b Use the fast quantizer for inter mode selection
Use the fast quantizer for inter mode selection and the
regular quantizer for the rest of the encode for good quality,
speed 1.  Both performance and quality were improved.  The
quality gains will make up for the quality loss mentioned in
I9dc089007ca08129fb6c11fe7692777ebb8647b0.

Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
2010-12-28 14:51:46 -05:00
Yunqing Wang
bf53ec492d Adjust MV borders for SPLITMV mode
Add limits to avoid MV going out of range.

Change-Id: I8a5deb40bf393488d29f694b5a56804d578e68b5
2010-12-28 13:23:07 -05:00
Yunqing Wang
e463b95b4e Merge "Modify motion estimation for SPLITMV mode" 2010-12-28 08:12:26 -08:00
Yunqing Wang
a5a8d92976 Modify motion estimation for SPLITMV mode
1. Search for block8x16/block16x8 uses block8x8's search results.
2. Check block4x4 only if block8x8 is chosen. (This hurts quality,
   which will be improved in another check-in.)
3. In block4x4 search, the previous block's result is used as
   MV predictor for next block.

This change improves performance.

Change-Id: I9dc089007ca08129fb6c11fe7692777ebb8647b0
2010-12-28 10:34:42 -05:00
Yaowu Xu
95dbe9ccfd Merge "adjusted sad_per_bit to correlate with quantizer" 2010-12-26 13:45:37 -08:00
Yaowu Xu
0f5264b584 adjusted sad_per_bit to correlate with quantizer
Re-calibrated sad_per_bit16 and sad_per_bit4 tables to linearly
correlated to quantizer values, these two variables are used in
motion search for costing motion vectors. This change has an small
positive effect on compression.

Change-Id: Ic9b5ea6fb8d5078ef663ba4899db019cc51f4166
2010-12-23 22:59:38 -08:00
James Berry
74e8446e58 vpxenc stats_close() memleak fix
stats_close() was not freeing memory for
single pass runs.  It now takes in arg_passes
to determine when it should free memory.

Change-Id: I6623b7e30b76f9bf2e16008490f9b20484d03f31
2010-12-23 14:47:56 -05:00
Johann
8c4552fb36 Merge "improve integer version of filter" 2010-12-23 06:14:28 -08:00
Johann
d3c7365b46 Merge "temporal filter naming changes" 2010-12-23 06:14:20 -08:00
Johann
e2de094c99 Merge "abstract apply_temporal_filter" 2010-12-23 06:14:07 -08:00
John Koleszar
bd9b383db2 Merge "make yasm generate cv8 debug data on win32" 2010-12-22 11:11:08 -08:00
John Koleszar
30830d5a7c make yasm generate cv8 debug data on win32
Native Windows targets should use CV8 format debugging symbols, not
DWARF.

Change-Id: I9489163fcd9d749b72f6c70ecbce67a6f0790802
2010-12-22 12:53:45 -05:00
Johann
20b855c33e improve integer version of filter
the lookup table is based on floating point calculations (see source)

by moving the *3 before the downshift and adding the rounding bit, the
delta (LUT - integer) goes from:
______________________________________
__ 1__ 1______________________________
__ 1__ 1______________________________
____ 1______ 1________________________
____ 1 2__ 2 1________________________
______ 1 1 2__ 2__ 2__ 2 1 1__________
________ 1 1 2 2__ 1 2 3 1 2__ 2__ 2__
to:
__-1__-1______________________________
______________________________________
____-1______-1________________________
______________________________________
________-1______________-1____________
______________________________________

it's important to be able to use the integer version because the LUT
more or less precludes SIMD optimizations

Change-Id: I45a81127dc7b72a06fba951649135d9d918386c0
2010-12-22 11:33:59 -05:00
Johann
4b6219cb33 temporal filter naming changes
be more consistant with the naming pattern, especially wrt rtcd

Change-Id: I3df50686a09f1dab0a9620b5adbb8a1577b40f2f
2010-12-22 11:32:15 -05:00
Johann
092b5bef37 abstract apply_temporal_filter
allow for optimized versions of apply_temporal_filter
(now vp8_apply_temporal_filter_c)

the function was previously declared as static and appears to have been
inlined. with this change, that's no longer possible. performance takes
a small hit.

the declaration for vp8_cx_temp_filter_c was moved to onyx_if.c because
of a circular dependency. for rtcd, temporal_filter.h holds the
definition for the rtcd table, so it needs to be included by onyx_int.h.
however, onyx_int.h holds the definition for VP8_COMP which is needed
for the function prototype. blah.

Change-Id: I499c055fdc652ac4659c21c5a55fe10ceb7e95e3
2010-12-22 11:31:54 -05:00
Jim Bankoski
6cb708d501 Merge "Add psnr/ssim tuning option" 2010-12-20 09:32:13 -08:00
John Koleszar
c49f49b113 propagate user private data on decode
The pointer passed in the user_priv argument to vpx_codec_decode()
should be propagated through to the corresponding output frame and
made available in the image's user_priv member. Fixes issue #252

Change-Id: I182746a6882c8549fb146b4a4fdb64f1789eb750
2010-12-17 11:34:02 -05:00
John Koleszar
fc6ce744a6 Merge "Inform caller of decoder about updated references" 2010-12-17 07:08:21 -08:00
John Koleszar
b0da9b399d Add psnr/ssim tuning option
Add a new encoder control, VP8E_SET_TUNING, to allow the application
to inform the encoder that the material will benefit from certain
tuning. Expose this control as the --tune option to vpxenc. The args
helper is expanded to support enumerated arguments by name or value.

Two tunings are provided by this patch, PSNR (default) and SSIM.
Activity masking is made dependent on setting --tune=ssim, as the
current implementation hurts speed (10%) and PSNR (2.7% avg,
10% peak) too much for it to be a default yet.

Change-Id: I110d969381c4805347ff5a0ffaf1a14ca1965257
2010-12-17 10:01:05 -05:00
Henrik Lundin
2a87491fb0 Inform caller of decoder about updated references
Inform the caller of the decoder if a decoded frame updated last,
golden, or altref frames, required for realtime communication
proposed in document VP8 RTP payload format.

Added a new vpx_codec_control called VP8D_GET_LAST_REF_UPDATES, to be
called after vpx_codec_decode. The control will indicate which of the
reference frames that were updated by setting the 3 LSBs in the input
int (pointer).

Change-Id: Iac9db60dac414356c7ffa0b0fede88cb91e11bd7
2010-12-17 14:43:13 +01:00
Scott LaVarnway
64baa8df2e Changed segmentation check order
In SPLITMV, the 8x8 segment will be checked first.  If the 8x8 rd
is better than the best, we check the other segments.  Otherwise
bail.  Adjustments to the thresh_mult were necessary to make
up for the initial quality loss.
The performance improved by 20% (average) for good quality,
speed 0 and speed 1, while the overall quality remained the same.

Change-Id: I717aef401323c8a254fba3e9777d2a316c774cc3
2010-12-16 17:01:27 -05:00
Scott LaVarnway
81cdeb7117 Adjusted breakout RD for SPLITMV
vp8_rd_pick_best_mbsegmentation looks at y only.  The new
breakout does not include the frame cost, the prob_skip_false
cost, or the uv rate.  Performance improved by a few percent
and the quality remained the same.

Change-Id: I94ff013998ac51e8ecce7130870f7b6600758e15
2010-12-16 09:38:02 -05:00
Yunqing Wang
4fbd0227f5 Merge "Fix a bug in motion search code(2)" 2010-12-15 08:10:34 -08:00
Yunqing Wang
08706a3ea7 Fix a bug in motion search code(2)
This fix added MV range checks for NEWMV mode as suggested by Jim.
To reduce unnecessary MV range checks, I tried Yaowu's suggestion.
Update UMV borders in NEWMV mode to also cover MV range check.
Also, in this way, every MV that is valid gets checked in diamond
search function.

Change-Id: I95a89ce0daf6f178c454448f13d4249f19b30f3a
2010-12-14 17:39:25 -05:00
Yaowu Xu
3ac73173a4 Merge "fix a bug that "optimize" flag is not set for sub-threads" 2010-12-14 13:32:04 -08:00
Yunqing Wang
23aa13d92c Merge "Fix a bug in motion search code" 2010-12-14 13:25:34 -08:00
Yunqing Wang
7fb0f86863 Fix a bug in motion search code
The MV's range is 256. Since the new motion search uses a different
starting MV than the center ref MV, a MV range checking needs to
be done to avoid corruption.

Change-Id: I8ae0721d1bd203639e13891e2e54a2e87276f306
2010-12-14 13:59:38 -05:00
Yaowu Xu
64f3d91579 fix a bug that "optimize" flag is not set for sub-threads
The flag for quantization optimization was not properly propagated to
mb row encoding threads.

Change-Id: Ic561599c35acd94cd5698c9b314bccd596ac2deb
2010-12-14 10:12:21 -08:00
Johann
825adc464f shrink TOKENEXTRA and vp8_extra_bit_struct
Per John's previous change, shrink TOKENEXTRA from 20 to 8 bytes
original: b7b1e6fb
reverted: 41f4458a

Also drop unused field from vp8_extra_bit_struct

Update ARM ASM to deal with this change. In particular, Extra is signed
and needs to be sign-extended when loaded.

Change-Id: Ibd0ddc058432bc7bb09222d6ce4ef77e93a30b41
2010-12-14 10:32:50 -05:00
John Koleszar
41f4458a03 Revert "Reduce size of TOKENEXTRA struct"
This reverts commit b7b1e6fb55. Previous
fix is incomplete, breaks ARM. Itchy submit finger.

Change-Id: I939dc0d3bf4173cf951c1d152338ab6ea2184bb9
2010-12-13 17:12:51 -05:00
John Koleszar
3809d7bbd9 Merge "remove unused temporal preproc code" 2010-12-13 13:57:59 -08:00
John Koleszar
398aa81849 Merge "Reduce size of TOKENEXTRA struct" 2010-12-13 13:57:55 -08:00
John Koleszar
b1aa54ab26 remove unused temporal preproc code
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.

Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
2010-12-13 16:47:59 -05:00
John Koleszar
b7b1e6fb55 Reduce size of TOKENEXTRA struct
Change the size of structure elements to reduce memory utilization.
Removed the 'section' member entirely, as it is set but never read.

Change-Id: Iad043830392fb4168cb3cd6075fb0eb70c7f691c
2010-12-13 16:37:37 -05:00
James Berry
136bd2455e fixed vpxenc bug where ivf files would be read incorrectly
read_frame would incorrectly insert detect->buf into img
for ivf files.  detect->position now set to 4 if input file is
detected to be ivf in file_is_ivf to keep this from occuring.

Change-Id: I5e235dd3033985bc62707a35c13af5984620208e
2010-12-13 14:40:18 -05:00
Yaowu Xu
97a86c5b13 fix a bug in multithreaded encoding with active_map enabled
Added the initialization of the pointer to active map. Also added the
same logic for cyclic refresh in mbrow encoding threads.

Change-Id: Ic48d0849dc706b27fba72d07dcc498075725663d
2010-12-10 10:48:30 -08:00
Fritz Koenig
0ced701487 Merge "vp8 fast quantizer sse2 optimizations for eob." 2010-12-10 09:25:04 -08:00
Fritz Koenig
e0cf330cde vp8 fast quantizer sse2 optimizations for eob.
Changed the end of block computation to use pmaxw.  Removed
additional pushing and popping of registers that was not needed.

Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
2010-12-09 15:00:30 -08:00
John Koleszar
cb9698951c fix uninitialized read in encode breakout
Change I3430820 performed an uninitialized read when
encode_breakout == 0, since the sum and sse wouldn't be set:

   if(x->encode_breakout)
       VARIANCE_INVOKE(..., get16x16var)(..., &sum, &sse);
   if (cpi->active_map_enabled && x->active_ptr[0] == 0) {
       ...
   } else if (sse < x->encode_breakout)

Change-Id: I915eb76d1227b4b6d1137a0dedf2c143860098a2
2010-12-09 16:05:26 -05:00
Paul Wilkins
c63fc881e1 Correct q_low and q_high limits for the recode loop
Corrected the initial Q range limits for the recode loop
to reflect the current allowed range for the frame.

In experimental work on constrained quality this bug was
causing unnecessary recodes.

Change-Id: I7e256fbfa681293b0223fe21ec329933d76c229f
2010-12-09 15:02:04 +00:00
Yaowu Xu
160f3c7e9e Merge "vp8e - static threshold play" 2010-12-08 13:08:04 -08:00
Yaowu Xu
d88da98614 Merge "vp8e - remove unnecessary variance calc" 2010-12-08 09:19:22 -08:00
Jim Bankoski
718c19711a vp8e - static threshold play
Realized no need for new assembly code sum is already
calculated.

Change-Id: Ie2d94feb4b7c1f77c5359bca29b66228e41638c9
2010-12-07 16:07:23 -05:00
Scott LaVarnway
f661fa1f24 Merge "vp8_rd_pick_best_mbsegmentation code restructure" 2010-12-07 07:53:12 -08:00
Yaowu Xu
062980cc48 Merge "adjust RDMULT for UV plane in quantization RDO" 2010-12-06 22:04:45 -08:00
Yaowu Xu
7c03a1c308 adjust RDMULT for UV plane in quantization RDO
This patch adds a weighting factor on RDMULT for UV blocks. The change
has an overall gain about 0.5% based on ssim, between 0.1 and 0.2% by
psnr numbers.

Change-Id: I97781b077ce3bb7e34241b03268491917e8d1d72
2010-12-06 20:53:59 -08:00
Yunqing Wang
9520f4b3cc Fix a memory leak problem in encoder
Deallocating the buffers before re-allocating them.

The fix passed James Berry's test program for memory
leak check.

Change-Id: I18c3cf665412c0e313a523e3d435106c03ca438d
2010-12-06 17:21:37 -05:00
Scott LaVarnway
2fa5d5a26d vp8_rd_pick_best_mbsegmentation code restructure
Moved the code from the segmentation loop into a function
which is now called for each segment. This will allow us
to change the segment order checking more easily.

Change-Id: I9510d26f0acae5a73043fcca8f1984b121d3e052
2010-12-06 16:42:52 -05:00
Scott LaVarnway
d283d9bb30 Merge "Improve MV prediction accuracy to achieve performance gain" 2010-12-06 09:41:09 -08:00
Patrik Westin
8534071de0 Fix for manual Golden frame frequency
When auto_golden wasn't set it forced all frames to be a golden
frame. Now the manual configured frequency is adhered to.

Change-Id: I360acac9bc487db0d9c4d4da6ee41f70c227c539
2010-12-06 09:53:41 -05:00
Paul Wilkins
ccb0348473 Merge "Change to inter_minq table." 2010-12-04 02:06:33 -08:00
Paul Wilkins
cec6a596b5 Change to inter_minq table.
The inter_minq table controls the range of quantizers available
for a particular frame in two pass relative to a max Q value.

The changes reduces the range somewhat. The effect of this
was a small increase (0.3% average) in psnr for the test set
but it should also help encode speed somewhat for higher
quality modes as it will reduce the number of iterations in the
recode loop.

The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the target
bit rate in VBR mode and this risk will be more pronounced for short
clips.

The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the
target bit rate in VBR mode and this risk will be more
pronounced for short clips.

Change-Id: I84465567d49ae767c6c73ff2a2aac30c895adb52
2010-12-04 10:04:12 +00:00
Yunqing Wang
c3bbb29164 Improve MV prediction accuracy to achieve performance gain
Add vp8_mv_pred() to better predict starting MV for NEWMV
mode in vp8_rd_pick_inter_mode(). Set different search
ranges according to MV prediction accuracy, which improves
encoder performance without hurting the quality. Also,
as Yaowu suggested, using diamond search result as full
search starting point and therefore adjusting(reducing)
full search range helps the performance.

Change-Id: Ie4a3c8df87e697c1f4f6e2ddb693766bba1b77b6
2010-12-03 15:23:35 -05:00
John Koleszar
5e76dfcc70 Merge 'Add simple version of activity masking.'
Merge commit 'refs/changes/79/779/2' of
    https://review.webmproject.org/p/libvpx

Conflicts:
	vp8/encoder/encodeintra.c
	vp8/encoder/encodemb.c

Change-Id: Id607063fabe92d99eeb3c380e8ca670b01bfb3ef
2010-12-03 13:30:50 -05:00
Fritz Koenig
9c8ad79fdc Set refresh_alt_ref_frame on keyframe encode.
On a keyframe alt ref and golden are refreshed.  The flag was
not being set and so on the frame after a keyframe, motion
search would occur on the alt ref frame.  This is not necessary
because the alt ref frame identical to the last frame in this
scenario.

Handle corner case where a forward alt-ref frame is put
directly after a keyframe.

Change-Id: I9be4cf290d694f8cf2f9a31852014b5ccf1504d3
2010-12-01 12:48:22 -08:00
Jim Bankoski
3430820bbe vp8e - remove unnecessary variance calc
only do the variance calculation if necessary
( eg needed for breakout test)
2010-11-27 14:02:59 -05:00
Pascal Massimino
fd9f9dc054 allow dimensions as low as 1 pixel
remove warning comment in vpxenc.c: in case of 1x1 picture,
detect_bytes will be equal to '3' and we'll fall back to
RAW_TYPE.
fix read_frame() by tracking the pre-read buffer length
in the struct detect

Change-Id: If1ed86ee5260dcdbc8f9d10da6cbb84a4cc2f151
2010-11-24 16:44:33 -08:00
John Koleszar
19e32ac7c7 Merge "vpxdec: fix use of uninitialized memory for raw files" 2010-11-23 12:39:03 -08:00
John Koleszar
78cbe51bc3 Merge changes I3aed713e,I9ef7f56e,Ic18c60df
* changes:
  vp8_set_maps: remove hard-coded width/height
  vp8mt_alloc_temp_buffers: make prototype return void
  Disable compile warning for ERROR macro
2010-11-23 12:38:20 -08:00
John Koleszar
19255b8fe0 vpxdec: fix use of uninitialized memory for raw files
The sz member of the vpx_codec_stream_info_t structure must be
initialized when passed to vpx_codec_peek_stream_info().

Change-Id: I2d13d287d9639262b932cf44671a595fdf3c38ef
2010-11-23 13:49:40 -05:00
Paul Wilkins
ad6150f769 Recalibration of bits per MB tables
The baseline bits per MB prediction tables have been
re calibrated based on the assumption that bits per mb
is inversely proportional to the quantizer level.

Change-Id: Ibd355c7acac4b8053dda1baf1032fe35f11da7f7
2010-11-22 13:17:35 +00:00
Paul Wilkins
1753f0d208 Merge "Added extra two pass stats gathering." 2010-11-22 04:11:20 -08:00
Paul Wilkins
70b885a0e8 Added extra two pass stats gathering.
Added code to record spend so far against planed budget.

Change-Id: I5a3335346fa1771b2b1219df9f6127f9993d2594
2010-11-19 14:12:33 -05:00
Pascal Massimino
ed5ab7fa49 remove warning
was having: "vp8/encoder/onyx_if.c:5365: warning: comparison of unsigned expression >= 0 is always true"
2010-11-17 16:50:02 -08:00
Scott LaVarnway
9a6740af80 Merge "Removed unnecessary checks." 2010-11-17 11:28:22 -08:00
Scott LaVarnway
f7670acc68 Removed unnecessary checks.
macro_block_yrd and vp8_rdcost_mby are not called for SPLITMV.

Change-Id: I2224d3c8725df526d48426447482768d543752f1
2010-11-17 14:25:48 -05:00
Paul Wilkins
f874391e02 Replaced recode loop test with a function call
Replaced existing code to decide if a frame recode is required
with a function call. This is to simplify addition of extra clauses
that may be needed for the planned constrained quality mode.

Also fixed a bug where by alt ref not considered in the test.

Change-Id: I3d40bb21abe3e19f8456761e6849deb171738b60
2010-11-17 15:12:04 +00:00
John Koleszar
7ee516d2b3 vp8_set_maps: remove hard-coded width/height
The example for disabling the active map used a hard-coded 320x240
resolution, rather than using what was passed on the command line.

Fixes #218

Change-Id: I3aed713e8aa7fcbf18dfbffd57f142b5cd9ee492
2010-11-17 09:24:05 -05:00
John Koleszar
8d94796cad vp8mt_alloc_temp_buffers: make prototype return void
This function was never called in a context expecting a return value,
the return value was always a constant, and the !CONFIG_MULTITHREAD
path didn't have a return statement, which caused a compiler warning.
This patch changes the function to return void instead.

Fixes issue #231

Change-Id: I9ef7f56e54418b7265026c54fc4ed5660c1418d1
2010-11-17 09:13:57 -05:00
John Koleszar
79e2b1f39b Disable compile warning for ERROR macro
The ERROR macro collides wiith the MS SDK on Windows. Since we're not
making any win32 calls in this function, just #undef it first to take
ownership.

Change-Id: Ic18c60dfa3a33c52e6c49d3f4f8d3e7e3ac3341d
2010-11-17 09:08:51 -05:00
Fritz Koenig
99d02c0f9f Merge "Comments for alt ref flags." 2010-11-16 16:11:39 -08:00
Fritz Koenig
69ee697fef Comments for alt ref flags.
Clarify what the alt ref flags do when encoding.

Change-Id: I71f78e0f42edae633fb91840f29dfbe64362c44c
2010-11-16 15:16:24 -08:00
Yaowu Xu
4fedfa75f8 Merge "correct errors in token alphabet descriptions" 2010-11-16 14:06:44 -08:00
tomfinegan
faaa57b945 Add x86_64-darwin10-gcc target.
Adds native build configuration for Snow Leopard.  Useful when
users configure without arguments on OSX 10.6.

Change-Id: I0bd63912a25bbfb9d4c8d58a781d0f390792429c
2010-11-16 14:52:05 -05:00
Yaowu Xu
d49da085c0 correct errors in token alphabet descriptions
There were a few errors in the comment section that describe VP8 token
alphabet table.

Change-Id: Ie6728a0e08bc3798893221b60408d5b201064bdc
2010-11-16 10:51:43 -08:00
Fritz Koenig
e180255375 Remove stack shadowing for x86-x64 for SAD functions.
x86-64 passes arguments in registers.  There is no need to push
them to the stack before using them.

This fixes 15acc84f10 where ebx
was not getting preserved on x86.

Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
2010-11-15 10:56:02 -08:00
Paul Wilkins
f4709d2895 Merge "Bad cost tables used in ARNR filtering." 2010-11-15 09:55:35 -08:00
Paul Wilkins
373f5c3144 Bad cost tables used in ARNR filtering.
The use of incorrect mv costing tables in the ARNR sub-pel
filtering code led to corruption of the altref buffer in some cases,
particularly at low data rates.

The average gain from this fix is about 0.3% but there are a few
extreme cases where nasty and visible artifacts manifested and
for these few data points the improvement is > 10%.

PGW and AWG

Change-Id: I95cc02b196a433e71d0d2bd2b933fe68ed31e796
2010-11-15 17:47:12 +00:00
Yaowu Xu
73189f21b3 Merge "make rdmult adaptive for intra in quantizer RDO" 2010-11-15 09:22:45 -08:00
Frank Galligan
8c2dfde3ed Fixed bug first cluster timecode of webm file is wrong.
When the first pts equaled 0 ivfenc was incorrectly increasing the
pts by 1. I changed the pts and last pts to be signed. I also set
the default value of last pts to -1.

Change-Id: I30bcec5af9b16d93fa9e3abbea7764b133e9cd73
2010-11-12 11:48:17 -05:00
Yaowu Xu
ef2f27f10e make rdmult adaptive for intra in quantizer RDO
This intends to correct the tendency that VP8 aggressively favors rate
on intra coded frames. Experiments tested different numbers in [0, 1]
and found 9/16 overall provided about 2-4% gains for all-intra coded
clips based on vpx-ssim metric. The impact on regular encoded clips
is much smaller but positive overall. Overall impact on psnr is also
positive even though very small.

Change-Id: If808553aaaa87fdd44691f9787820ac9856d9f8a
2010-11-11 11:33:35 -08:00
John Koleszar
0a49747b01 quantizer: fix assertion in fast quantizer path
The fast quantizer assembly code has not been updated to match the new
exact quantizer, which was made the default in commit 6adbe09.
Specifically, they are not aware of the potential for the coefficient
to be scaled, which results in the quantized result exceeding the range
of the DCT. This patch restores the previous behavior of using the
non-shifted coefficients when in the fast quantizer code path, but
unfortunately requires rebuilding the tables when switching between the
two.

Change-Id: I0a33f5b3850335011a06906f49fafed54dda9546
2010-11-11 13:05:20 -05:00
Fritz Koenig
58083cb34d Revert "Remove stack shadowing for x86-64"
This reverts commit 15acc84f10.

Change-Id: Ia640be8cbc134432914849c1750f62575ea084e6
2010-11-11 08:20:02 -08:00
Paul Wilkins
213f7b0907 Merge "Relax rate control for last few frames" 2010-11-11 02:39:20 -08:00
Fritz Koenig
692b10858d configure : Incorrect syntax in configure
Check to see if postproc was enabled when enabling the
postproc visualizer was wrong.

Fix for bug introduced in Change Ia74f357d

Change-Id: I4bee9ad2caee3cfe3bac6972047f6af7c54cad4e
2010-11-10 14:54:59 -08:00
Fritz Koenig
9b1ece2cca Merge "Remove stack shadowing for x86-64" 2010-11-10 14:36:10 -08:00
Fritz Koenig
5f0e0617ba FDCT optimizations.
Fixed up the fdct for mmx and 8x4 sse2 to match them
most recent changes.

Change-Id: Ibee2d6c536fe14dcf75cd6eb1c73f4848a56d719
2010-11-10 14:34:02 -08:00
Fritz Koenig
647df00f30 postproc : Re-work posproc calling to allow more flags.
Debugging in postproc needs more flags to allow for specific
block types to be turned on or off in the visualizations.

Must be enabled with --enable-postproc-visualizer during
configuration time.

Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d
2010-11-10 14:14:46 -08:00
Paul Wilkins
513f8e6814 Relax rate control for last few frames
VBR rate control can become very noisy for the last few frames.
If there are a few bits to spare or a small overshoot then the
target rate and hence quantizer may start to fluctuate wildly.

This patch prevents further adjustment of the active Q limits for
the last few frames.

Patch also removes some redundant variables and makes one small bug fix.

Change-Id: Ic167831bec79acc9f0d7e4698bcc4bb188840c45
2010-11-10 10:09:45 +00:00
Paul Wilkins
6adbe09058 Tuning for the more exact quantizer.
Small changes to the default zero bin and rounding tables.
Though the tables are currently the same for the Y1 and Y2 cases
I have left them as separate tables in case we want to tune this later.

There is now some adjustment of the zbin based on the prediction mode.
Previously this was restricted to an adjustment for gf/arf 0,0 MV.

The exact quantizer now marginal outperforms and is the default.

The overall average gain is about 0.5%

Change-Id: I5e4353f3d5326dde4e86823684b236a1e9ea7f47
2010-11-10 09:52:58 +00:00
John Koleszar
458f4fedd2 Merge "improve average framerate calculation" 2010-11-09 08:52:16 -08:00
John Koleszar
4d1b0d2a2d Merge commit 'fix integer promotion bug in partition size check'
Change-Id: I4081917b46013fa8f4218cade8bd12cb2d013aee
2010-11-05 16:49:32 -04:00
John Koleszar
9fb80f7170 fix integer promotion bug in partition size check
The check '(user_data_end - partition < partition_size)' must be
evaluated as a signed comparison, but because partition_size was
unsigned, the LHS was promoted to unsigned, causing an incorrect
result on 32-bit. Instead, check the upper and lower bounds of
the segment separately.

Change-Id: I6266aba7fd7de084268712a3d2a81424ead7aa06
2010-11-05 14:52:53 -04:00
John Koleszar
f7e187d362 improve average framerate calculation
Change Ice204e86 identified a problem with bitrate undershoot due to
low precision in the timestamps passed to the library. This patch
takes a different approach by calculating the duration of this frame
and passing it to the library, rather than using a fixed duration
and letting the library average it out with higher precision
timestamps. This part of the fix only applies to vpxenc.

This patch also attempts to fix the problem for generic applications
that may have made the same mistake vpxenc did. Instead of
calculating this frame's duration by the difference of this frame's
and the last frame's start time, we use the end times instead. This
allows the framerate calculation to scavenge "unclaimed" time from
the last frame. For instance:

  start |  end  | calculated duration
  ======+=======+====================
    0ms    33ms   33ms
   33ms    66ms   33ms
   66ms    99ms   33ms
  100ms   133ms   34ms

Change-Id: I92be4b3518e0bd530e97f90e69e75330a4c413fc
2010-11-05 08:42:46 -04:00
John Koleszar
5551ef0ef4 Merge "vpxdec: report parse errors from webm_guess_framerate()" 2010-11-04 19:18:53 -07:00
John Koleszar
bd05d9e480 vpxdec: report parse errors from webm_guess_framerate()
If this function fails silently, the nestegg context is destroyed and
future nestegg calls will segfault.

Change-Id: Ie6a0ea284ab9ddfa97b1843ef8030a953937c8cd
2010-11-04 14:56:48 -04:00
Fritz Koenig
507eb4b577 Merge "postproc : Update visualizations." 2010-11-04 11:28:18 -07:00
Fritz Koenig
0e7b60617f postproc : Update visualizations.
Change color reference frame to blend the macro block edge.
This helps with layering of visualizations.

Add block coloring for intra prediction modes.

Change-Id: Icefe0e189e26719cd6937cebd6727efac0b4d278
2010-11-04 10:35:02 -07:00
Yaowu Xu
a5397dbaf1 Increase the resolution of default timebase
The old value 1000 was too low, which caused the effective duration and
frame rate calculation to have an 1% error for typical 30 frame/second
inputs. Symptom of the issue has been that most 2 pass encodings were
undershooting target bit rate by 1% or so for 30 fps input.

Change-Id: Ice204e86f844ceb9ce973456f2b995cc095283cf
2010-11-04 09:26:47 +00:00
John Koleszar
77e6b4504b vpxenc: require width and height for raw streams
Defaulting to 320x240 for raw streams is arbitrary and error-prone.
Instead, require that the width and height be set manually if they
can't be parsed from the input file.

Change-Id: Ic61979857e372eed0779c2677247e894f9fd6160
2010-11-03 13:58:44 -04:00
John Koleszar
4b9dc57260 Merge "fix pipe support on windows" 2010-11-02 17:01:54 -07:00
Fritz Koenig
0a29bd9793 postproc : Fix display of motion vectors.
Split motion vectors were all being treated as 4x4
blocks.  Now correctly handle 16x8, 8x16, 8x8, 4x4
blocks.

Change-Id: Icf345c5e69b5e374e12456877ed7c41213ad88cc
2010-11-02 13:29:13 -07:00
Scott LaVarnway
b8f43aec66 Merge "SSSE3 version of fast quantizer" 2010-11-02 06:27:29 -07:00
John Koleszar
c377bf0eec fix pipe support on windows
STDIO streams are opened in text mode by default on Windows. This patch
changes the stdin/stdout streams to be in binary mode if they are being
used for I/O from the vpxenc or vpxdec tools.

Fixes issue #216. Thanks to mw AT hesotech.de for the fix.

Change-Id: I34525b3ce2a4a031d5a48d36df4667589372225b
2010-11-02 09:14:24 -04:00
Fritz Koenig
90c505f218 Merge "postproc : Added SPLITMV visualization, fix line constrain." 2010-11-01 14:41:41 -07:00
Fritz Koenig
9f61a83bf9 postproc : Added SPLITMV visualization, fix line constrain.
Now draw 16 vectors for SPLITMV mode.

Fixed constrain line to block divide by zero issues.

Blend block was not centering the shaded area correctly.

Change-Id: I1edabd8b4e553aac8d980f7b45c80159e9202434
2010-11-01 13:27:13 -07:00
Scott LaVarnway
ff4a71f4c2 SSSE3 version of fast quantizer
(test clip: tulip)
For good quality mode with speed=1, this gave the encoder
a small (2 - 3%) performance boost.

Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
2010-11-01 16:24:15 -04:00
Scott LaVarnway
20745f8442 Merge "Finding first label" 2010-11-01 08:42:37 -07:00
John Koleszar
0684c647ef cosmetic: remove alt_ref from vpxenc usage message
Undo an automatic transform.

Change-Id: Ie730a6a31b4680b34e54b61691d67c4b3ed2f2aa
2010-10-29 11:07:31 -04:00
Scott LaVarnway
dcee88ea37 Finding first label
Using tables for the label count and label offset.

Change-Id: Iac3d5b292c37341a881be0af282f5cac3b3e01eb
2010-10-29 10:01:04 -04:00
Yunqing Wang
6614563b8f Save XMM registers in asm functions
XMM6/7 are used in these functions, and need to be saved.

Change-Id: I3dfaddaf2a69cd4bf8e8735c7064b17bac5a14e5
2010-10-28 16:59:03 -04:00
Yunqing Wang
f57fc7bcc6 Merge "Fix full-search SAD function crash in Visual Studio" 2010-10-28 13:46:35 -07:00
John Koleszar
9d93dabee0 Merge branch 'aylesbury' 2010-10-28 16:01:03 -04:00
Yunqing Wang
7e3a1e7361 Fix full-search SAD function crash in Visual Studio
Unlike GCC, Visual Studio compiler doesn't allocate SAD output
array 16-byte aligned, which causes crash in visual studio.

Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
2010-10-28 15:26:58 -04:00
Timothy B. Terriberry
c4d7e5e67e Eliminate more warnings.
This eliminates a large set of warnings exposed by the Mozilla build
 system (Use of C++ comments in ISO C90 source, commas at the end of
 enum lists, a couple incomplete initializers, and signed/unsigned
 comparisons).
It also eliminates many (but not all) of the warnings expose by newer
 GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
 without checking the return values).
There are a few spurious warnings left on my system:

../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
 uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
 change between the two if blocks that test it here.

../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
 expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
 expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum, and the C
 standard does not mandate that enums be unsigned, so the checks can't be
 removed.

Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
2010-10-27 18:08:04 -07:00
Fritz Koenig
2b4913eb0d Merge "postproc: Tweaks to line drawing and blending." 2010-10-27 13:20:56 -07:00
Fritz Koenig
a097e18964 postproc: Tweaks to line drawing and blending.
Turned down the blending level to make colored blocks obscure
the video less.
Not blending the entire block to give distinction to macro
block edges.
Added configuration so that macro block blending function can
be optimized.
Change to constrain line as to when dx and dy are computed.
Now draw two lines to form an arrow.

Change-Id: Id3ef0fdeeab2949a6664b2c63e2a3e1a89503f6c
2010-10-27 13:20:03 -07:00
John Koleszar
f26fe7d93b Merge "Output the PSNR for the entire file." 2010-10-27 12:06:23 -07:00
Frank Galligan
3d84da6b8d Output the PSNR for the entire file.
If --psnr option is enabled vpxenc will output PSNR values for the
entire file. Added a \n before final output to make sure the output
is on its own line. Overall and Avg psnr matches the values written
to opsnr.stt file.

Change-Id: I869268b704fe8b0c8389d318cceb6072fea102f8
2010-10-27 14:31:07 -04:00
Yunqing Wang
71ecb5d7d9 Full search SAD function optimization in SSE4.1
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4

(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.

Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
2010-10-27 13:36:31 -04:00
Fritz Koenig
15acc84f10 Remove stack shadowing for x86-64
x86-64 passes most arguments in registers.  There is no need to
push them to the stack before using them.

Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
2010-10-21 10:28:08 -07:00
Timothy B. Terriberry
8d0f7a01e6 Add simple version of activity masking.
This uses MB variance to change the RDO weight for mode decision
 and quantization.
Activity is normalized against the average for the frame, which is
 currently tracked using feed-forward statistics.
This could also be used to adjust the quantizer for the entire
 frame, but that requires more extensive rate control changes.
This does not yet attempt to adapt the quantizer within the frame,
 but the signaling cost means that will likely only be useful at
 very high rates.

Change-Id: I26cd7c755cac3ff33cfe0688b1da50b2b87b9c93
2010-10-12 08:41:03 -04:00
351 changed files with 23843 additions and 32557 deletions

View File

@@ -1,2 +1,5 @@
Adrian Grange <agrange@google.com>
Johann Koenig <johannkoenig@google.com>
Tero Rintaluoma <teror@google.com> <tero.rintaluoma@on2.com>
Tom Finegan <tomfinegan@google.com>
Ralph Giles <giles@xiph.org> <giles@entropywave.com>

21
AUTHORS
View File

@@ -4,29 +4,50 @@
Aaron Watry <awatry@gmail.com>
Adrian Grange <agrange@google.com>
Alex Converse <alex.converse@gmail.com>
Alexis Ballier <aballier@gentoo.org>
Alok Ahuja <waveletcoeff@gmail.com>
Andoni Morales Alastruey <ylatuya@gmail.com>
Andres Mejia <mcitadel@gmail.com>
Aron Rosenberg <arosenberg@logitech.com>
Attila Nagy <attilanagy@google.com>
Fabio Pedretti <fabio.ped@libero.it>
Frank Galligan <fgalligan@google.com>
Fredrik Söderquist <fs@opera.com>
Fritz Koenig <frkoenig@google.com>
Gaute Strokkenes <gaute.strokkenes@broadcom.com>
Giuseppe Scrivano <gscrivano@gnu.org>
Guillermo Ballester Valor <gbvalor@gmail.com>
Henrik Lundin <hlundin@google.com>
James Berry <jamesberry@google.com>
James Zern <jzern@google.com>
Jan Kratochvil <jan.kratochvil@redhat.com>
Jeff Muizelaar <jmuizelaar@mozilla.com>
Jim Bankoski <jimbankoski@google.com>
Johann Koenig <johannkoenig@google.com>
John Koleszar <jkoleszar@google.com>
Joshua Bleecher Snyder <josh@treelinelabs.com>
Justin Clift <justin@salasaga.org>
Justin Lebar <justin.lebar@gmail.com>
Lou Quillio <louquillio@google.com>
Luca Barbato <lu_zero@gentoo.org>
Makoto Kato <makoto.kt@gmail.com>
Martin Ettl <ettl.martin78@googlemail.com>
Michael Kohler <michaelkohler@live.com>
Mike Hommey <mhommey@mozilla.com>
Mikhal Shemer <mikhal@google.com>
Pascal Massimino <pascal.massimino@gmail.com>
Patrik Westin <patrik.westin@gmail.com>
Paul Wilkins <paulwilkins@google.com>
Pavol Rusnak <stick@gk2.sk>
Philip Jägenstedt <philipj@opera.com>
Rafael Ávila de Espíndola <rafael.espindola@gmail.com>
Ralph Giles <giles@xiph.org>
Ronald S. Bultje <rbultje@google.com>
Scott LaVarnway <slavarnway@google.com>
Stefan Holmer <holmer@google.com>
Taekhyun Kim <takim@nvidia.com>
Tero Rintaluoma <teror@google.com>
Thijs Vermeir <thijsvermeir@gmail.com>
Timothy B. Terriberry <tterribe@xiph.org>
Tom Finegan <tomfinegan@google.com>
Yaowu Xu <yaowu@google.com>

189
CHANGELOG
View File

@@ -1,3 +1,192 @@
2011-08-15 v0.9.7-p1 "Cayuga" patch 1
This is an incremental bugfix release against Cayuga. All users of that
release are strongly encouraged to upgrade.
- Fix potential OOB reads (cdae03a)
An unbounded out of bounds read was discovered when the
decoder was requested to perform error concealment (new in
Cayuga) given a frame with corrupt partition sizes.
A bounded out of bounds read was discovered affecting all
versions of libvpx. Given an multipartition input frame that
is truncated between the mode/mv partition and the first
residiual paritition (in the block of partition offsets), up
to 3 extra bytes could have been read from the source buffer.
The code will not take any action regardless of the contents
of these undefined bytes, as the truncated buffer is detected
immediately following the read based on the calculated
starting position of the coefficient partition.
- Fix potential error concealment crash when the very first frame
is missing or corrupt (a609be5)
- Fix significant artifacts in error concealment (a4c2211, 99d870a)
- Revert 1-pass CBR rate control changes (e961317)
Further testing showed this change produced undesirable visual
artifacts, rolling back for now.
2011-08-02 v0.9.7 "Cayuga"
Our third named release, focused on a faster, higher quality, encoder.
- Upgrading:
This release is backwards compatible with Aylesbury (v0.9.5) and
Bali (v0.9.6). Users of older releases should refer to the Upgrading
notes in this document for that release.
- Enhancements:
Stereo 3D format support for vpxenc
Runtime detection of available processor cores.
Allow specifying --end-usage by enum name
vpxdec: test for frame corruption
vpxenc: add quantizer histogram display
vpxenc: add rate histogram display
Set VPX_FRAME_IS_DROPPABLE
update configure for ios sdk 4.3
Avoid text relocations in ARM vp8 decoder
Generate a vpx.pc file for pkg-config.
New ways of passing encoded data between encoder and decoder.
- Speed:
This release includes across-the-board speed improvements to the
encoder. On x86, these measure at approximately 11.5% in Best mode,
21.5% in Good mode (speed 0), and 22.5% in Realtime mode (speed 6).
On ARM Cortex A9 with Neon extensions, real-time encoding of video
telephony content is 35% faster than Bali on single core and 48%
faster on multi-core. On the NVidia Tegra2 platform, real time
encoding is 40% faster than Bali.
Decoder speed was not a priority for this release, but improved
approximately 8.4% on x86.
Reduce motion vector search on alt-ref frame.
Encoder loopfilter running in its own thread
Reworked loopfilter to precalculate more parameters
SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
Removed redundant checks
Reduced structure sizes
utilize preload in ARMv6 MC/LPF/Copy routines
ARM optimized quantization, dfct, variance, subtract
Increase chrow row alignment to 16 bytes.
disable trellis optimization for first pass
Write SSSE3 sub-pixel filter function
Improve SSE2 half-pixel filter funtions
Add vp8_sub_pixel_variance16x8_ssse3 function
Reduce unnecessary distortion computation
Use diamond search to replace full search
Preload reference area in sub-pixel motion search (real-time mode)
- Quality:
This release focused primarily on one-pass use cases, including
video conferencing. Low latency data rate control was significantly
improved, improving streamability over bandwidth constrained links.
Added support for error concealment, allowing frames to maintain
visual quality in the presence of substantial packet loss.
Add rc_max_intra_bitrate_pct control
Limit size of initial keyframe in one-pass.
Improve framerate adaptation
Improved 1-pass CBR rate control
Improved KF insertion after fades to still.
Improved key frame detection.
Improved activity masking (lower PSNR impact for same SSIM boost)
Improved interaction between GF and ARFs
Adding error-concealment to the decoder.
Adding support for independent partitions
Adjusted rate-distortion constants
- Bug Fixes:
Removed firstpass motion map
Fix parallel make install
Fix multithreaded encoding for 1 MB wide frame
Fixed iwalsh_neon build problems with RVDS4.1
Fix semaphore emulation, spin-wait intrinsics on Windows
Fix build with xcode4 and simplify GLOBAL.
Mark ARM asm objects as allowing a non-executable stack.
Fix vpxenc encoding incorrect webm file header on big endian
2011-03-07 v0.9.6 "Bali"
Our second named release, focused on a faster, higher quality, encoder.
- Upgrading:
This release is backwards compatible with Aylesbury (v0.9.5). Users
of older releases should refer to the Upgrading notes in this
document for that release.
- Enhancements:
vpxenc --psnr shows a summary when encode completes
--tune=ssim option to enable activity masking
improved postproc visualizations for development
updated support for Apple iOS to SDK 4.2
query decoder to determine which reference frames were updated
implemented error tracking in the decoder
fix pipe support on windows
- Speed:
Primary focus was on good quality mode, speed 0. Average improvement
on x86 about 40%, up to 100% on user-generated content at that speed.
Best quality mode speed improved 35%, and realtime speed 10-20%. This
release also saw significant improvement in realtime encoding speed
on ARM platforms.
Improved encoder threading
Dont pick encoder filter level when loopfilter is disabled.
Avoid double copying of key frames into alt and golden buffer
FDCT optimizations.
x86 sse2 temporal filter
SSSE3 version of fast quantizer
vp8_rd_pick_best_mbsegmentation code restructure
Adjusted breakout RD for SPLITMV
Changed segmentation check order
Improved rd_pick_intra4x4block
Adds armv6 optimized variance calculation
ARMv6 optimized sad16x16
ARMv6 optimized half pixel variance calculations
Full search SAD function optimization in SSE4.1
Improve MV prediction accuracy to achieve performance gain
Improve MV prediction in vp8_pick_inter_mode() for speed>3
- Quality:
Best quality mode improved PSNR 6.3%, and SSIM 6.1%. This release
also includes support for "activity masking," which greatly improves
SSIM at the expense of PSNR. For now, this feature is available with
the --tune=ssim option. Further experimentation in this area
is ongoing. This release also introduces a new rate control mode
called "CQ," which changes the allocation of bits within a clip to
the sections where they will have the most visual impact.
Tuning for the more exact quantizer.
Relax rate control for last few frames
CQ Mode
Limit key frame quantizer for forced key frames.
KF/GF Pulsing
Add simple version of activity masking.
make rdmult adaptive for intra in quantizer RDO
cap the best quantizer for 2nd order DC
change the threshold of DC check for encode breakout
- Bug Fixes:
Fix crash on Sparc Solaris.
Fix counter of fixed keyframe distance
ARNR filter pointer update bug fix
Fixed use of motion percentage in KF/GF group calc
Changed condition for using RD in Intra Mode
Fix encoder real-time only configuration.
Fix ARM encoder crash with multiple token partitions
Fixed bug first cluster timecode of webm file is wrong.
Fixed various encoder bugs with odd-sized images
vp8e_get_preview fixed when spatial resampling enabled
quantizer: fix assertion in fast quantizer path
Allocate source buffers to be multiples of 16
Fix for manual Golden frame frequency
Fix drastic undershoot in long form content
2010-10-28 v0.9.5 "Aylesbury"
Our first named release, focused on a faster decoder, and a better encoder.

4
README
View File

@@ -45,18 +45,14 @@ COMPILING THE APPLICATIONS/LIBRARIES:
armv5te-linux-rvct
armv5te-linux-gcc
armv5te-symbian-gcc
armv5te-wince-vs8
armv6-darwin-gcc
armv6-linux-rvct
armv6-linux-gcc
armv6-symbian-gcc
armv6-wince-vs8
iwmmxt-linux-rvct
iwmmxt-linux-gcc
iwmmxt-wince-vs8
iwmmxt2-linux-rvct
iwmmxt2-linux-gcc
iwmmxt2-wince-vs8
armv7-linux-rvct
armv7-linux-gcc
mips32-linux-gcc

45
args.c
View File

@@ -135,6 +135,17 @@ void arg_show_usage(FILE *fp, const struct arg_def *const *defs)
def->long_name, long_val);
fprintf(fp, " %-37s\t%s\n", option_text, def->desc);
if(def->enums)
{
const struct arg_enum_list *listptr;
fprintf(fp, " %-37s\t ", "");
for(listptr = def->enums; listptr->name; listptr++)
fprintf(fp, "%s%s", listptr->name,
listptr[1].name ? ", " : "\n");
}
}
}
@@ -218,3 +229,37 @@ struct vpx_rational arg_parse_rational(const struct arg *arg)
return rat;
}
int arg_parse_enum(const struct arg *arg)
{
const struct arg_enum_list *listptr;
long int rawval;
char *endptr;
/* First see if the value can be parsed as a raw value */
rawval = strtol(arg->val, &endptr, 10);
if (arg->val[0] != '\0' && endptr[0] == '\0')
{
/* Got a raw value, make sure it's valid */
for(listptr = arg->def->enums; listptr->name; listptr++)
if(listptr->val == rawval)
return rawval;
}
/* Next see if it can be parsed as a string */
for(listptr = arg->def->enums; listptr->name; listptr++)
if(!strcmp(arg->val, listptr->name))
return listptr->val;
die("Option %s: Invalid value '%s'\n", arg->name, arg->val);
return 0;
}
int arg_parse_enum_or_int(const struct arg *arg)
{
if(arg->def->enums)
return arg_parse_enum(arg);
return arg_parse_int(arg);
}

12
args.h
View File

@@ -22,14 +22,23 @@ struct arg
const struct arg_def *def;
};
struct arg_enum_list
{
const char *name;
int val;
};
#define ARG_ENUM_LIST_END {0}
typedef struct arg_def
{
const char *short_name;
const char *long_name;
int has_val;
const char *desc;
const struct arg_enum_list *enums;
} arg_def_t;
#define ARG_DEF(s,l,v,d) {s,l,v,d}
#define ARG_DEF(s,l,v,d) {s,l,v,d, NULL}
#define ARG_DEF_ENUM(s,l,v,d,e) {s,l,v,d,e}
#define ARG_DEF_LIST_END {0}
struct arg arg_init(char **argv);
@@ -41,4 +50,5 @@ char **argv_dup(int argc, const char **argv);
unsigned int arg_parse_uint(const struct arg *arg);
int arg_parse_int(const struct arg *arg);
struct vpx_rational arg_parse_rational(const struct arg *arg);
int arg_parse_enum_or_int(const struct arg *arg);
#endif

View File

@@ -1,20 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
Name="armasm"
Version="8.00"
>
<Rules>
<CustomBuildRule
Name="ARMASM"
DisplayName="Armasm Assembler"
CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -ARCH 5&#x0D;&#x0A;"
Outputs="$(IntDir)\$(InputName).obj"
FileExtensions="*.asm"
ExecutionDescription="Assembling $(InputName).asm"
ShowOnlyRuleProperties="false"
>
<Properties>
</Properties>
</CustomBuildRule>
</Rules>
</VisualStudioToolFile>

View File

@@ -1,20 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
Name="armasm"
Version="8.00"
>
<Rules>
<CustomBuildRule
Name="ARMASM"
DisplayName="Armasm Assembler"
CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -ARCH 6&#x0D;&#x0A;"
Outputs="$(IntDir)\$(InputName).obj"
FileExtensions="*.asm"
ExecutionDescription="Assembling $(InputName).asm"
ShowOnlyRuleProperties="false"
>
<Properties>
</Properties>
</CustomBuildRule>
</Rules>
</VisualStudioToolFile>

View File

@@ -1,20 +0,0 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
Name="armasm"
Version="8.00"
>
<Rules>
<CustomBuildRule
Name="ARMASM"
DisplayName="Armasm Assembler"
CommandLine="armasm -o &quot;$(IntDir)\$(InputName).obj&quot; $(InputPath) -32 -cpu XSCALE&#x0D;&#x0A;"
Outputs="$(IntDir)\$(InputName).obj"
FileExtensions="*.asm"
ExecutionDescription="Assembling $(InputName).asm"
ShowOnlyRuleProperties="false"
>
<Properties>
</Properties>
</CustomBuildRule>
</Rules>
</VisualStudioToolFile>

View File

@@ -1,13 +0,0 @@
@echo off
REM Copyright (c) 2010 The WebM project authors. All Rights Reserved.
REM
REM Use of this source code is governed by a BSD-style license
REM that can be found in the LICENSE file in the root of the source
REM tree. An additional intellectual property rights grant can be found
REM in the file PATENTS. All contributing project authors may
REM be found in the AUTHORS file in the root of the source tree.
echo on
cl /I ".\\" /I "..\vp6_decoder_sdk" /I "..\vp6_decoder_sdk\vpx_ports" /D "NDEBUG" /D "_WIN32_WCE=0x420" /D "UNDER_CE" /D "WIN32_PLATFORM_PSPC" /D "WINCE" /D "_LIB" /D "ARM" /D "_ARM_" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MT /GS- /fp:fast /GR- /Fo"Pocket_PC_2003__ARMV4_\%1/" /Fd"Pocket_PC_2003__ARMV4_\%1/vc80.pdb" /W3 /nologo /c /TC ..\vp6_decoder_sdk\vp6_decoder\algo\common\arm\dec_asm_offsets_arm.c
obj_int_extract.exe rvds "Pocket_PC_2003__ARMV4_\%1/dec_asm_offsets_arm.obj"

View File

@@ -1,88 +0,0 @@
Microsoft Visual Studio Solution File, Format Version 9.00
# Visual Studio 2005
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "example", "example.vcproj", "{BA5FE66F-38DD-E034-F542-B1578C5FB950}"
ProjectSection(ProjectDependencies) = postProject
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74} = {DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}
{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
EndProjectSection
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "obj_int_extract", "obj_int_extract.vcproj", "{E1360C65-D375-4335-8057-7ED99CC3F9B2}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "vpx", "vpx.vcproj", "{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}"
ProjectSection(ProjectDependencies) = postProject
{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
EndProjectSection
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "xma", "xma.vcproj", "{A955FC4A-73F1-44F7-135E-30D84D32F022}"
ProjectSection(ProjectDependencies) = postProject
{E1360C65-D375-4335-8057-7ED99CC3F9B2} = {E1360C65-D375-4335-8057-7ED99CC3F9B2}
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74} = {DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}
EndProjectSection
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Mixed Platforms = Debug|Mixed Platforms
Debug|Pocket PC 2003 (ARMV4) = Debug|Pocket PC 2003 (ARMV4)
Debug|Win32 = Debug|Win32
Release|Mixed Platforms = Release|Mixed Platforms
Release|Pocket PC 2003 (ARMV4) = Release|Pocket PC 2003 (ARMV4)
Release|Win32 = Release|Win32
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{BA5FE66F-38DD-E034-F542-B1578C5FB950}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Mixed Platforms.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Mixed Platforms.Build.0 = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Win32.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Debug|Win32.Build.0 = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Mixed Platforms.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Mixed Platforms.Build.0 = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Win32.ActiveCfg = Release|Win32
{E1360C65-D375-4335-8057-7ED99CC3F9B2}.Release|Win32.Build.0 = Release|Win32
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.Build.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Mixed Platforms.Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).Build.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Pocket PC 2003 (ARMV4).Deploy.0 = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Debug|Win32.ActiveCfg = Debug|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.Build.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Mixed Platforms.Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).ActiveCfg = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).Build.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Pocket PC 2003 (ARMV4).Deploy.0 = Release|Pocket PC 2003 (ARMV4)
{A955FC4A-73F1-44F7-135E-30D84D32F022}.Release|Win32.ActiveCfg = Release|Pocket PC 2003 (ARMV4)
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
EndGlobal

View File

@@ -98,11 +98,11 @@ install::
$(BUILD_PFX)%.c.d: %.c
$(if $(quiet),@echo " [DEP] $@")
$(qexec)mkdir -p $(dir $@)
$(qexec)$(CC) $(CFLAGS) -M $< | $(fmt_deps) > $@
$(qexec)$(CC) $(INTERNAL_CFLAGS) $(CFLAGS) -M $< | $(fmt_deps) > $@
$(BUILD_PFX)%.c.o: %.c
$(if $(quiet),@echo " [CC] $@")
$(qexec)$(CC) $(CFLAGS) -c -o $@ $<
$(qexec)$(CC) $(INTERNAL_CFLAGS) $(CFLAGS) -c -o $@ $<
$(BUILD_PFX)%.asm.d: %.asm
$(if $(quiet),@echo " [DEP] $@")
@@ -124,6 +124,12 @@ $(BUILD_PFX)%.s.o: %.s
$(if $(quiet),@echo " [AS] $@")
$(qexec)$(AS) $(ASFLAGS) -o $@ $<
.PRECIOUS: %.c.S
%.c.S: CFLAGS += -DINLINE_ASM
$(BUILD_PFX)%.c.S: %.c
$(if $(quiet),@echo " [GEN] $@")
$(qexec)$(CC) -S $(CFLAGS) -o $@ $<
.PRECIOUS: %.asm.s
$(BUILD_PFX)%.asm.s: %.asm
$(if $(quiet),@echo " [ASM CONVERSION] $@")
@@ -152,8 +158,8 @@ endif
# Rule to extract assembly constants from C sources
#
obj_int_extract: build/make/obj_int_extract.c
$(if $(quiet),echo " [HOSTCC] $@")
$(qexec)$(HOSTCC) -I. -o $@ $<
$(if $(quiet),@echo " [HOSTCC] $@")
$(qexec)$(HOSTCC) -I. -I$(SRC_PATH_BARE) -o $@ $<
CLEAN-OBJS += obj_int_extract
#
@@ -188,7 +194,7 @@ define linker_template
$(1): $(filter-out -%,$(2))
$(1):
$(if $(quiet),@echo " [LD] $$@")
$(qexec)$$(LD) $$(strip $$(LDFLAGS) -o $$@ $(2) $(3) $$(extralibs))
$(qexec)$$(LD) $$(strip $$(INTERNAL_LDFLAGS) $$(LDFLAGS) -o $$@ $(2) $(3) $$(extralibs))
endef
# make-3.80 has a bug with expanding large input strings to the eval function,
# which was triggered in some cases by the following component of
@@ -255,7 +261,7 @@ ifeq ($(filter clean,$(MAKECMDGOALS)),)
endif
#
# Configuration dependant rules
# Configuration dependent rules
#
$(call pairmap,install_map_templates,$(INSTALL_MAPS))
@@ -330,12 +336,10 @@ ifneq ($(call enabled,DIST-SRCS),)
DIST-SRCS-$(CONFIG_MSVS) += build/make/gen_msvs_proj.sh
DIST-SRCS-$(CONFIG_MSVS) += build/make/gen_msvs_sln.sh
DIST-SRCS-$(CONFIG_MSVS) += build/x86-msvs/yasm.rules
DIST-SRCS-$(CONFIG_MSVS) += build/x86-msvs/obj_int_extract.bat
DIST-SRCS-$(CONFIG_RVCT) += build/make/armlink_adapter.sh
#
# This isn't really ARCH_ARM dependent, it's dependant on whether we're
# using assembly code or not (CONFIG_OPTIMIZATIONS maybe). Just use
# this for now.
DIST-SRCS-$(ARCH_ARM) += build/make/obj_int_extract.c
# Include obj_int_extract if we use offsets from asm_*_offsets
DIST-SRCS-$(ARCH_ARM)$(ARCH_X86)$(ARCH_X86_64) += build/make/obj_int_extract.c
DIST-SRCS-$(ARCH_ARM) += build/make/ads2gas.pl
DIST-SRCS-yes += $(target:-$(TOOLCHAIN)=).mk
endif

View File

@@ -21,8 +21,14 @@ print "@ This file was created from a .asm file\n";
print "@ using the ads2gas.pl script.\n";
print "\t.equ DO1STROUNDING, 0\n";
# Stack of procedure names.
@proc_stack = ();
while (<STDIN>)
{
# Load and store alignment
s/@/,:/g;
# Comment character
s/;/@/g;
@@ -79,7 +85,10 @@ while (<STDIN>)
s/CODE([0-9][0-9])/.code $1/;
# No AREA required
s/^\s*AREA.*$/.text/;
# But ALIGNs in AREA must be obeyed
s/^\s*AREA.*ALIGN=([0-9])$/.text\n.p2align $1/;
# If no ALIGN, strip the AREA and align to 4 bytes
s/^\s*AREA.*$/.text\n.p2align 2/;
# DCD to .word
# This one is for incoming symbols
@@ -114,8 +123,8 @@ while (<STDIN>)
# put the colon at the end of the line in the macro
s/^([a-zA-Z_0-9\$]+)/$1:/ if !/EQU/;
# Strip ALIGN
s/\sALIGN/@ ALIGN/g;
# ALIGN directive
s/ALIGN/.balign/g;
# Strip ARM
s/\sARM/@ ARM/g;
@@ -127,9 +136,23 @@ while (<STDIN>)
# Strip PRESERVE8
s/\sPRESERVE8/@ PRESERVE8/g;
# Strip PROC and ENDPROC
s/\sPROC/@/g;
s/\sENDP/@/g;
# Use PROC and ENDP to give the symbols a .size directive.
# This makes them show up properly in debugging tools like gdb and valgrind.
if (/\bPROC\b/)
{
my $proc;
/^_([\.0-9A-Z_a-z]\w+)\b/;
$proc = $1;
push(@proc_stack, $proc) if ($proc);
s/\bPROC\b/@ $&/;
}
if (/\bENDP\b/)
{
my $proc;
s/\bENDP\b/@ $&/;
$proc = pop(@proc_stack);
$_ = "\t.size $proc, .-$proc".$_ if ($proc);
}
# EQU directive
s/(.*)EQU(.*)/.equ $1, $2/;
@@ -148,3 +171,6 @@ while (<STDIN>)
next if /^\s*END\s*$/;
print;
}
# Mark that this object doesn't need an executable stack.
printf ("\t.section\t.note.GNU-stack,\"\",\%\%progbits\n");

View File

@@ -41,6 +41,9 @@ sub trim($)
while (<STDIN>)
{
# Load and store alignment
s/@/,:/g;
# Comment character
s/;/@/g;
@@ -97,7 +100,10 @@ while (<STDIN>)
s/CODE([0-9][0-9])/.code $1/;
# No AREA required
s/^\s*AREA.*$/.text/;
# But ALIGNs in AREA must be obeyed
s/^\s*AREA.*ALIGN=([0-9])$/.text\n.p2align $1/;
# If no ALIGN, strip the AREA and align to 4 bytes
s/^\s*AREA.*$/.text\n.p2align 2/;
# DCD to .word
# This one is for incoming symbols
@@ -137,8 +143,8 @@ while (<STDIN>)
# put the colon at the end of the line in the macro
s/^([a-zA-Z_0-9\$]+)/$1:/ if !/EQU/;
# Strip ALIGN
s/\sALIGN/@ ALIGN/g;
# ALIGN directive
s/ALIGN/.balign/g;
# Strip ARM
s/\sARM/@ ARM/g;

View File

@@ -17,6 +17,8 @@ for i; do
on_of=1
elif [ "$i" == "-v" ]; then
verbose=1
elif [ "$i" == "-g" ]; then
args="${args} --debug"
elif [ "$on_of" == "1" ]; then
outfile=$i
on_of=0

View File

@@ -78,11 +78,12 @@ Build options:
--log=yes|no|FILE file configure log is written to [config.err]
--target=TARGET target platform tuple [generic-gnu]
--cpu=CPU optimize for a specific cpu rather than a family
--extra-cflags=ECFLAGS add ECFLAGS to CFLAGS [$CFLAGS]
${toggle_extra_warnings} emit harmless warnings (always non-fatal)
${toggle_werror} treat warnings as errors, if possible
(not available with all compilers)
${toggle_optimizations} turn on/off compiler optimization flags
${toggle_pic} turn on/off Position Independant Code
${toggle_pic} turn on/off Position Independent Code
${toggle_ccache} turn on/off compiler cache
${toggle_debug} enable/disable debug mode
${toggle_gprof} enable/disable gprof profiling instrumentation
@@ -411,11 +412,14 @@ EOF
write_common_target_config_h() {
cat > ${TMP_H} << EOF
/* This file automatically generated by configure. Do not edit! */
#ifndef VPX_CONFIG_H
#define VPX_CONFIG_H
#define RESTRICT ${RESTRICT}
EOF
print_config_h ARCH "${TMP_H}" ${ARCH_LIST}
print_config_h HAVE "${TMP_H}" ${HAVE_LIST}
print_config_h CONFIG "${TMP_H}" ${CONFIG_LIST}
echo "#endif /* VPX_CONFIG_H */" >> ${TMP_H}
mkdir -p `dirname "$1"`
cmp "$1" ${TMP_H} >/dev/null 2>&1 || mv ${TMP_H} "$1"
}
@@ -442,6 +446,9 @@ process_common_cmdline() {
;;
--cpu=*) tune_cpu="$optval"
;;
--extra-cflags=*)
extra_cflags="${optval}"
;;
--enable-?*|--disable-?*)
eval `echo "$opt" | sed 's/--/action=/;s/-/ option=/;s/-/_/g'`
echo "${CMDLINE_SELECT} ${ARCH_EXT_LIST}" | grep "^ *$option\$" >/dev/null || die_unknown $opt
@@ -547,6 +554,10 @@ process_common_toolchain() {
tgt_isa=universal
tgt_os=darwin9
;;
*darwin10*)
tgt_isa=x86_64
tgt_os=darwin10
;;
*mingw32*|*cygwin*)
[ -z "$tgt_isa" ] && tgt_isa=x86
tgt_os=win32
@@ -606,10 +617,20 @@ process_common_toolchain() {
add_ldflags "-isysroot /Developer/SDKs/MacOSX10.5.sdk"
add_ldflags "-mmacosx-version-min=10.5"
;;
*-darwin10-*)
add_cflags "-isysroot /Developer/SDKs/MacOSX10.6.sdk"
add_cflags "-mmacosx-version-min=10.6"
add_ldflags "-isysroot /Developer/SDKs/MacOSX10.6.sdk"
add_ldflags "-mmacosx-version-min=10.6"
;;
esac
# Handle Solaris variants. Solaris 10 needs -lposix4
case ${toolchain} in
sparc-solaris-*)
add_extralibs -lposix4
disable fast_unaligned
;;
*-solaris-*)
add_extralibs -lposix4
;;
@@ -621,8 +642,8 @@ process_common_toolchain() {
# on arm, isa versions are supersets
enabled armv7a && soft_enable armv7 ### DEBUG
enabled armv7 && soft_enable armv6
enabled armv6 && soft_enable armv5te
enabled armv6 && soft_enable fast_unaligned
enabled armv7 || enabled armv6 && soft_enable armv5te
enabled armv7 || enabled armv6 && soft_enable fast_unaligned
enabled iwmmxt2 && soft_enable iwmmxt
enabled iwmmxt && soft_enable armv5te
@@ -655,7 +676,7 @@ process_common_toolchain() {
check_add_cflags -march=${tgt_isa}
check_add_asflags -march=${tgt_isa}
fi
enabled debug && add_asflags -g
asm_conversion_cmd="${source_path}/build/make/ads2gas.pl"
;;
rvct)
@@ -671,7 +692,7 @@ process_common_toolchain() {
if enabled armv7
then
check_add_cflags --cpu=Cortex-A8 --fpu=softvfp+vfpv3
check_add_asflags --cpu=Cortex-A8 --fpu=none
check_add_asflags --cpu=Cortex-A8 --fpu=softvfp+vfpv3
else
check_add_cflags --cpu=${tgt_isa##armv}
check_add_asflags --cpu=${tgt_isa##armv}
@@ -680,16 +701,24 @@ process_common_toolchain() {
arch_int=${tgt_isa##armv}
arch_int=${arch_int%%te}
check_add_asflags --pd "\"ARCHITECTURE SETA ${arch_int}\""
enabled debug && add_asflags -g
add_cflags --gnu
add_cflags --enum_is_int
add_cflags --wchar32
;;
esac
case ${tgt_os} in
none*)
disable multithread
disable os_support
;;
darwin*)
SDK_PATH=/Developer/Platforms/iPhoneOS.platform/Developer
TOOLCHAIN_PATH=${SDK_PATH}/usr/bin
CC=${TOOLCHAIN_PATH}/gcc
AR=${TOOLCHAIN_PATH}/ar
LD=${TOOLCHAIN_PATH}/arm-apple-darwin9-gcc-4.2.1
LD=${TOOLCHAIN_PATH}/arm-apple-darwin10-gcc-4.2.1
AS=${TOOLCHAIN_PATH}/as
STRIP=${TOOLCHAIN_PATH}/strip
NM=${TOOLCHAIN_PATH}/nm
@@ -703,19 +732,18 @@ process_common_toolchain() {
add_cflags -arch ${tgt_isa}
add_ldflags -arch_only ${tgt_isa}
add_cflags "-isysroot /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.1.sdk"
add_cflags "-isysroot ${SDK_PATH}/SDKs/iPhoneOS4.3.sdk"
# This should be overridable
alt_libc=${SDK_PATH}/SDKs/iPhoneOS3.1.sdk
alt_libc=${SDK_PATH}/SDKs/iPhoneOS4.3.sdk
# Add the paths for the alternate libc
# for d in usr/include usr/include/gcc/darwin/4.0/; do
for d in usr/include usr/include/gcc/darwin/4.0/ usr/lib/gcc/arm-apple-darwin9/4.0.1/include/; do
for d in usr/include usr/include/gcc/darwin/4.2/ usr/lib/gcc/arm-apple-darwin10/4.2.1/include/; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_cflags -I"${try_dir}"
done
for d in lib usr/lib; do
for d in lib usr/lib usr/lib/system; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_ldflags -L"${try_dir}"
done
@@ -726,45 +754,24 @@ process_common_toolchain() {
linux*)
enable linux
if enabled rvct; then
# Compiling with RVCT requires an alternate libc (glibc) when
# targetting linux.
disabled builtin_libc \
|| die "Must supply --libc when targetting *-linux-rvct"
# Check if we have CodeSourcery GCC in PATH. Needed for
# libraries
hash arm-none-linux-gnueabi-gcc 2>&- || \
die "Couldn't find CodeSourcery GCC from PATH"
# Set up compiler
add_cflags --gnu
add_cflags --enum_is_int
add_cflags --library_interface=aeabi_glibc
add_cflags --no_hide_all
add_cflags --wchar32
add_cflags --dwarf2
add_cflags --gnu
# Use armcc as a linker to enable translation of
# some gcc specific options such as -lm and -lpthread.
LD="armcc --translate_gcc"
# Set up linker
add_ldflags --sysv --no_startup --no_ref_cpp_init
add_ldflags --entry=_start
add_ldflags --keep '"*(.init)"' --keep '"*(.fini)"'
add_ldflags --keep '"*(.init_array)"' --keep '"*(.fini_array)"'
add_ldflags --dynamiclinker=/lib/ld-linux.so.3
add_extralibs libc.so.6 -lc_nonshared crt1.o crti.o crtn.o
# create configuration file (uses path to CodeSourcery GCC)
armcc --arm_linux_configure --arm_linux_config_file=arm_linux.cfg
# Add the paths for the alternate libc
for d in usr/include; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_cflags -J"${try_dir}"
done
add_cflags -J"${RVCT31INC}"
for d in lib usr/lib; do
try_dir="${alt_libc}/${d}"
[ -d "${try_dir}" ] && add_ldflags -L"${try_dir}"
done
# glibc has some struct members named __align, which is a
# storage modifier in RVCT. If we need to use this modifier,
# we'll have to #undef it in our code. Note that this must
# happen AFTER all libc inclues.
add_cflags -D__align=x_align_x
add_cflags --arm_linux_paths --arm_linux_config_file=arm_linux.cfg
add_asflags --no_hide_all --apcs=/interwork
add_ldflags --arm_linux_paths --arm_linux_config_file=arm_linux.cfg
enabled pic && add_cflags --apcs=/fpic
enabled pic && add_asflags --apcs=/fpic
enabled shared && add_cflags --shared
fi
;;
@@ -824,6 +831,7 @@ process_common_toolchain() {
soft_enable sse2
soft_enable sse3
soft_enable ssse3
soft_enable sse4_1
case ${tgt_os} in
win*)
@@ -844,7 +852,7 @@ process_common_toolchain() {
setup_gnu_toolchain
add_cflags -use-msasm -use-asm
add_ldflags -i-static
enabled x86_64 && add_cflags -ipo -no-prec-div -static -xSSE3 -axSSE3
enabled x86_64 && add_cflags -ipo -no-prec-div -static -xSSE2 -axSSE2
enabled x86_64 && AR=xiar
case ${tune_cpu} in
atom*)
@@ -862,6 +870,8 @@ process_common_toolchain() {
link_with_cc=gcc
tune_cflags="-march="
setup_gnu_toolchain
#for 32 bit x86 builds, -O3 did not turn on this flag
enabled optimizations && check_add_cflags -fomit-frame-pointer
;;
esac
@@ -879,7 +889,7 @@ process_common_toolchain() {
case ${tgt_os} in
win*)
add_asflags -f win${bits}
enabled debug && add_asflags -g dwarf2
enabled debug && add_asflags -g cv8
;;
linux*|solaris*)
add_asflags -f elf${bits}
@@ -929,15 +939,23 @@ process_common_toolchain() {
enabled gcov &&
check_add_cflags -fprofile-arcs -ftest-coverage &&
check_add_ldflags -fprofile-arcs -ftest-coverage
if enabled optimizations; then
enabled rvct && check_add_cflags -Otime
if enabled rvct; then
enabled small && check_add_cflags -Ospace || check_add_cflags -Otime
else
enabled small && check_add_cflags -O2 || check_add_cflags -O3
fi
fi
# Position Independant Code (PIC) support, for building relocatable
# Position Independent Code (PIC) support, for building relocatable
# shared objects
enabled gcc && enabled pic && check_add_cflags -fPIC
# Work around longjmp interception on glibc >= 2.11, to improve binary
# compatibility. See http://code.google.com/p/webm/issues/detail?id=166
enabled linux && check_add_cflags -D_FORTIFY_SOURCE=0
# Check for strip utility variant
${STRIP} -V 2>/dev/null | grep GNU >/dev/null && enable gnu_strip
@@ -956,11 +974,20 @@ EOF
esac
fi
# for sysconf(3) and friends.
check_header unistd.h
# glibc needs these
if enabled linux; then
add_cflags -D_LARGEFILE_SOURCE
add_cflags -D_FILE_OFFSET_BITS=64
fi
# append any user defined extra cflags
if [ -n "${extra_cflags}" ] ; then
check_add_cflags ${extra_cflags} || \
die "Requested extra CFLAGS '${extra_cflags}' not supported by compiler"
fi
}
process_toolchain() {

View File

@@ -32,7 +32,8 @@ Options:
--name=project_name Name of the project (required)
--proj-guid=GUID GUID to use for the project
--module-def=filename File containing export definitions (for DLLs)
--ver=version Version (7,8) of visual studio to generate for
--ver=version Version (7,8,9) of visual studio to generate for
--src-path-bare=dir Path to root of source tree
-Ipath/to/include Additional include directories
-DFLAG[=value] Preprocessor macros to define
-Lpath/to/lib Additional library search paths
@@ -132,7 +133,7 @@ generate_filter() {
open_tag Filter \
Name=$name \
Filter=$pats \
UniqueIdentifier=`generate_uuid`
UniqueIdentifier=`generate_uuid` \
file_list_sz=${#file_list[@]}
for i in ${!file_list[@]}; do
@@ -146,29 +147,19 @@ generate_filter() {
for plat in "${platforms[@]}"; do
for cfg in Debug Release; do
open_tag FileConfiguration \
Name="${cfg}|${plat}"
Name="${cfg}|${plat}" \
tag Tool \
Name="VCCustomBuildTool" \
Description="Assembling \$(InputFileName)" \
CommandLine="$(eval echo \$asm_${cfg}_cmdline)"\
Outputs="\$(InputName).obj"
CommandLine="$(eval echo \$asm_${cfg}_cmdline)" \
Outputs="\$(InputName).obj" \
close_tag FileConfiguration
done
done
fi
if [ "${f##*.}" == "cpp" ]; then
for plat in "${platforms[@]}"; do
for cfg in Debug Release; do
open_tag FileConfiguration \
Name="${cfg}|${plat}"
tag Tool \
Name="VCCLCompilerTool" \
CompileAs="2"
close_tag FileConfiguration
done
done
fi
close_tag File
break
@@ -195,24 +186,27 @@ for opt in "$@"; do
;;
--proj-guid=*) guid="${optval}"
;;
--module-def=*)
link_opts="${link_opts} ModuleDefinitionFile=${optval}"
--module-def=*) link_opts="${link_opts} ModuleDefinitionFile=${optval}"
;;
--exe) proj_kind="exe"
;;
--lib) proj_kind="lib"
;;
--src-path-bare=*) src_path_bare="$optval"
;;
--static-crt) use_static_runtime=true
;;
--ver=*) vs_ver="$optval"
case $optval in
--ver=*)
vs_ver="$optval"
case "$optval" in
[789])
;;
*) die Unrecognized Visual Studio Version in $opt
;;
esac
;;
-I*) opt="${opt%/}"
-I*)
opt="${opt%/}"
incs="${incs}${incs:+;}&quot;${opt##-I}&quot;"
yasmincs="${yasmincs} ${opt}"
;;
@@ -232,10 +226,13 @@ for opt in "$@"; do
;;
-*) die_unknown $opt
;;
*) file_list[${#file_list[@]}]="$opt"
*)
file_list[${#file_list[@]}]="$opt"
case "$opt" in
*.asm) uses_asm=true;;
*.asm) uses_asm=true
;;
esac
;;
esac
done
outfile=${outfile:-/dev/stdout}
@@ -278,11 +275,7 @@ done
# List Keyword for this target
case "$target" in
x86*)
keyword="ManagedCProj"
;;
arm*|iwmmx*)
keyword="Win32Proj"
x86*) keyword="ManagedCProj"
;;
*) die "Unsupported target $target!"
esac
@@ -298,26 +291,7 @@ case "$target" in
asm_Debug_cmdline="yasm -Xvc -g cv8 -f \$(PlatformName) ${yasmincs} &quot;\$(InputPath)&quot;"
asm_Release_cmdline="yasm -Xvc -f \$(PlatformName) ${yasmincs} &quot;\$(InputPath)&quot;"
;;
arm*|iwmmx*)
case "${name}" in
obj_int_extract) platforms[0]="Win32"
;;
*) platforms[0]="Pocket PC 2003 (ARMV4)"
;;
esac
;;
*) die "Unsupported target $target!"
esac
# List Command-line Arguments for this target
case "$target" in
arm*|iwmmx*)
if [ "$name" == "example" ];then
ARGU="--codec vp6 --flipuv --progress _bnd.vp6"
fi
if [ "$name" == "xma" ];then
ARGU="--codec vp6 -h 240 -w 320 -v"
fi
;;
esac
@@ -336,7 +310,7 @@ generate_vcproj() {
Name="${name}" \
ProjectGUID="{${guid}}" \
RootNamespace="${name}" \
Keyword="${keyword}"
Keyword="${keyword}" \
open_tag Platforms
for plat in "${platforms[@]}"; do
@@ -348,21 +322,6 @@ generate_vcproj() {
case "$target" in
x86*) $uses_asm && tag ToolFile RelativePath="$self_dirname/../x86-msvs/yasm.rules"
;;
arm*|iwmmx*)
if [ "$name" == "vpx" ];then
case "$target" in
armv5*)
tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmv5.rules"
;;
armv6*)
tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmv6.rules"
;;
iwmmxt*)
tag ToolFile RelativePath="$self_dirname/../arm-wince-vs8/armasmxscale.rules"
;;
esac
fi
;;
esac
close_tag ToolFiles
@@ -374,68 +333,28 @@ generate_vcproj() {
OutputDirectory="\$(SolutionDir)$plat_no_ws/\$(ConfigurationName)" \
IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
ConfigurationType="$vs_ConfigurationType" \
CharacterSet="1"
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat \$(ConfigurationName)"
tag Tool \
Name="VCMIDLTool" \
TargetEnvironment="1"
tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;DEBUG;_LIB;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
MinimalRebuild="true" \
RuntimeLibrary="1" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
example|xma) tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;DEBUG;_CONSOLE;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
MinimalRebuild="true" \
RuntimeLibrary="1" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="_DEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
obj_int_extract) tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;DEBUG;_CONSOLE" \
RuntimeLibrary="1" \
WarningLevel="3" \
DebugInformationFormat="1" \
;;
esac
fi
CharacterSet="1" \
case "$target" in
x86*) tag Tool \
x86*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
RuntimeLibrary="$debug_runtime" \
WarningLevel="3" \
Detect64BitPortabilityProblems="true" \
DebugInformationFormat="1" \
;;
vpx)
tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat $src_path_bare" \
tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
@@ -446,42 +365,44 @@ generate_vcproj() {
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="1"
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="true"
;;
*)
tag Tool \
Name="VCCLCompilerTool" \
Optimization="0" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;_DEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$debug_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="1" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs" Debug="true"
;;
esac
;;
esac
case "$proj_kind" in
exe)
case "$target" in
x86*) tag Tool \
x86*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
GenerateDebugInformation="true" \
;;
*)
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$debug_libs \$(NoInherit)" \
AdditionalLibraryDirectories="$libdirs" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
;;
arm*|iwmmx*)
case "$name" in
obj_int_extract) tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
GenerateDebugInformation="true"
;;
*) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$debug_libs" \
OutputFile="\$(OutDir)/${name}.exe" \
LinkIncremental="2" \
AdditionalLibraryDirectories="${libdirs};&quot;..\lib/$plat_no_ws&quot;" \
DelayLoadDLLs="\$(NOINHERIT)" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
SubSystem="9" \
StackReserveSize="65536" \
StackCommitSize="4096" \
EntryPointSymbol="mainWCRTStartup" \
TargetMachine="3"
;;
esac
;;
@@ -489,41 +410,27 @@ generate_vcproj() {
;;
lib)
case "$target" in
arm*|iwmmx*) tag Tool \
Name="VCLibrarianTool" \
AdditionalOptions=" /subsystem:windowsce,4.20 /machine:ARM" \
OutputFile="\$(OutDir)/${name}.lib" \
;;
*) tag Tool \
x86*)
tag Tool \
Name="VCLibrarianTool" \
OutputFile="\$(OutDir)/${name}${lib_sfx}d.lib" \
;;
esac
;;
dll) tag Tool \
dll)
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="\$(NoInherit)" \
LinkIncremental="2" \
GenerateDebugInformation="true" \
AssemblyDebug="1" \
TargetMachine="1" \
$link_opts
$link_opts \
;;
esac
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
;;
example|xma) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
tag DebuggerTool \
Arguments="${ARGU}"
;;
esac
fi
close_tag Configuration
open_tag Configuration \
@@ -532,118 +439,79 @@ generate_vcproj() {
IntermediateDirectory="$plat_no_ws/\$(ConfigurationName)/${name}" \
ConfigurationType="$vs_ConfigurationType" \
CharacterSet="1" \
WholeProgramOptimization="0"
WholeProgramOptimization="0" \
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$target" in
x86*)
case "$name" in
vpx) tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat \$(ConfigurationName)"
tag Tool \
Name="VCMIDLTool" \
TargetEnvironment="1"
obj_int_extract)
tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="2" \
FavorSizeOrSpeed="1" \
FavorSizeorSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;_LIB;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
RuntimeLibrary="0" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
example|xma) tag Tool \
Name="VCCLCompilerTool" \
ExecutionBucket="7" \
Optimization="2" \
FavorSizeOrSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES);WINCE;_CONSOLE;\$(ARCHFAM);\$(_ARCHFAM_);_UNICODE;UNICODE;" \
RuntimeLibrary="0" \
BufferSecurityCheck="false" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
CompileAs="1"
tag Tool \
Name="VCResourceCompilerTool" \
PreprocessorDefinitions="NDEBUG;_WIN32_WCE=\$(CEVER);UNDER_CE;\$(PLATFORMDEFINES)" \
Culture="1033" \
AdditionalIncludeDirectories="\$(IntDir)" \
;;
obj_int_extract) tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE" \
RuntimeLibrary="0" \
PreprocessorDefinitions="WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
Detect64BitPortabilityProblems="true" \
DebugInformationFormat="0" \
;;
esac
fi
vpx)
tag Tool \
Name="VCPreBuildEventTool" \
CommandLine="call obj_int_extract.bat $src_path_bare" \
case "$target" in
x86*) tag Tool \
tag Tool \
Name="VCCLCompilerTool" \
Optimization="2" \
FavorSizeorSpeed="1" \
AdditionalIncludeDirectories="$incs" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
Detect64BitPortabilityProblems="true"
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs"
;;
*)
tag Tool \
Name="VCCLCompilerTool" \
AdditionalIncludeDirectories="$incs" \
Optimization="2" \
FavorSizeorSpeed="1" \
PreprocessorDefinitions="WIN32;NDEBUG;_CRT_SECURE_NO_WARNINGS;_CRT_SECURE_NO_DEPRECATE;$defines" \
RuntimeLibrary="$release_runtime" \
UsePrecompiledHeader="0" \
WarningLevel="3" \
DebugInformationFormat="0" \
Detect64BitPortabilityProblems="true" \
$uses_asm && tag Tool Name="YASM" IncludePaths="$incs"
;;
esac
;;
esac
case "$proj_kind" in
exe)
case "$target" in
x86*) tag Tool \
x86*)
case "$name" in
obj_int_extract)
tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
GenerateDebugInformation="true" \
;;
*)
tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$libs \$(NoInherit)" \
AdditionalLibraryDirectories="$libdirs" \
;;
arm*|iwmmx*)
case "$name" in
obj_int_extract) tag Tool \
Name="VCLinkerTool" \
OutputFile="${name}.exe" \
LinkIncremental="1" \
GenerateDebugInformation="false" \
SubSystem="0" \
OptimizeReferences="0" \
EnableCOMDATFolding="0" \
TargetMachine="0"
;;
*) tag Tool \
Name="VCLinkerTool" \
AdditionalDependencies="$libs" \
OutputFile="\$(OutDir)/${name}.exe" \
LinkIncremental="1" \
AdditionalLibraryDirectories="${libdirs};&quot;..\lib/$plat_no_ws&quot;" \
DelayLoadDLLs="\$(NOINHERIT)" \
GenerateDebugInformation="true" \
ProgramDatabaseFile="\$(OutDir)/${name}.pdb" \
SubSystem="9" \
StackReserveSize="65536" \
StackCommitSize="4096" \
OptimizeReferences="2" \
EnableCOMDATFolding="2" \
EntryPointSymbol="mainWCRTStartup" \
TargetMachine="3"
;;
esac
;;
@@ -651,14 +519,11 @@ generate_vcproj() {
;;
lib)
case "$target" in
arm*|iwmmx*) tag Tool \
Name="VCLibrarianTool" \
AdditionalOptions=" /subsystem:windowsce,4.20 /machine:ARM" \
OutputFile="\$(OutDir)/${name}.lib" \
;;
*) tag Tool \
x86*)
tag Tool \
Name="VCLibrarianTool" \
OutputFile="\$(OutDir)/${name}${lib_sfx}.lib" \
;;
esac
;;
@@ -669,31 +534,18 @@ generate_vcproj() {
LinkIncremental="1" \
GenerateDebugInformation="true" \
TargetMachine="1" \
$link_opts
esac
$link_opts \
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
case "$name" in
vpx) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
;;
example|xma) tag DeploymentTool \
ForceDirty="-1" \
RegisterOutput="0"
tag DebuggerTool \
Arguments="${ARGU}"
;;
esac
fi
close_tag Configuration
done
close_tag Configurations
open_tag Files
generate_filter srcs "Source Files" "cpp;c;cc;cxx;def;odl;idl;hpj;bat;asm;asmx"
generate_filter hdrs "Header Files" "h;hpp;hxx;hm;inl;inc;xsd"
generate_filter srcs "Source Files" "c;def;odl;idl;hpj;bat;asm;asmx"
generate_filter hdrs "Header Files" "h;hm;inl;inc;xsd"
generate_filter resrcs "Resource Files" "rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav"
generate_filter resrcs "Build Files" "mk"
close_tag Files

View File

@@ -139,9 +139,6 @@ process_global() {
echo "${indent}${proj_guid}.${config}.ActiveCfg = ${config}"
echo "${indent}${proj_guid}.${config}.Build.0 = ${config}"
if [ "$target" == "armv6-wince-vs8" ] || [ "$target" == "armv5te-wince-vs8" ] || [ "$target" == "iwmmxt-wince-vs8" ] || [ "$target" == "iwmmxt2-wince-vs8" ];then
echo "${indent}${proj_guid}.${config}.Deploy.0 = ${config}"
fi
done
IFS=${IFS_bak}
done

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,15 @@
REM Copyright (c) 2011 The WebM project authors. All Rights Reserved.
REM
REM Use of this source code is governed by a BSD-style license
REM that can be found in the LICENSE file in the root of the source
REM tree. An additional intellectual property rights grant can be found
REM in the file PATENTS. All contributing project authors may
REM be found in the AUTHORS file in the root of the source tree.
echo on
cl /I "./" /I "%1" /nologo /c "%1/vp8/common/asm_com_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp8/decoder/asm_dec_offsets.c"
cl /I "./" /I "%1" /nologo /c "%1/vp8/encoder/asm_enc_offsets.c"
obj_int_extract.exe rvds "asm_com_offsets.obj" > "asm_com_offsets.asm"
obj_int_extract.exe rvds "asm_dec_offsets.obj" > "asm_dec_offsets.asm"
obj_int_extract.exe rvds "asm_enc_offsets.obj" > "asm_enc_offsets.asm"

46
configure vendored
View File

@@ -31,16 +31,18 @@ Advanced options:
${toggle_md5} support for output of checksum data
${toggle_static_msvcrt} use static MSVCRT (VS builds only)
${toggle_vp8} VP8 codec support
${toggle_psnr} output of PSNR data, if supported (encoders)
${toggle_internal_stats} output of encoder internal stats for debug, if supported (encoders)
${toggle_mem_tracker} track memory usage
${toggle_postproc} postprocessing
${toggle_multithread} multithreaded encoding and decoding.
${toggle_spatial_resampling} spatial sampling (scaling) support
${toggle_realtime_only} enable this option while building for real-time encoding
${toggle_error_concealment} enable this option to get a decoder which is able to conceal losses
${toggle_runtime_cpu_detect} runtime cpu detection
${toggle_shared} shared library support
${toggle_static} static library support
${toggle_small} favor smaller size over speed
${toggle_arm_asm_detok} assembly version of the detokenizer (ARM platforms only)
${toggle_postproc_visualizer} macro block / block level visualizers
Codecs:
Codecs can be selectively enabled or disabled individually, or by family:
@@ -78,22 +80,21 @@ EOF
# alphabetically by architecture, generic-gnu last.
all_platforms="${all_platforms} armv5te-linux-rvct"
all_platforms="${all_platforms} armv5te-linux-gcc"
all_platforms="${all_platforms} armv5te-none-rvct"
all_platforms="${all_platforms} armv5te-symbian-gcc"
all_platforms="${all_platforms} armv5te-wince-vs8"
all_platforms="${all_platforms} armv6-darwin-gcc"
all_platforms="${all_platforms} armv6-linux-rvct"
all_platforms="${all_platforms} armv6-linux-gcc"
all_platforms="${all_platforms} armv6-none-rvct"
all_platforms="${all_platforms} armv6-symbian-gcc"
all_platforms="${all_platforms} armv6-wince-vs8"
all_platforms="${all_platforms} iwmmxt-linux-rvct"
all_platforms="${all_platforms} iwmmxt-linux-gcc"
all_platforms="${all_platforms} iwmmxt-wince-vs8"
all_platforms="${all_platforms} iwmmxt2-linux-rvct"
all_platforms="${all_platforms} iwmmxt2-linux-gcc"
all_platforms="${all_platforms} iwmmxt2-wince-vs8"
all_platforms="${all_platforms} armv7-darwin-gcc" #neon Cortex-A8
all_platforms="${all_platforms} armv7-linux-rvct" #neon Cortex-A8
all_platforms="${all_platforms} armv7-linux-gcc" #neon Cortex-A8
all_platforms="${all_platforms} armv7-none-rvct" #neon Cortex-A8
all_platforms="${all_platforms} mips32-linux-gcc"
all_platforms="${all_platforms} ppc32-darwin8-gcc"
all_platforms="${all_platforms} ppc32-darwin9-gcc"
@@ -114,6 +115,7 @@ all_platforms="${all_platforms} x86-win32-vs7"
all_platforms="${all_platforms} x86-win32-vs8"
all_platforms="${all_platforms} x86-win32-vs9"
all_platforms="${all_platforms} x86_64-darwin9-gcc"
all_platforms="${all_platforms} x86_64-darwin10-gcc"
all_platforms="${all_platforms} x86_64-linux-gcc"
all_platforms="${all_platforms} x86_64-linux-icc"
all_platforms="${all_platforms} x86_64-solaris-gcc"
@@ -152,11 +154,13 @@ enabled doxygen && php -v >/dev/null 2>&1 && enable install_docs
enable install_bins
enable install_libs
enable static
enable optimizations
enable fast_unaligned #allow unaligned accesses, if supported by hw
enable md5
enable spatial_resampling
enable multithread
enable os_support
[ -d ${source_path}/../include ] && enable alt_tree_layout
for d in vp8; do
@@ -199,6 +203,7 @@ ARCH_EXT_LIST="
sse2
sse3
ssse3
sse4_1
altivec
"
@@ -209,6 +214,7 @@ HAVE_LIST="
alt_tree_layout
pthread_h
sys_mman_h
unistd_h
"
CONFIG_LIST="
external_build
@@ -238,7 +244,7 @@ CONFIG_LIST="
runtime_cpu_detect
postproc
multithread
psnr
internal_stats
${CODECS}
${CODEC_FAMILIES}
encoders
@@ -246,9 +252,12 @@ CONFIG_LIST="
static_msvcrt
spatial_resampling
realtime_only
error_concealment
shared
static
small
arm_asm_detok
postproc_visualizer
os_support
"
CMDLINE_SELECT="
extra_warnings
@@ -278,16 +287,18 @@ CMDLINE_SELECT="
dc_recon
postproc
multithread
psnr
internal_stats
${CODECS}
${CODEC_FAMILIES}
static_msvcrt
mem_tracker
spatial_resampling
realtime_only
error_concealment
shared
static
small
arm_asm_detok
postproc_visualizer
"
process_cmdline() {
@@ -295,7 +306,7 @@ process_cmdline() {
optval="${opt#*=}"
case "$opt" in
--disable-codecs) for c in ${CODECS}; do disable $c; done ;;
*) process_common_cmdline $opt
*) process_common_cmdline "$opt"
;;
esac
done
@@ -324,8 +335,6 @@ post_process_cmdline() {
for c in ${CODECS}; do
enabled ${c} && enable ${c##*_}s
done
}
@@ -376,6 +385,7 @@ process_targets() {
if [ -f "${source_path}/build/make/version.sh" ]; then
local ver=`"$source_path/build/make/version.sh" --bare $source_path`
DIST_DIR="${DIST_DIR}-${ver}"
VERSION_STRING=${ver}
ver=${ver%%-*}
VERSION_PATCH=${ver##*.}
ver=${ver%.*}
@@ -384,6 +394,8 @@ process_targets() {
VERSION_MAJOR=${ver%.*}
fi
enabled child || cat <<EOF >> config.mk
PREFIX=${prefix}
ifeq (\$(MAKECMDGOALS),dist)
DIST_DIR?=${DIST_DIR}
else
@@ -391,6 +403,8 @@ DIST_DIR?=\$(DESTDIR)${prefix}
endif
LIBSUBDIR=${libdir##${prefix}/}
VERSION_STRING=${VERSION_STRING}
VERSION_MAJOR=${VERSION_MAJOR}
VERSION_MINOR=${VERSION_MINOR}
VERSION_PATCH=${VERSION_PATCH}
@@ -485,7 +499,7 @@ process_toolchain() {
check_add_cflags -Wpointer-arith
check_add_cflags -Wtype-limits
check_add_cflags -Wcast-qual
enabled extra_warnings || check_add_cflags -Wno-unused
enabled extra_warnings || check_add_cflags -Wno-unused-function
fi
if enabled icc; then
@@ -535,6 +549,10 @@ process_toolchain() {
# Other toolchain specific defaults
case $toolchain in x86*|ppc*|universal*) soft_enable postproc;; esac
if enabled postproc_visualizer; then
enabled postproc || die "postproc_visualizer requires postproc to be enabled"
fi
}

View File

@@ -34,7 +34,8 @@ TXT_DOX = $(call enabled,TXT_DOX)
EXAMPLE_PATH += $(SRC_PATH_BARE) #for CHANGELOG, README, etc
doxyfile: libs.doxy_template libs.doxy examples.doxy
doxyfile: $(if $(findstring examples, $(ALL_TARGETS)),examples.doxy)
doxyfile: libs.doxy_template libs.doxy
@echo " [CREATE] $@"
@cat $^ > $@
@echo "STRIP_FROM_PATH += $(SRC_PATH_BARE) $(BUILD_ROOT)" >> $@

View File

@@ -17,6 +17,7 @@ vpxdec.SRCS += md5_utils.c md5_utils.h
vpxdec.SRCS += vpx_ports/vpx_timer.h
vpxdec.SRCS += vpx/vpx_integer.h
vpxdec.SRCS += args.c args.h vpx_ports/config.h
vpxdec.SRCS += tools_common.c tools_common.h
vpxdec.SRCS += nestegg/halloc/halloc.h
vpxdec.SRCS += nestegg/halloc/src/align.h
vpxdec.SRCS += nestegg/halloc/src/halloc.c
@@ -28,6 +29,7 @@ vpxdec.GUID = BA5FE66F-38DD-E034-F542-B1578C5FB950
vpxdec.DESCRIPTION = Full featured decoder
UTILS-$(CONFIG_ENCODERS) += vpxenc.c
vpxenc.SRCS += args.c args.h y4minput.c y4minput.h
vpxenc.SRCS += tools_common.c tools_common.h
vpxenc.SRCS += vpx_ports/config.h vpx_ports/mem_ops.h
vpxenc.SRCS += vpx_ports/mem_ops_aligned.h
vpxenc.SRCS += libmkv/EbmlIDs.h
@@ -75,6 +77,11 @@ GEN_EXAMPLES-$(CONFIG_ENCODERS) += decode_with_drops.c
endif
decode_with_drops.GUID = CE5C53C4-8DDA-438A-86ED-0DDD3CDB8D26
decode_with_drops.DESCRIPTION = Drops frames while decoding
ifeq ($(CONFIG_DECODERS),yes)
GEN_EXAMPLES-$(CONFIG_ERROR_CONCEALMENT) += decode_with_partial_drops.c
endif
decode_with_partial_drops.GUID = 61C2D026-5754-46AC-916F-1343ECC5537E
decode_with_partial_drops.DESCRIPTION = Drops parts of frames while decoding
GEN_EXAMPLES-$(CONFIG_ENCODERS) += error_resilient.c
error_resilient.GUID = DF5837B9-4145-4F92-A031-44E4F832E00C
error_resilient.DESCRIPTION = Error Resiliency Feature
@@ -91,8 +98,16 @@ vp8cx_set_ref.DESCRIPTION = VP8 set encoder reference frame
# Handle extra library flags depending on codec configuration
CODEC_EXTRA_LIBS-$(CONFIG_VP8) += m
# We should not link to math library (libm) on RVCT
# when building for bare-metal targets
ifeq ($(CONFIG_OS_SUPPORT), yes)
CODEC_EXTRA_LIBS-$(CONFIG_VP8) += m
else
ifeq ($(CONFIG_GCC), yes)
CODEC_EXTRA_LIBS-$(CONFIG_VP8) += m
endif
endif
#
# End of specified files. The rest of the build rules should happen
# automagically from here.
@@ -112,8 +127,8 @@ else
LIB_PATH := $(call enabled,LIB_PATH)
INC_PATH := $(call enabled,INC_PATH)
endif
CFLAGS += $(addprefix -I,$(INC_PATH))
LDFLAGS += $(addprefix -L,$(LIB_PATH))
INTERNAL_CFLAGS = $(addprefix -I,$(INC_PATH))
INTERNAL_LDFLAGS += $(addprefix -L,$(LIB_PATH))
# Expand list of selected examples to build (as specified above)
@@ -152,8 +167,10 @@ BINS-$(NOT_MSVS) += $(addprefix $(BUILD_PFX),$(ALL_EXAMPLES:.c=))
# Instantiate linker template for all examples.
CODEC_LIB=$(if $(CONFIG_DEBUG_LIBS),vpx_g,vpx)
CODEC_LIB_SUF=$(if $(CONFIG_SHARED),.so,.a)
$(foreach bin,$(BINS-yes),\
$(if $(BUILD_OBJS),$(eval $(bin): $(LIB_PATH)/lib$(CODEC_LIB).a))\
$(if $(BUILD_OBJS),$(eval $(bin):\
$(LIB_PATH)/lib$(CODEC_LIB)$(CODEC_LIB_SUF)))\
$(if $(BUILD_OBJS),$(eval $(call linker_template,$(bin),\
$(call objs,$($(notdir $(bin)).SRCS)) \
-l$(CODEC_LIB) $(addprefix -l,$(CODEC_EXTRA_LIBS))\
@@ -204,7 +221,8 @@ $(1): $($(1:.vcproj=).SRCS)
--ver=$$(CONFIG_VS_VERSION)\
--proj-guid=$$($$(@:.vcproj=).GUID)\
$$(if $$(CONFIG_STATIC_MSVCRT),--static-crt) \
--out=$$@ $$(CFLAGS) $$(LDFLAGS) -l$$(CODEC_LIB) -lwinmm $$^
--out=$$@ $$(INTERNAL_CFLAGS) $$(CFLAGS) \
$$(INTERNAL_LDFLAGS) $$(LDFLAGS) -l$$(CODEC_LIB) -lwinmm $$^
endef
PROJECTS-$(CONFIG_MSVS) += $(ALL_EXAMPLES:.c=.vcproj)
INSTALL-BINS-$(CONFIG_MSVS) += $(foreach p,$(VS_PLATFORMS),\

View File

@@ -0,0 +1,238 @@
@TEMPLATE decoder_tmpl.c
Decode With Partial Drops Example
=========================
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
This is an example utility which drops a series of frames (or parts of frames),
as specified on the command line. This is useful for observing the error
recovery features of the codec.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ INTRODUCTION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
#include <time.h>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_INCLUDES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
struct parsed_header
{
char key_frame;
int version;
char show_frame;
int first_part_size;
};
int next_packet(struct parsed_header* hdr, int pos, int length, int mtu)
{
int size = 0;
int remaining = length - pos;
/* Uncompressed part is 3 bytes for P frames and 10 bytes for I frames */
int uncomp_part_size = (hdr->key_frame ? 10 : 3);
/* number of bytes yet to send from header and the first partition */
int remainFirst = uncomp_part_size + hdr->first_part_size - pos;
if (remainFirst > 0)
{
if (remainFirst <= mtu)
{
size = remainFirst;
}
else
{
size = mtu;
}
return size;
}
/* second partition; just slot it up according to MTU */
if (remaining <= mtu)
{
size = remaining;
return size;
}
return mtu;
}
void throw_packets(unsigned char* frame, int* size, int loss_rate,
int* thrown, int* kept)
{
unsigned char loss_frame[256*1024];
int pkg_size = 1;
int pos = 0;
int loss_pos = 0;
struct parsed_header hdr;
unsigned int tmp;
int mtu = 1500;
if (*size < 3)
{
return;
}
putc('|', stdout);
/* parse uncompressed 3 bytes */
tmp = (frame[2] << 16) | (frame[1] << 8) | frame[0];
hdr.key_frame = !(tmp & 0x1); /* inverse logic */
hdr.version = (tmp >> 1) & 0x7;
hdr.show_frame = (tmp >> 4) & 0x1;
hdr.first_part_size = (tmp >> 5) & 0x7FFFF;
/* don't drop key frames */
if (hdr.key_frame)
{
int i;
*kept = *size/mtu + ((*size % mtu > 0) ? 1 : 0); /* approximate */
for (i=0; i < *kept; i++)
putc('.', stdout);
return;
}
while ((pkg_size = next_packet(&hdr, pos, *size, mtu)) > 0)
{
int loss_event = ((rand() + 1.0)/(RAND_MAX + 1.0) < loss_rate/100.0);
if (*thrown == 0 && !loss_event)
{
memcpy(loss_frame + loss_pos, frame + pos, pkg_size);
loss_pos += pkg_size;
(*kept)++;
putc('.', stdout);
}
else
{
(*thrown)++;
putc('X', stdout);
}
pos += pkg_size;
}
memcpy(frame, loss_frame, loss_pos);
memset(frame + loss_pos, 0, *size - loss_pos);
*size = loss_pos;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ HELPERS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INIT
/* Initialize codec */
flags = VPX_CODEC_USE_ERROR_CONCEALMENT;
res = vpx_codec_dec_init(&codec, interface, &dec_cfg, flags);
if(res)
die_codec(&codec, "Failed to initialize decoder");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INIT
Usage
-----
This example adds a single argument to the `simple_decoder` example,
which specifies the range or pattern of frames to drop. The parameter is
parsed as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
if(argc < 4 || argc > 6)
die("Usage: %s <infile> <outfile> [-t <num threads>] <N-M|N/M|L,S>\n",
argv[0]);
{
char *nptr;
int arg_num = 3;
if (argc == 6 && strncmp(argv[arg_num++], "-t", 2) == 0)
dec_cfg.threads = strtol(argv[arg_num++], NULL, 0);
n = strtol(argv[arg_num], &nptr, 0);
mode = (*nptr == '\0' || *nptr == ',') ? 2 : (*nptr == '-') ? 1 : 0;
m = strtol(nptr+1, NULL, 0);
if((!n && !m) || (*nptr != '-' && *nptr != '/' &&
*nptr != '\0' && *nptr != ','))
die("Couldn't parse pattern %s\n", argv[3]);
}
seed = (m > 0) ? m : (unsigned int)time(NULL);
srand(seed);thrown_frame = 0;
printf("Seed: %u\n", seed);
printf("Threads: %d\n", dec_cfg.threads);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ USAGE
Dropping A Range Of Frames
--------------------------
To drop a range of frames, specify the starting frame and the ending
frame to drop, separated by a dash. The following command will drop
frames 5 through 10 (base 1).
$ ./decode_with_partial_drops in.ivf out.i420 5-10
Dropping A Pattern Of Frames
----------------------------
To drop a pattern of frames, specify the number of frames to drop and
the number of frames after which to repeat the pattern, separated by
a forward-slash. The following command will drop 3 of 7 frames.
Specifically, it will decode 4 frames, then drop 3 frames, and then
repeat.
$ ./decode_with_partial_drops in.ivf out.i420 3/7
Dropping Random Parts Of Frames
-------------------------------
A third argument tuple is available to split the frame into 1500 bytes pieces
and randomly drop pieces rather than frames. The frame will be split at
partition boundaries where possible. The following example will seed the RNG
with the seed 123 and drop approximately 5% of the pieces. Pieces which
are depending on an already dropped piece will also be dropped.
$ ./decode_with_partial_drops in.ivf out.i420 5,123
Extra Variables
---------------
This example maintains the pattern passed on the command line in the
`n`, `m`, and `is_range` variables:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
int n, m, mode;
unsigned int seed;
int thrown=0, kept=0;
int thrown_frame=0, kept_frame=0;
vpx_codec_dec_cfg_t dec_cfg = {0};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ EXTRA_VARS
Making The Drop Decision
------------------------
The example decides whether to drop the frame based on the current
frame number, immediately before decoding the frame.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE
/* Decide whether to throw parts of the frame or the whole frame
depending on the drop mode */
thrown_frame = 0;
kept_frame = 0;
switch (mode)
{
case 0:
if (m - (frame_cnt-1)%m <= n)
{
frame_sz = 0;
}
break;
case 1:
if (frame_cnt >= n && frame_cnt <= m)
{
frame_sz = 0;
}
break;
case 2:
throw_packets(frame, &frame_sz, n, &thrown_frame, &kept_frame);
break;
default: break;
}
if (mode < 2)
{
if (frame_sz == 0)
{
putc('X', stdout);
thrown_frame++;
}
else
{
putc('.', stdout);
kept_frame++;
}
}
thrown += thrown_frame;
kept += kept_frame;
fflush(stdout);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ PRE_DECODE

View File

@@ -19,7 +19,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_decoder.h"
#include "vpx/vp8dx.h"
#define interface (&vpx_codec_vp8_dx_algo)
#define interface (vpx_codec_vp8_dx())
@EXTRA_INCLUDES
@@ -42,6 +42,8 @@ static void die(const char *fmt, ...) {
@DIE_CODEC
@HELPERS
int main(int argc, char **argv) {
FILE *infile, *outfile;
vpx_codec_ctx_t codec;

View File

@@ -2,7 +2,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_decoder.h"
#include "vpx/vp8dx.h"
#define interface (&vpx_codec_vp8_dx_algo)
#define interface (vpx_codec_vp8_dx())
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ DEC_INCLUDES

View File

@@ -19,7 +19,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_encoder.h"
#include "vpx/vp8cx.h"
#define interface (&vpx_codec_vp8_cx_algo)
#define interface (vpx_codec_vp8_cx())
#define fourcc 0x30385056
@EXTRA_INCLUDES
@@ -111,8 +111,6 @@ int main(int argc, char **argv) {
vpx_codec_ctx_t codec;
vpx_codec_enc_cfg_t cfg;
int frame_cnt = 0;
unsigned char file_hdr[IVF_FILE_HDR_SZ];
unsigned char frame_hdr[IVF_FRAME_HDR_SZ];
vpx_image_t raw;
vpx_codec_err_t res;
long width;

View File

@@ -2,7 +2,7 @@
#define VPX_CODEC_DISABLE_COMPAT 1
#include "vpx/vpx_encoder.h"
#include "vpx/vp8cx.h"
#define interface (&vpx_codec_vp8_cx_algo)
#define interface (vpx_codec_vp8_cx())
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ENC_INCLUDES

View File

@@ -21,7 +21,7 @@ res = vpx_codec_dec_init(&codec, interface, NULL,
if(res == VPX_CODEC_INCAPABLE) {
printf("NOTICE: Postproc not supported by %s\n",
vpx_codec_iface_name(interface));
res = vpx_codec_dec_init(&codec, interface, NULL, 0);
res = vpx_codec_dec_init(&codec, interface, NULL, flags);
}
if(res)
die_codec(&codec, "Failed to initialize decoder");

View File

@@ -33,7 +33,7 @@ Initializing The Codec
----------------------
The decoder is initialized by the following code. This is an example for
the VP8 decoder, but the code is analogous for all algorithms. Replace
`&vpx_codec_vp8_dx_algo` with a pointer to the interface exposed by the
`vpx_codec_vp8_dx()` with a pointer to the interface exposed by the
algorithm you want to use. The `cfg` argument is left as NULL in this
example, because we want the algorithm to determine the stream
configuration (width/height) and allocate memory automatically. This

View File

@@ -78,8 +78,8 @@ if(frame_cnt + 1 == 22) {
} else if(frame_cnt + 1 == 44) {
vpx_active_map_t active;
active.rows = 240/16;
active.cols = 320/16;
active.rows = cfg.g_h/16;
active.cols = cfg.g_w/16;
/* pass in null map to disable active_map*/
active.active_map = NULL;

View File

@@ -120,7 +120,7 @@ enum mkv
//video
Video = 0xE0,
FlagInterlaced = 0x9A,
// StereoMode = 0x53B8,
StereoMode = 0x53B8,
PixelWidth = 0xB0,
PixelHeight = 0xBA,
PixelCropBottom = 0x54AA,

View File

@@ -11,6 +11,7 @@
#include <stdlib.h>
#include <wchar.h>
#include <string.h>
#include <limits.h>
#if defined(_MSC_VER)
#define LITERALU64(n) n
#else
@@ -33,7 +34,7 @@ void Ebml_WriteLen(EbmlGlobal *glob, long long val)
val |= (LITERALU64(0x000000000000080) << ((size - 1) * 7));
Ebml_Serialize(glob, (void *) &val, size);
Ebml_Serialize(glob, (void *) &val, sizeof(val), size);
}
void Ebml_WriteString(EbmlGlobal *glob, const char *str)
@@ -60,21 +61,26 @@ void Ebml_WriteUTF8(EbmlGlobal *glob, const wchar_t *wstr)
void Ebml_WriteID(EbmlGlobal *glob, unsigned long class_id)
{
int len;
if (class_id >= 0x01000000)
Ebml_Serialize(glob, (void *)&class_id, 4);
len = 4;
else if (class_id >= 0x00010000)
Ebml_Serialize(glob, (void *)&class_id, 3);
len = 3;
else if (class_id >= 0x00000100)
Ebml_Serialize(glob, (void *)&class_id, 2);
len = 2;
else
Ebml_Serialize(glob, (void *)&class_id, 1);
len = 1;
Ebml_Serialize(glob, (void *)&class_id, sizeof(class_id), len);
}
void Ebml_SerializeUnsigned64(EbmlGlobal *glob, unsigned long class_id, uint64_t ui)
{
unsigned char sizeSerialized = 8 | 0x80;
Ebml_WriteID(glob, class_id);
Ebml_Serialize(glob, &sizeSerialized, 1);
Ebml_Serialize(glob, &ui, 8);
Ebml_Serialize(glob, &sizeSerialized, sizeof(sizeSerialized), 1);
Ebml_Serialize(glob, &ui, sizeof(ui), 8);
}
void Ebml_SerializeUnsigned(EbmlGlobal *glob, unsigned long class_id, unsigned long ui)
@@ -97,8 +103,8 @@ void Ebml_SerializeUnsigned(EbmlGlobal *glob, unsigned long class_id, unsigned l
}
sizeSerialized = 0x80 | size;
Ebml_Serialize(glob, &sizeSerialized, 1);
Ebml_Serialize(glob, &ui, size);
Ebml_Serialize(glob, &sizeSerialized, sizeof(sizeSerialized), 1);
Ebml_Serialize(glob, &ui, sizeof(ui), size);
}
//TODO: perhaps this is a poor name for this id serializer helper function
void Ebml_SerializeBinary(EbmlGlobal *glob, unsigned long class_id, unsigned long bin)
@@ -119,14 +125,14 @@ void Ebml_SerializeFloat(EbmlGlobal *glob, unsigned long class_id, double d)
unsigned char len = 0x88;
Ebml_WriteID(glob, class_id);
Ebml_Serialize(glob, &len, 1);
Ebml_Serialize(glob, &d, 8);
Ebml_Serialize(glob, &len, sizeof(len), 1);
Ebml_Serialize(glob, &d, sizeof(d), 8);
}
void Ebml_WriteSigned16(EbmlGlobal *glob, short val)
{
signed long out = ((val & 0x003FFFFF) | 0x00200000) << 8;
Ebml_Serialize(glob, &out, 3);
Ebml_Serialize(glob, &out, sizeof(out), 3);
}
void Ebml_SerializeString(EbmlGlobal *glob, unsigned long class_id, const char *s)
@@ -143,7 +149,6 @@ void Ebml_SerializeUTF8(EbmlGlobal *glob, unsigned long class_id, wchar_t *s)
void Ebml_SerializeData(EbmlGlobal *glob, unsigned long class_id, unsigned char *data, unsigned long data_length)
{
unsigned char size = 4;
Ebml_WriteID(glob, class_id);
Ebml_WriteLen(glob, data_length);
Ebml_Write(glob, data, data_length);

View File

@@ -15,7 +15,7 @@
#include "vpx/vpx_integer.h"
typedef struct EbmlGlobal EbmlGlobal;
void Ebml_Serialize(EbmlGlobal *glob, const void *, unsigned long);
void Ebml_Serialize(EbmlGlobal *glob, const void *, int, unsigned long);
void Ebml_Write(EbmlGlobal *glob, const void *, unsigned long);
/////

View File

@@ -35,11 +35,11 @@ void writeSimpleBlock(EbmlGlobal *glob, unsigned char trackNumber, short timeCod
Ebml_WriteID(glob, SimpleBlock);
unsigned long blockLength = 4 + dataLength;
blockLength |= 0x10000000; //TODO check length < 0x0FFFFFFFF
Ebml_Serialize(glob, &blockLength, 4);
Ebml_Serialize(glob, &blockLength, sizeof(blockLength), 4);
trackNumber |= 0x80; //TODO check track nubmer < 128
Ebml_Write(glob, &trackNumber, 1);
//Ebml_WriteSigned16(glob, timeCode,2); //this is 3 bytes
Ebml_Serialize(glob, &timeCode, 2);
Ebml_Serialize(glob, &timeCode, sizeof(timeCode), 2);
unsigned char flags = 0x00 | (isKeyframe ? 0x80 : 0x00) | (lacingFlag << 1) | discardable;
Ebml_Write(glob, &flags, 1);
Ebml_Write(glob, data, dataLength);

155
libs.mk
View File

@@ -9,7 +9,13 @@
##
ASM:=$(if $(filter yes,$(CONFIG_GCC)),.asm.s,.asm)
# ARM assembly files are written in RVCT-style. We use some make magic to
# filter those files to allow GCC compilation
ifeq ($(ARCH_ARM),yes)
ASM:=$(if $(filter yes,$(CONFIG_GCC)),.asm.s,.asm)
else
ASM:=.asm
endif
CODEC_SRCS-yes += libs.mk
@@ -29,6 +35,7 @@ ifeq ($(CONFIG_VP8_ENCODER),yes)
CODEC_SRCS-yes += $(addprefix $(VP8_PREFIX),$(call enabled,VP8_CX_SRCS))
CODEC_EXPORTS-yes += $(addprefix $(VP8_PREFIX),$(VP8_CX_EXPORTS))
CODEC_SRCS-yes += $(VP8_PREFIX)vp8cx.mk vpx/vp8.h vpx/vp8cx.h vpx/vp8e.h
CODEC_SRCS-$(ARCH_ARM) += $(VP8_PREFIX)vp8cx_arm.mk
INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8e.h include/vpx/vp8cx.h
INSTALL_MAPS += include/vpx/% $(SRC_PATH_BARE)/$(VP8_PREFIX)/%
CODEC_DOC_SRCS += vpx/vp8.h vpx/vp8cx.h
@@ -41,6 +48,7 @@ ifeq ($(CONFIG_VP8_DECODER),yes)
CODEC_SRCS-yes += $(addprefix $(VP8_PREFIX),$(call enabled,VP8_DX_SRCS))
CODEC_EXPORTS-yes += $(addprefix $(VP8_PREFIX),$(VP8_DX_EXPORTS))
CODEC_SRCS-yes += $(VP8_PREFIX)vp8dx.mk vpx/vp8.h vpx/vp8dx.h
CODEC_SRCS-$(ARCH_ARM) += $(VP8_PREFIX)vp8dx_arm.mk
INSTALL-LIBS-yes += include/vpx/vp8.h include/vpx/vp8dx.h
INSTALL_MAPS += include/vpx/% $(SRC_PATH_BARE)/$(VP8_PREFIX)/%
CODEC_DOC_SRCS += vpx/vp8.h vpx/vp8dx.h
@@ -83,6 +91,7 @@ $(eval $(if $(filter universal%,$(TOOLCHAIN)),LIPO_LIBVPX,BUILD_LIBVPX):=yes)
CODEC_SRCS-$(BUILD_LIBVPX) += build/make/version.sh
CODEC_SRCS-$(BUILD_LIBVPX) += vpx/vpx_integer.h
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/asm_offsets.h
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/vpx_timer.h
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/mem.h
CODEC_SRCS-$(BUILD_LIBVPX) += $(BUILD_PFX)vpx_config.c
@@ -94,7 +103,7 @@ CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/x86_abi_support.asm
CODEC_SRCS-$(BUILD_LIBVPX) += vpx_ports/x86_cpuid.c
endif
CODEC_SRCS-$(ARCH_ARM) += vpx_ports/arm_cpudetect.c
CODEC_SRCS-$(ARCH_ARM) += $(BUILD_PFX)vpx_config.asm
CODEC_SRCS-$(ARCH_ARM) += vpx_ports/arm.h
CODEC_EXPORTS-$(BUILD_LIBVPX) += vpx/exports_com
CODEC_EXPORTS-$(CONFIG_ENCODERS) += vpx/exports_enc
CODEC_EXPORTS-$(CONFIG_DECODERS) += vpx/exports_dec
@@ -115,7 +124,7 @@ INSTALL-LIBS-$(CONFIG_SHARED) += $(foreach p,$(VS_PLATFORMS),$(LIBSUBDIR)/$(p)/v
INSTALL-LIBS-$(CONFIG_SHARED) += $(foreach p,$(VS_PLATFORMS),$(LIBSUBDIR)/$(p)/vpx.exp)
endif
else
INSTALL-LIBS-yes += $(LIBSUBDIR)/libvpx.a
INSTALL-LIBS-$(CONFIG_STATIC) += $(LIBSUBDIR)/libvpx.a
INSTALL-LIBS-$(CONFIG_DEBUG_LIBS) += $(LIBSUBDIR)/libvpx_g.a
endif
@@ -123,31 +132,33 @@ CODEC_SRCS=$(call enabled,CODEC_SRCS)
INSTALL-SRCS-$(CONFIG_CODEC_SRCS) += $(CODEC_SRCS)
INSTALL-SRCS-$(CONFIG_CODEC_SRCS) += $(call enabled,CODEC_EXPORTS)
# Generate a list of all enabled sources, in particular for exporting to gyp
# based build systems.
libvpx_srcs.txt:
@echo " [CREATE] $@"
@echo $(CODEC_SRCS) | xargs -n1 echo | sort -u > $@
ifeq ($(CONFIG_EXTERNAL_BUILD),yes)
ifeq ($(CONFIG_MSVS),yes)
ifeq ($(ARCH_ARM),yes)
ifeq ($(HAVE_ARMV5TE),yes)
ARM_ARCH=v5
endif
ifeq ($(HAVE_ARMV6),yes)
ARM_ARCH=v6
endif
obj_int_extract.vcproj: $(SRC_PATH_BARE)/build/make/obj_int_extract.c
@cp $(SRC_PATH_BARE)/build/arm-wince-vs8/obj_int_extract.bat .
@cp $(SRC_PATH_BARE)/build/x86-msvs/obj_int_extract.bat .
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh\
--exe\
--target=$(TOOLCHAIN)\
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh \
--exe \
--target=$(TOOLCHAIN) \
--name=obj_int_extract \
--ver=$(CONFIG_VS_VERSION) \
--proj-guid=E1360C65-D375-4335-8057-7ED99CC3F9B2 \
$(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
--name=obj_int_extract\
--proj-guid=E1360C65-D375-4335-8057-7ED99CC3F9B2\
--out=$@ $^\
-I".&quot;;&quot;$(SRC_PATH_BARE)"
--out=$@ $^ \
-I. \
-I"$(SRC_PATH_BARE)" \
PROJECTS-$(BUILD_LIBVPX) += obj_int_extract.vcproj
PROJECTS-$(BUILD_LIBVPX) += obj_int_extract.bat
endif
vpx.def: $(call enabled,CODEC_EXPORTS)
@echo " [CREATE] $@"
@@ -158,15 +169,16 @@ CLEAN-OBJS += vpx.def
vpx.vcproj: $(CODEC_SRCS) vpx.def
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh\
--lib\
--target=$(TOOLCHAIN)\
$(SRC_PATH_BARE)/build/make/gen_msvs_proj.sh \
--lib \
--target=$(TOOLCHAIN) \
$(if $(CONFIG_STATIC_MSVCRT),--static-crt) \
--name=vpx\
--proj-guid=DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74\
--module-def=vpx.def\
--ver=$(CONFIG_VS_VERSION)\
--out=$@ $(CFLAGS) $^\
--name=vpx \
--proj-guid=DCE19DAF-69AC-46DB-B14A-39F0FAA5DB74 \
--module-def=vpx.def \
--ver=$(CONFIG_VS_VERSION) \
--out=$@ $(CFLAGS) $^ \
--src-path-bare="$(SRC_PATH_BARE)" \
PROJECTS-$(BUILD_LIBVPX) += vpx.vcproj
@@ -176,14 +188,15 @@ endif
else
LIBVPX_OBJS=$(call objs,$(CODEC_SRCS))
OBJS-$(BUILD_LIBVPX) += $(LIBVPX_OBJS)
LIBS-$(BUILD_LIBVPX) += $(BUILD_PFX)libvpx.a $(BUILD_PFX)libvpx_g.a
LIBS-$(if $(BUILD_LIBVPX),$(CONFIG_STATIC)) += $(BUILD_PFX)libvpx.a $(BUILD_PFX)libvpx_g.a
$(BUILD_PFX)libvpx_g.a: $(LIBVPX_OBJS)
BUILD_LIBVPX_SO := $(if $(BUILD_LIBVPX),$(CONFIG_SHARED))
LIBVPX_SO := libvpx.so.$(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)
LIBS-$(BUILD_LIBVPX_SO) += $(BUILD_PFX)$(LIBVPX_SO)\
$(notdir $(LIBVPX_SO_SYMLINKS))
$(BUILD_PFX)$(LIBVPX_SO): $(LIBVPX_OBJS) libvpx.ver
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm -pthread
$(BUILD_PFX)$(LIBVPX_SO): extralibs += -lm
$(BUILD_PFX)$(LIBVPX_SO): SONAME = libvpx.so.$(VERSION_MAJOR)
$(BUILD_PFX)$(LIBVPX_SO): SO_VERSION_SCRIPT = libvpx.ver
LIBVPX_SO_SYMLINKS := $(addprefix $(LIBSUBDIR)/, \
@@ -197,12 +210,41 @@ libvpx.ver: $(call enabled,CODEC_EXPORTS)
$(qexec)echo "local: *; };" >> $@
CLEAN-OBJS += libvpx.ver
$(addprefix $(DIST_DIR)/,$(LIBVPX_SO_SYMLINKS)):
@echo " [LN] $@"
$(qexec)ln -sf $(LIBVPX_SO) $@
define libvpx_symlink_template
$(1): $(2)
@echo " [LN] $$@"
$(qexec)ln -sf $(LIBVPX_SO) $$@
endef
$(eval $(call libvpx_symlink_template,\
$(addprefix $(BUILD_PFX),$(notdir $(LIBVPX_SO_SYMLINKS))),\
$(BUILD_PFX)$(LIBVPX_SO)))
$(eval $(call libvpx_symlink_template,\
$(addprefix $(DIST_DIR)/,$(LIBVPX_SO_SYMLINKS)),\
$(DIST_DIR)/$(LIBSUBDIR)/$(LIBVPX_SO)))
INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBVPX_SO_SYMLINKS)
INSTALL-LIBS-$(CONFIG_SHARED) += $(LIBSUBDIR)/$(LIBVPX_SO)
LIBS-$(BUILD_LIBVPX) += vpx.pc
vpx.pc: config.mk libs.mk
@echo " [CREATE] $@"
$(qexec)echo '# pkg-config file from libvpx $(VERSION_STRING)' > $@
$(qexec)echo 'prefix=$(PREFIX)' >> $@
$(qexec)echo 'exec_prefix=$${prefix}' >> $@
$(qexec)echo 'libdir=$${prefix}/lib' >> $@
$(qexec)echo 'includedir=$${prefix}/include' >> $@
$(qexec)echo '' >> $@
$(qexec)echo 'Name: vpx' >> $@
$(qexec)echo 'Description: WebM Project VPx codec implementation' >> $@
$(qexec)echo 'Version: $(VERSION_MAJOR).$(VERSION_MINOR).$(VERSION_PATCH)' >> $@
$(qexec)echo 'Requires:' >> $@
$(qexec)echo 'Conflicts:' >> $@
$(qexec)echo 'Libs: -L$${libdir} -lvpx' >> $@
$(qexec)echo 'Cflags: -I$${includedir}' >> $@
INSTALL-LIBS-yes += $(LIBSUBDIR)/pkgconfig/vpx.pc
INSTALL_MAPS += $(LIBSUBDIR)/pkgconfig/%.pc %.pc
CLEAN-OBJS += vpx.pc
endif
LIBS-$(LIPO_LIBVPX) += libvpx.a
@@ -230,9 +272,52 @@ endif
#
# Add assembler dependencies for configuration and offsets
#
#$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm $(BUILD_PFX)vpx_asm_offsets.asm
$(filter %.s.o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
$(filter %.asm.o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)vpx_config.asm
#
# Calculate platform- and compiler-specific offsets for hand coded assembly
#
ifeq ($(filter icc gcc,$(TGT_CC)), $(TGT_CC))
$(BUILD_PFX)asm_com_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S
grep EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
$(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S: $(VP8_PREFIX)common/asm_com_offsets.c
CLEAN-OBJS += $(BUILD_PFX)asm_com_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)common/asm_com_offsets.c.S
$(BUILD_PFX)asm_enc_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S
grep EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
$(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S: $(VP8_PREFIX)encoder/asm_enc_offsets.c
CLEAN-OBJS += $(BUILD_PFX)asm_enc_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)encoder/asm_enc_offsets.c.S
$(BUILD_PFX)asm_dec_offsets.asm: $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S
grep EQU $< | tr -d '$$\#' $(ADS2GAS) > $@
$(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S: $(VP8_PREFIX)decoder/asm_dec_offsets.c
CLEAN-OBJS += $(BUILD_PFX)asm_dec_offsets.asm $(BUILD_PFX)$(VP8_PREFIX)decoder/asm_dec_offsets.c.S
else
ifeq ($(filter rvct,$(TGT_CC)), $(TGT_CC))
asm_com_offsets.asm: obj_int_extract
asm_com_offsets.asm: $(VP8_PREFIX)common/asm_com_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)common/asm_com_offsets.c.o
CLEAN-OBJS += asm_com_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_com_offsets.asm
asm_enc_offsets.asm: obj_int_extract
asm_enc_offsets.asm: $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)encoder/asm_enc_offsets.c.o
CLEAN-OBJS += asm_enc_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_enc_offsets.asm
asm_dec_offsets.asm: obj_int_extract
asm_dec_offsets.asm: $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
./obj_int_extract rvds $< $(ADS2GAS) > $@
OBJS-yes += $(VP8_PREFIX)decoder/asm_dec_offsets.c.o
CLEAN-OBJS += asm_dec_offsets.asm
$(filter %$(ASM).o,$(OBJS-yes)): $(BUILD_PFX)asm_dec_offsets.asm
endif
endif
$(shell $(SRC_PATH_BARE)/build/make/version.sh "$(SRC_PATH_BARE)" $(BUILD_PFX)vpx_version.h)
CLEAN-OBJS += $(BUILD_PFX)vpx_version.h

View File

@@ -31,7 +31,7 @@
The WebM project is an open source project supported by its community. For
questions about this SDK, please mail the apps-devel@webmproject.org list.
To contribute, see http://www.webmproject.org/code/contribute and mail
vpx-devel@webmproject.org.
codec-devel@webmproject.org.
*/
/*!\page changelog CHANGELOG

View File

@@ -20,8 +20,6 @@
* Still in the public domain.
*/
#include <sys/types.h> /* for stupid systems */
#include <string.h> /* for memcpy() */
#include "md5_utils.h"

View File

@@ -9,38 +9,13 @@
##
ifeq ($(ARCH_ARM),yes)
ARM_DEVELOP=no
ARM_DEVELOP:=$(if $(filter %vpx.vcproj,$(wildcard *.vcproj)),yes)
ifeq ($(ARM_DEVELOP),yes)
vpx.sln:
@echo " [COPY] $@"
@cp $(SRC_PATH_BARE)/build/arm-wince-vs8/vpx.sln .
PROJECTS-yes += vpx.sln
else
vpx.sln: $(wildcard *.vcproj)
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_sln.sh \
$(if $(filter %vpx.vcproj,$^),--dep=vpxdec:vpx) \
$(if $(filter %vpx.vcproj,$^),--dep=xma:vpx) \
--ver=$(CONFIG_VS_VERSION)\
--target=$(TOOLCHAIN)\
--out=$@ $^
vpx.sln.mk: vpx.sln
@true
PROJECTS-yes += vpx.sln vpx.sln.mk
-include vpx.sln.mk
endif
else
vpx.sln: $(wildcard *.vcproj)
@echo " [CREATE] $@"
$(SRC_PATH_BARE)/build/make/gen_msvs_sln.sh \
$(if $(filter %vpx.vcproj,$^),\
$(foreach vcp,$(filter-out %vpx.vcproj,$^),\
$(foreach vcp,$(filter-out %vpx.vcproj %obj_int_extract.vcproj,$^),\
--dep=$(vcp:.vcproj=):vpx)) \
--dep=vpx:obj_int_extract \
--ver=$(CONFIG_VS_VERSION)\
--out=$@ $^
vpx.sln.mk: vpx.sln
@@ -48,7 +23,6 @@ vpx.sln.mk: vpx.sln
PROJECTS-yes += vpx.sln vpx.sln.mk
-include vpx.sln.mk
endif
# Always install this file, as it is an unconditional post-build rule.
INSTALL_MAPS += src/% $(SRC_PATH_BARE)/%

View File

@@ -7,18 +7,18 @@
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef _VPX_REF_BUILD_PREFIX_h
#define _VPX_REF_BUILD_PREFIX_h
#if defined(__cplusplus)
extern "C" {
#include <stdio.h>
#include "tools_common.h"
#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif
#if defined(__cplusplus)
FILE* set_binary_mode(FILE *stream)
{
(void)stream;
#ifdef _WIN32
_setmode(_fileno(stream), _O_BINARY);
#endif
return stream;
}
#endif
#endif /* include guards */

View File

@@ -7,7 +7,10 @@
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef TOOLS_COMMON_H
#define TOOLS_COMMON_H
/* Sets a stdio stream into binary mode */
FILE* set_binary_mode(FILE *stream);
#define ALLOC_FAILURE -2
#endif

View File

@@ -25,7 +25,7 @@
codec may write into to store details about a single instance of that codec.
Most of the context is implementation specific, and thus opaque to the
application. The context structure as seen by the application is of fixed
size, and thus can be allocated eith with automatic storage or dynamically
size, and thus can be allocated with automatic storage or dynamically
on the heap.
Most operations require an initialized codec context. Codec context
@@ -74,7 +74,7 @@
the ABI is versioned. The ABI version number must be passed at
initialization time to ensure the application is using a header file that
matches the library. The current ABI version number is stored in the
prepropcessor macros #VPX_CODEC_ABI_VERSION, #VPX_ENCODER_ABI_VERSION, and
preprocessor macros #VPX_CODEC_ABI_VERSION, #VPX_ENCODER_ABI_VERSION, and
#VPX_DECODER_ABI_VERSION. For convenience, each initialization function has
a wrapper macro that inserts the correct version number. These macros are
named like the initialization methods, but without the _ver suffix.
@@ -125,7 +125,7 @@
The special value <code>0</code> is reserved to represent an infinite
deadline. In this case, the codec will perform as much processing as
possible to yeild the highest quality frame.
possible to yield the highest quality frame.
By convention, the value <code>1</code> is used to mean "return as fast as
possible."
@@ -135,7 +135,7 @@
/*! \page usage_xma External Memory Allocation
Applications that wish to have fine grained control over how and where
decoders allocate memory \ref MAY make use of the e_xternal Memory Allocation
decoders allocate memory \ref MAY make use of the eXternal Memory Allocation
(XMA) interface. Not all codecs support the XMA \ref usage_features.
To use a decoder in XMA mode, the decoder \ref MUST be initialized with the
@@ -143,7 +143,7 @@
allocate is heavily dependent on the size of the encoded video frames. The
size of the video must be known before requesting the decoder's memory map.
This stream information can be obtained with the vpx_codec_peek_stream_info()
function, which does not require a contructed decoder context. If the exact
function, which does not require a constructed decoder context. If the exact
stream is not known, a stream info structure can be created that reflects
the maximum size that the decoder instance is required to support.
@@ -175,7 +175,7 @@
\section usage_xma_seg_szalign Segment Size and Alignment
The sz (size) and align (alignment) parameters describe the required size
and alignment of the requested segment. Alignment will always be a power of
two. Applications \ref MUST honor the aligment requested. Failure to do so
two. Applications \ref MUST honor the alignment requested. Failure to do so
could result in program crashes or may incur a speed penalty.
\section usage_xma_seg_flags Segment Flags

View File

@@ -16,18 +16,20 @@
#include "findnearmv.h"
#include "entropymode.h"
#include "systemdependent.h"
#include "vpxerrors.h"
extern void vp8_init_scan_order_mask();
void vp8_update_mode_info_border(MODE_INFO *mi, int rows, int cols)
static void update_mode_info_border(MODE_INFO *mi, int rows, int cols)
{
int i;
vpx_memset(mi - cols - 2, 0, sizeof(MODE_INFO) * (cols + 1));
for (i = 0; i < rows; i++)
{
/* TODO(holmer): Bug? This updates the last element of each row
* rather than the border element!
*/
vpx_memset(&mi[i*cols-1], 0, sizeof(MODE_INFO));
}
}
@@ -44,9 +46,11 @@ void vp8_de_alloc_frame_buffers(VP8_COMMON *oci)
vpx_free(oci->above_context);
vpx_free(oci->mip);
vpx_free(oci->prev_mip);
oci->above_context = 0;
oci->mip = 0;
oci->prev_mip = 0;
}
@@ -66,12 +70,12 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
for (i = 0; i < NUM_YV12_BUFFERS; i++)
{
oci->fb_idx_ref_cnt[0] = 0;
oci->fb_idx_ref_cnt[i] = 0;
oci->yv12_fb[i].flags = 0;
if (vp8_yv12_alloc_frame_buffer(&oci->yv12_fb[i], width, height, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return ALLOC_FAILURE;
return 1;
}
}
@@ -88,13 +92,13 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
if (vp8_yv12_alloc_frame_buffer(&oci->temp_scale_frame, width, 16, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return ALLOC_FAILURE;
return 1;
}
if (vp8_yv12_alloc_frame_buffer(&oci->post_proc_buffer, width, height, VP8BORDERINPIXELS) < 0)
{
vp8_de_alloc_frame_buffers(oci);
return ALLOC_FAILURE;
return 1;
}
oci->mb_rows = height >> 4;
@@ -106,21 +110,39 @@ int vp8_alloc_frame_buffers(VP8_COMMON *oci, int width, int height)
if (!oci->mip)
{
vp8_de_alloc_frame_buffers(oci);
return ALLOC_FAILURE;
return 1;
}
oci->mi = oci->mip + oci->mode_info_stride + 1;
/* allocate memory for last frame MODE_INFO array */
#if CONFIG_ERROR_CONCEALMENT
oci->prev_mip = vpx_calloc((oci->mb_cols + 1) * (oci->mb_rows + 1), sizeof(MODE_INFO));
if (!oci->prev_mip)
{
vp8_de_alloc_frame_buffers(oci);
return 1;
}
oci->prev_mi = oci->prev_mip + oci->mode_info_stride + 1;
#else
oci->prev_mip = NULL;
oci->prev_mi = NULL;
#endif
oci->above_context = vpx_calloc(sizeof(ENTROPY_CONTEXT_PLANES) * oci->mb_cols, 1);
if (!oci->above_context)
{
vp8_de_alloc_frame_buffers(oci);
return ALLOC_FAILURE;
return 1;
}
vp8_update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);
update_mode_info_border(oci->mi, oci->mb_rows, oci->mb_cols);
#if CONFIG_ERROR_CONCEALMENT
update_mode_info_border(oci->prev_mi, oci->mb_rows, oci->mb_cols);
#endif
return 0;
}
@@ -130,32 +152,32 @@ void vp8_setup_version(VP8_COMMON *cm)
{
case 0:
cm->no_lpf = 0;
cm->simpler_lpf = 0;
cm->filter_type = NORMAL_LOOPFILTER;
cm->use_bilinear_mc_filter = 0;
cm->full_pixel = 0;
break;
case 1:
cm->no_lpf = 0;
cm->simpler_lpf = 1;
cm->filter_type = SIMPLE_LOOPFILTER;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 0;
break;
case 2:
cm->no_lpf = 1;
cm->simpler_lpf = 0;
cm->filter_type = NORMAL_LOOPFILTER;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 0;
break;
case 3:
cm->no_lpf = 1;
cm->simpler_lpf = 1;
cm->filter_type = SIMPLE_LOOPFILTER;
cm->use_bilinear_mc_filter = 1;
cm->full_pixel = 1;
break;
default:
/*4,5,6,7 are reserved for future use*/
cm->no_lpf = 0;
cm->simpler_lpf = 0;
cm->filter_type = NORMAL_LOOPFILTER;
cm->use_bilinear_mc_filter = 0;
cm->full_pixel = 0;
break;
@@ -170,7 +192,7 @@ void vp8_create_common(VP8_COMMON *oci)
oci->mb_no_coeff_skip = 1;
oci->no_lpf = 0;
oci->simpler_lpf = 0;
oci->filter_type = NORMAL_LOOPFILTER;
oci->use_bilinear_mc_filter = 0;
oci->full_pixel = 0;
oci->multi_token_partition = ONE_PARTITION;

View File

@@ -11,35 +11,30 @@
#include "vpx_ports/config.h"
#include "vpx_ports/arm.h"
#include "g_common.h"
#include "pragmas.h"
#include "subpixel.h"
#include "loopfilter.h"
#include "recon.h"
#include "idct.h"
#include "onyxc_int.h"
extern void (*vp8_build_intra_predictors_mby_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_neon(MACROBLOCKD *x);
extern void (*vp8_build_intra_predictors_mby_s_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_s_neon(MACROBLOCKD *x);
#include "vp8/common/g_common.h"
#include "vp8/common/pragmas.h"
#include "vp8/common/subpixel.h"
#include "vp8/common/loopfilter.h"
#include "vp8/common/recon.h"
#include "vp8/common/idct.h"
#include "vp8/common/onyxc_int.h"
void vp8_arch_arm_common_init(VP8_COMMON *ctx)
{
#if CONFIG_RUNTIME_CPU_DETECT
VP8_COMMON_RTCD *rtcd = &ctx->rtcd;
int flags = arm_cpu_caps();
int has_edsp = flags & HAS_EDSP;
int has_media = flags & HAS_MEDIA;
int has_neon = flags & HAS_NEON;
rtcd->flags = flags;
/* Override default functions with fastest ones for this CPU. */
#if HAVE_ARMV5TE
if (flags & HAS_EDSP)
{
}
#endif
#if HAVE_ARMV6
if (has_media)
if (flags & HAS_MEDIA)
{
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_armv6;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_armv6;
@@ -59,9 +54,11 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
rtcd->loopfilter.normal_b_v = vp8_loop_filter_bv_armv6;
rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_armv6;
rtcd->loopfilter.normal_b_h = vp8_loop_filter_bh_armv6;
rtcd->loopfilter.simple_mb_v = vp8_loop_filter_mbvs_armv6;
rtcd->loopfilter.simple_mb_v =
vp8_loop_filter_simple_vertical_edge_armv6;
rtcd->loopfilter.simple_b_v = vp8_loop_filter_bvs_armv6;
rtcd->loopfilter.simple_mb_h = vp8_loop_filter_mbhs_armv6;
rtcd->loopfilter.simple_mb_h =
vp8_loop_filter_simple_horizontal_edge_armv6;
rtcd->loopfilter.simple_b_h = vp8_loop_filter_bhs_armv6;
rtcd->recon.copy16x16 = vp8_copy_mem16x16_v6;
@@ -74,7 +71,7 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
#endif
#if HAVE_ARMV7
if (has_neon)
if (flags & HAS_NEON)
{
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_neon;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_neon;
@@ -106,31 +103,12 @@ void vp8_arch_arm_common_init(VP8_COMMON *ctx)
rtcd->recon.recon2 = vp8_recon2b_neon;
rtcd->recon.recon4 = vp8_recon4b_neon;
rtcd->recon.recon_mb = vp8_recon_mb_neon;
}
#endif
#endif
#if HAVE_ARMV6
#if CONFIG_RUNTIME_CPU_DETECT
if (has_media)
#endif
{
vp8_build_intra_predictors_mby_ptr = vp8_build_intra_predictors_mby;
vp8_build_intra_predictors_mby_s_ptr = vp8_build_intra_predictors_mby_s;
}
#endif
#if HAVE_ARMV7
#if CONFIG_RUNTIME_CPU_DETECT
if (has_neon)
#endif
{
vp8_build_intra_predictors_mby_ptr =
rtcd->recon.build_intra_predictors_mby =
vp8_build_intra_predictors_mby_neon;
vp8_build_intra_predictors_mby_s_ptr =
rtcd->recon.build_intra_predictors_mby_s =
vp8_build_intra_predictors_mby_s_neon;
}
#endif
#endif
}

View File

@@ -16,10 +16,10 @@
;-------------------------------------
; r0 unsigned char *src_ptr,
; r1 unsigned short *output_ptr,
; r2 unsigned int src_pixels_per_line,
; r3 unsigned int output_height,
; stack unsigned int output_width,
; r1 unsigned short *dst_ptr,
; r2 unsigned int src_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *vp8_filter
;-------------------------------------
; The output is transposed stroed in output array to make it easy for second pass filtering.
@@ -27,21 +27,21 @@
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r4, [sp, #36] ; output width
ldr r4, [sp, #36] ; width
mov r12, r3 ; outer-loop counter
sub r2, r2, r4 ; src increment for height loop
;;IF ARCHITECTURE=6
pld [r0]
;;ENDIF
add r7, r2, r4 ; preload next row
pld [r0, r7]
sub r2, r2, r4 ; src increment for height loop
ldr r5, [r11] ; load up filter coefficients
mov r3, r3, lsl #1 ; output_height*2
mov r3, r3, lsl #1 ; height*2
add r3, r3, #2 ; plus 2 to make output buffer 4-bit aligned since height is actually (height+1)
mov r11, r1 ; save output_ptr for each row
mov r11, r1 ; save dst_ptr for each row
cmp r5, #128 ; if filter coef = 128, then skip the filter
beq bil_null_1st_filter
@@ -96,9 +96,8 @@
add r0, r0, r2 ; move to next input row
subs r12, r12, #1
;;IF ARCHITECTURE=6
pld [r0]
;;ENDIF
add r9, r2, r4, lsl #1 ; adding back block width
pld [r0, r9] ; preload next row
add r11, r11, #2 ; move over to next column
mov r1, r11
@@ -140,17 +139,17 @@
;---------------------------------
; r0 unsigned short *src_ptr,
; r1 unsigned char *output_ptr,
; r2 int output_pitch,
; r3 unsigned int output_height,
; stack unsigned int output_width,
; r1 unsigned char *dst_ptr,
; r2 int dst_pitch,
; r3 unsigned int height,
; stack unsigned int width,
; stack const short *vp8_filter
;---------------------------------
|vp8_filter_block2d_bil_second_pass_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r4, [sp, #36] ; output width
ldr r4, [sp, #36] ; width
ldr r5, [r11] ; load up filter coefficients
mov r12, r4 ; outer-loop counter = width, since we work on transposed data matrix

View File

@@ -22,9 +22,7 @@
;push {r4-r7}
;preload
pld [r0]
pld [r0, r1]
pld [r0, r1, lsl #1]
pld [r0, #31] ; preload for next 16x16 block
ands r4, r0, #15
beq copy_mem16x16_fast
@@ -90,6 +88,8 @@ copy_mem16x16_1_loop
ldrneb r6, [r0, #2]
ldrneb r7, [r0, #3]
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_1_loop
ldmia sp!, {r4 - r7}
@@ -121,6 +121,8 @@ copy_mem16x16_4_loop
ldrne r6, [r0, #8]
ldrne r7, [r0, #12]
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_4_loop
ldmia sp!, {r4 - r7}
@@ -148,6 +150,7 @@ copy_mem16x16_8_loop
add r2, r2, r3
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_8_loop
ldmia sp!, {r4 - r7}
@@ -171,6 +174,7 @@ copy_mem16x16_fast_loop
;stm r2, {r4-r7}
add r2, r2, r3
pld [r0, #31] ; preload for next 16x16 block
bne copy_mem16x16_fast_loop
ldmia sp!, {r4 - r7}

View File

@@ -10,6 +10,8 @@
EXPORT |vp8_filter_block2d_first_pass_armv6|
EXPORT |vp8_filter_block2d_first_pass_16x16_armv6|
EXPORT |vp8_filter_block2d_first_pass_8x8_armv6|
EXPORT |vp8_filter_block2d_second_pass_armv6|
EXPORT |vp8_filter4_block2d_second_pass_armv6|
EXPORT |vp8_filter_block2d_first_pass_only_armv6|
@@ -40,11 +42,6 @@
add r12, r3, #16 ; square off the output
sub sp, sp, #4
;;IF ARCHITECTURE=6
;pld [r0, #-2]
;;pld [r0, #30]
;;ENDIF
ldr r4, [r11] ; load up packed filter coefficients
ldr r5, [r11, #4]
ldr r6, [r11, #8]
@@ -101,15 +98,10 @@
bne width_loop_1st_6
;;add r9, r2, #30 ; attempt to load 2 adjacent cache lines
;;IF ARCHITECTURE=6
;pld [r0, r2]
;;pld [r0, r9]
;;ENDIF
ldr r1, [sp] ; load and update dst address
subs r7, r7, #0x10000
add r0, r0, r2 ; move to next input line
add r1, r1, #2 ; move over to next column
str r1, [sp]
@@ -120,6 +112,192 @@
ENDP
; --------------------------
; 16x16 version
; -----------------------------
|vp8_filter_block2d_first_pass_16x16_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r7, [sp, #36] ; output height
add r4, r2, #18 ; preload next low
pld [r0, r4]
sub r2, r2, r3 ; inside loop increments input array,
; so the height loop only needs to add
; r2 - width to the input pointer
mov r3, r3, lsl #1 ; multiply width by 2 because using shorts
add r12, r3, #16 ; square off the output
sub sp, sp, #4
ldr r4, [r11] ; load up packed filter coefficients
ldr r5, [r11, #4]
ldr r6, [r11, #8]
str r1, [sp] ; push destination to stack
mov r7, r7, lsl #16 ; height is top part of counter
; six tap filter
|height_loop_1st_16_6|
ldrb r8, [r0, #-2] ; load source data
ldrb r9, [r0, #-1]
ldrb r10, [r0], #2
orr r7, r7, r3, lsr #2 ; construct loop counter
|width_loop_1st_16_6|
ldrb r11, [r0, #-1]
pkhbt lr, r8, r9, lsl #16 ; r9 | r8
pkhbt r8, r9, r10, lsl #16 ; r10 | r9
ldrb r9, [r0]
smuad lr, lr, r4 ; apply the filter
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smuad r8, r8, r4
pkhbt r11, r11, r9, lsl #16 ; r9 | r11
smlad lr, r10, r5, lr
ldrb r10, [r0, #1]
smlad r8, r11, r5, r8
ldrb r11, [r0, #2]
sub r7, r7, #1
pkhbt r9, r9, r10, lsl #16 ; r10 | r9
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smlad lr, r9, r6, lr
smlad r11, r10, r6, r8
ands r10, r7, #0xff ; test loop counter
add lr, lr, #0x40 ; round_shift_and_clamp
ldrneb r8, [r0, #-2] ; load data for next loop
usat lr, #8, lr, asr #7
add r11, r11, #0x40
ldrneb r9, [r0, #-1]
usat r11, #8, r11, asr #7
strh lr, [r1], r12 ; result is transposed and stored, which
; will make second pass filtering easier.
ldrneb r10, [r0], #2
strh r11, [r1], r12
bne width_loop_1st_16_6
ldr r1, [sp] ; load and update dst address
subs r7, r7, #0x10000
add r0, r0, r2 ; move to next input line
add r11, r2, #34 ; adding back block width(=16)
pld [r0, r11] ; preload next low
add r1, r1, #2 ; move over to next column
str r1, [sp]
bne height_loop_1st_16_6
add sp, sp, #4
ldmia sp!, {r4 - r11, pc}
ENDP
; --------------------------
; 8x8 version
; -----------------------------
|vp8_filter_block2d_first_pass_8x8_armv6| PROC
stmdb sp!, {r4 - r11, lr}
ldr r11, [sp, #40] ; vp8_filter address
ldr r7, [sp, #36] ; output height
add r4, r2, #10 ; preload next low
pld [r0, r4]
sub r2, r2, r3 ; inside loop increments input array,
; so the height loop only needs to add
; r2 - width to the input pointer
mov r3, r3, lsl #1 ; multiply width by 2 because using shorts
add r12, r3, #16 ; square off the output
sub sp, sp, #4
ldr r4, [r11] ; load up packed filter coefficients
ldr r5, [r11, #4]
ldr r6, [r11, #8]
str r1, [sp] ; push destination to stack
mov r7, r7, lsl #16 ; height is top part of counter
; six tap filter
|height_loop_1st_8_6|
ldrb r8, [r0, #-2] ; load source data
ldrb r9, [r0, #-1]
ldrb r10, [r0], #2
orr r7, r7, r3, lsr #2 ; construct loop counter
|width_loop_1st_8_6|
ldrb r11, [r0, #-1]
pkhbt lr, r8, r9, lsl #16 ; r9 | r8
pkhbt r8, r9, r10, lsl #16 ; r10 | r9
ldrb r9, [r0]
smuad lr, lr, r4 ; apply the filter
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smuad r8, r8, r4
pkhbt r11, r11, r9, lsl #16 ; r9 | r11
smlad lr, r10, r5, lr
ldrb r10, [r0, #1]
smlad r8, r11, r5, r8
ldrb r11, [r0, #2]
sub r7, r7, #1
pkhbt r9, r9, r10, lsl #16 ; r10 | r9
pkhbt r10, r10, r11, lsl #16 ; r11 | r10
smlad lr, r9, r6, lr
smlad r11, r10, r6, r8
ands r10, r7, #0xff ; test loop counter
add lr, lr, #0x40 ; round_shift_and_clamp
ldrneb r8, [r0, #-2] ; load data for next loop
usat lr, #8, lr, asr #7
add r11, r11, #0x40
ldrneb r9, [r0, #-1]
usat r11, #8, r11, asr #7
strh lr, [r1], r12 ; result is transposed and stored, which
; will make second pass filtering easier.
ldrneb r10, [r0], #2
strh r11, [r1], r12
bne width_loop_1st_8_6
ldr r1, [sp] ; load and update dst address
subs r7, r7, #0x10000
add r0, r0, r2 ; move to next input line
add r11, r2, #18 ; adding back block width(=8)
pld [r0, r11] ; preload next low
add r1, r1, #2 ; move over to next column
str r1, [sp]
bne height_loop_1st_8_6
add sp, sp, #4
ldmia sp!, {r4 - r11, pc}
ENDP
;---------------------------------
; r0 short *src_ptr,
; r1 unsigned char *output_ptr,
@@ -262,6 +440,10 @@
|vp8_filter_block2d_first_pass_only_armv6| PROC
stmdb sp!, {r4 - r11, lr}
add r7, r2, r3 ; preload next low
add r7, r7, #2
pld [r0, r7]
ldr r4, [sp, #36] ; output pitch
ldr r11, [sp, #40] ; HFilter address
sub sp, sp, #8
@@ -330,16 +512,15 @@
bne width_loop_1st_only_6
;;add r9, r2, #30 ; attempt to load 2 adjacent cache lines
;;IF ARCHITECTURE=6
;pld [r0, r2]
;;pld [r0, r9]
;;ENDIF
ldr lr, [sp] ; load back output pitch
ldr r12, [sp, #4] ; load back output pitch
subs r7, r7, #1
add r0, r0, r12 ; updata src for next loop
add r11, r12, r3 ; preload next low
add r11, r11, #2
pld [r0, r11]
add r1, r1, lr ; update dst for next loop
bne height_loop_1st_only_6

View File

@@ -53,14 +53,11 @@ count RN r5
;r0 unsigned char *src_ptr,
;r1 int src_pixel_step,
;r2 const char *flimit,
;r2 const char *blimit,
;r3 const char *limit,
;stack const char *thresh,
;stack int count
;Note: All 16 elements in flimit are equal. So, in the code, only one load is needed
;for flimit. Same way applies to limit and thresh.
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
|vp8_loop_filter_horizontal_edge_armv6| PROC
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
@@ -72,14 +69,18 @@ count RN r5
sub sp, sp, #16 ; create temp buffer
ldr r9, [src], pstep ; p3
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
ldr r10, [src], pstep ; p2
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
ldr r11, [src], pstep ; p1
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r6], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r6] ; thresh
orr r2, r2, r2, lsl #8
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|Hnext8|
; vp8_filter_mask() function
@@ -253,12 +254,6 @@ count RN r5
subs count, count, #1
;pld [src]
;pld [src, pstep]
;pld [src, pstep, lsl #1]
;pld [src, pstep, lsl #2]
;pld [src, pstep, lsl #3]
ldrne r9, [src], pstep ; p3
ldrne r10, [src], pstep ; p2
ldrne r11, [src], pstep ; p1
@@ -281,14 +276,18 @@ count RN r5
sub sp, sp, #16 ; create temp buffer
ldr r9, [src], pstep ; p3
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
ldr r10, [src], pstep ; p2
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
ldr r11, [src], pstep ; p1
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r6], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r6] ; thresh
orr r2, r2, r2, lsl #8
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|MBHnext8|
@@ -590,15 +589,19 @@ count RN r5
sub sp, sp, #16 ; create temp buffer
ldr r6, [src], pstep ; load source data
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
ldr r7, [src], pstep
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
ldr r8, [src], pstep
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r12], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r12] ; thresh
orr r2, r2, r2, lsl #8
ldr lr, [src], pstep
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|Vnext8|
@@ -857,18 +860,26 @@ count RN r5
sub src, src, #4 ; move src pointer down by 4
ldr count, [sp, #40] ; count for 8-in-parallel
ldr r12, [sp, #36] ; load thresh address
pld [src, #23] ; preload for next block
sub sp, sp, #16 ; create temp buffer
ldr r6, [src], pstep ; load source data
ldr r4, [r2], #4 ; flimit
ldrb r4, [r2] ; blimit
pld [src, #23]
ldr r7, [src], pstep
ldr r2, [r3], #4 ; limit
ldrb r2, [r3] ; limit
pld [src, #23]
ldr r8, [src], pstep
uadd8 r4, r4, r4 ; flimit * 2
ldr r3, [r12], #4 ; thresh
orr r4, r4, r4, lsl #8
ldrb r3, [r12] ; thresh
orr r2, r2, r2, lsl #8
pld [src, #23]
ldr lr, [src], pstep
mov count, count, lsl #1 ; 4-in-parallel
uadd8 r4, r4, r2 ; flimit * 2 + limit
orr r4, r4, r4, lsl #16
orr r3, r3, r3, lsl #8
orr r2, r2, r2, lsl #16
orr r3, r3, r3, lsl #16
|MBVnext8|
; vp8_filter_mask() function
@@ -908,6 +919,7 @@ count RN r5
str lr, [sp, #8]
ldr lr, [src], pstep
TRANSPOSE_MATRIX r6, r7, r8, lr, r9, r10, r11, r12
ldr lr, [sp, #8] ; load back (f)limit accumulator
@@ -956,6 +968,7 @@ count RN r5
beq mbvskip_filter ; skip filtering
;vp8_hevmask() function
;calculate high edge variance
@@ -1123,6 +1136,7 @@ count RN r5
smlabb r8, r6, lr, r7
smlatb r6, r6, lr, r7
smlabb r9, r10, lr, r7
smlatb r10, r10, lr, r7
ssat r8, #8, r8, asr #7
ssat r6, #8, r6, asr #7
@@ -1242,9 +1256,13 @@ count RN r5
sub src, src, #4
subs count, count, #1
pld [src, #23] ; preload for next block
ldrne r6, [src], pstep ; load source data
pld [src, #23]
ldrne r7, [src], pstep
pld [src, #23]
ldrne r8, [src], pstep
pld [src, #23]
ldrne lr, [src], pstep
bne MBVnext8

View File

@@ -45,35 +45,28 @@
MEND
src RN r0
pstep RN r1
;r0 unsigned char *src_ptr,
;r1 int src_pixel_step,
;r2 const char *flimit,
;r3 const char *limit,
;stack const char *thresh,
;stack int count
; All 16 elements in flimit are equal. So, in the code, only one load is needed
; for flimit. Same applies to limit. thresh is not used in simple looopfilter
;r2 const char *blimit
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
|vp8_loop_filter_simple_horizontal_edge_armv6| PROC
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
stmdb sp!, {r4 - r11, lr}
ldr r12, [r3] ; limit
ldrb r12, [r2] ; blimit
ldr r3, [src, -pstep, lsl #1] ; p1
ldr r4, [src, -pstep] ; p0
ldr r5, [src] ; q0
ldr r6, [src, pstep] ; q1
ldr r7, [r2] ; flimit
orr r12, r12, r12, lsl #8 ; blimit
ldr r2, c0x80808080
ldr r9, [sp, #40] ; count for 8-in-parallel
uadd8 r7, r7, r7 ; flimit * 2
mov r9, r9, lsl #1 ; double the count. we're doing 4 at a time
uadd8 r12, r7, r12 ; flimit * 2 + limit
orr r12, r12, r12, lsl #16 ; blimit
mov r9, #4 ; double the count. we're doing 4 at a time
mov lr, #0 ; need 0 in a couple places
|simple_hnext8|
@@ -148,30 +141,32 @@ pstep RN r1
;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
stmdb sp!, {r4 - r11, lr}
ldr r12, [r2] ; r12: flimit
ldrb r12, [r2] ; r12: blimit
ldr r2, c0x80808080
ldr r7, [r3] ; limit
orr r12, r12, r12, lsl #8
; load soure data to r7, r8, r9, r10
ldrh r3, [src, #-2]
pld [src, #23] ; preload for next block
ldrh r4, [src], pstep
uadd8 r12, r12, r12 ; flimit * 2
orr r12, r12, r12, lsl #16
ldrh r5, [src, #-2]
pld [src, #23]
ldrh r6, [src], pstep
uadd8 r12, r12, r7 ; flimit * 2 + limit
pkhbt r7, r3, r4, lsl #16
ldrh r3, [src, #-2]
pld [src, #23]
ldrh r4, [src], pstep
ldr r11, [sp, #40] ; count (r11) for 8-in-parallel
pkhbt r8, r5, r6, lsl #16
ldrh r5, [src, #-2]
pld [src, #23]
ldrh r6, [src], pstep
mov r11, r11, lsl #1 ; 4-in-parallel
mov r11, #4 ; double the count. we're doing 4 at a time
|simple_vnext8|
; vp8_simple_filter_mask() function
@@ -259,19 +254,23 @@ pstep RN r1
; load soure data to r7, r8, r9, r10
ldrneh r3, [src, #-2]
pld [src, #23] ; preload for next block
ldrneh r4, [src], pstep
ldrneh r5, [src, #-2]
pld [src, #23]
ldrneh r6, [src], pstep
pkhbt r7, r3, r4, lsl #16
ldrneh r3, [src, #-2]
pld [src, #23]
ldrneh r4, [src], pstep
pkhbt r8, r5, r6, lsl #16
ldrneh r5, [src, #-2]
pld [src, #23]
ldrneh r6, [src], pstep
bne simple_vnext8

View File

@@ -32,9 +32,12 @@
beq skip_firstpass_filter
;first-pass filter
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
sub r0, r0, r1, lsl #1
add r3, r1, #10 ; preload next low
pld [r0, r3]
add r2, r12, r2, lsl #4 ;calculate filter location
add r0, r0, #3 ;adjust src only for loading convinience
@@ -110,6 +113,9 @@
add r0, r0, r1 ; move to next input line
add r11, r1, #18 ; preload next low. adding back block width(=8), which is subtracted earlier
pld [r0, r11]
bne first_pass_hloop_v6
;second pass filter
@@ -121,7 +127,7 @@ secondpass_filter
cmp r3, #0
beq skip_secondpass_filter
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
add lr, r12, r3, lsl #4 ;calculate filter location
mov r2, #0x00080000
@@ -243,12 +249,8 @@ skip_secondpass_hloop
ENDP
;-----------------
AREA subpelfilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_
DCD filter8_coeff
filter8_coeff
DCD 0x00000000, 0x00000080, 0x00000000, 0x00000000
DCD 0xfffa0000, 0x000c007b, 0x0000ffff, 0x00000000

View File

@@ -10,112 +10,15 @@
#include <math.h>
#include "subpixel.h"
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
static const short bilinear_filters[8][2] =
{
{ 128, 0 },
{ 112, 16 },
{ 96, 32 },
{ 80, 48 },
{ 64, 64 },
{ 48, 80 },
{ 32, 96 },
{ 16, 112 }
};
extern void vp8_filter_block2d_bil_first_pass_armv6
(
unsigned char *src_ptr,
unsigned short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
);
extern void vp8_filter_block2d_bil_second_pass_armv6
(
unsigned short *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
);
#if 0
void vp8_filter_block2d_bil_first_pass_6
(
unsigned char *src_ptr,
unsigned short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i, j;
for ( i=0; i<output_height; i++ )
{
for ( j=0; j<output_width; j++ )
{
/* Apply bilinear filter */
output_ptr[j] = ( ( (int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[1] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT/2) ) >> VP8_FILTER_SHIFT;
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_width;
}
}
void vp8_filter_block2d_bil_second_pass_6
(
unsigned short *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int output_height,
unsigned int output_width,
const short *vp8_filter
)
{
unsigned int i,j;
int Temp;
for ( i=0; i<output_height; i++ )
{
for ( j=0; j<output_width; j++ )
{
/* Apply filter */
Temp = ((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[output_width] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT/2);
output_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
src_ptr++;
}
/* Next row... */
/*src_ptr += src_pixels_per_line - output_width;*/
output_ptr += output_pitch;
}
}
#endif
#include "vp8/common/filter.h"
#include "vp8/common/subpixel.h"
#include "bilinearfilter_arm.h"
void vp8_filter_block2d_bil_armv6
(
unsigned char *src_ptr,
unsigned char *output_ptr,
unsigned int src_pixels_per_line,
unsigned char *dst_ptr,
unsigned int src_pitch,
unsigned int dst_pitch,
const short *HFilter,
const short *VFilter,
@@ -123,15 +26,13 @@ void vp8_filter_block2d_bil_armv6
int Height
)
{
unsigned short FData[36*16]; /* Temp data bufffer used in filtering */
unsigned short FData[36*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
/* pixel_step = 1; */
vp8_filter_block2d_bil_first_pass_armv6(src_ptr, FData, src_pixels_per_line, Height + 1, Width, HFilter);
vp8_filter_block2d_bil_first_pass_armv6(src_ptr, FData, src_pitch, Height + 1, Width, HFilter);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass_armv6(FData, output_ptr, dst_pitch, Height, Width, VFilter);
vp8_filter_block2d_bil_second_pass_armv6(FData, dst_ptr, dst_pitch, Height, Width, VFilter);
}
@@ -148,8 +49,8 @@ void vp8_bilinear_predict4x4_armv6
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
}
@@ -167,8 +68,8 @@ void vp8_bilinear_predict8x8_armv6
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
}
@@ -186,8 +87,8 @@ void vp8_bilinear_predict8x4_armv6
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
}
@@ -205,8 +106,8 @@ void vp8_bilinear_predict16x16_armv6
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil_armv6(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
}

View File

@@ -0,0 +1,35 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef BILINEARFILTER_ARM_H
#define BILINEARFILTER_ARM_H
extern void vp8_filter_block2d_bil_first_pass_armv6
(
const unsigned char *src_ptr,
unsigned short *dst_ptr,
unsigned int src_pitch,
unsigned int height,
unsigned int width,
const short *vp8_filter
);
extern void vp8_filter_block2d_bil_second_pass_armv6
(
const unsigned short *src_ptr,
unsigned char *dst_ptr,
int dst_pitch,
unsigned int height,
unsigned int width,
const short *vp8_filter
);
#endif /* BILINEARFILTER_ARM_H */

View File

@@ -11,26 +11,10 @@
#include "vpx_ports/config.h"
#include <math.h>
#include "subpixel.h"
#include "vp8/common/filter.h"
#include "vp8/common/subpixel.h"
#include "vpx_ports/mem.h"
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
DECLARE_ALIGNED(16, static const short, sub_pel_filters[8][6]) =
{
{ 0, 0, 128, 0, 0, 0 }, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
{ 0, -6, 123, 12, -1, 0 },
{ 2, -11, 108, 36, -8, 1 }, /* New 1/4 pel 6 tap filter */
{ 0, -9, 93, 50, -6, 0 },
{ 3, -16, 77, 77, -16, 3 }, /* New 1/2 pel 6 tap filter */
{ 0, -6, 50, 93, -9, 0 },
{ 1, -8, 36, 108, -11, 2 }, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0 },
};
extern void vp8_filter_block2d_first_pass_armv6
(
unsigned char *src_ptr,
@@ -41,6 +25,28 @@ extern void vp8_filter_block2d_first_pass_armv6
const short *vp8_filter
);
// 8x8
extern void vp8_filter_block2d_first_pass_8x8_armv6
(
unsigned char *src_ptr,
short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_width,
unsigned int output_height,
const short *vp8_filter
);
// 16x16
extern void vp8_filter_block2d_first_pass_16x16_armv6
(
unsigned char *src_ptr,
short *output_ptr,
unsigned int src_pixels_per_line,
unsigned int output_width,
unsigned int output_height,
const short *vp8_filter
);
extern void vp8_filter_block2d_second_pass_armv6
(
short *src_ptr,
@@ -93,11 +99,11 @@ void vp8_sixtap_predict_armv6
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 12*4); /* Temp data bufffer used in filtering */
DECLARE_ALIGNED_ARRAY(4, short, FData, 12*4); /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* Vfilter is null. First pass only */
if (xoffset && !yoffset)
@@ -129,47 +135,6 @@ void vp8_sixtap_predict_armv6
}
}
#if 0
void vp8_sixtap_predict8x4_armv6
(
unsigned char *src_ptr,
int src_pixels_per_line,
int xoffset,
int yoffset,
unsigned char *dst_ptr,
int dst_pitch
)
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data bufffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
/*if (xoffset && !yoffset)
{
vp8_filter_block2d_first_pass_only_armv6 ( src_ptr, dst_ptr, src_pixels_per_line, 8, dst_pitch, HFilter );
}*/
/* Hfilter is null. Second pass only */
/*else if (!xoffset && yoffset)
{
vp8_filter_block2d_second_pass_only_armv6 ( src_ptr, dst_ptr, src_pixels_per_line, 8, dst_pitch, VFilter );
}
else
{
if (yoffset & 0x1)
vp8_filter_block2d_first_pass_armv6 ( src_ptr-src_pixels_per_line, FData+1, src_pixels_per_line, 8, 7, HFilter );
else*/
vp8_filter_block2d_first_pass_armv6 ( src_ptr-(2*src_pixels_per_line), FData, src_pixels_per_line, 8, 9, HFilter );
vp8_filter_block2d_second_pass_armv6 ( FData+2, dst_ptr, dst_pitch, 4, 8, VFilter );
/*}*/
}
#endif
void vp8_sixtap_predict8x8_armv6
(
unsigned char *src_ptr,
@@ -182,10 +147,10 @@ void vp8_sixtap_predict8x8_armv6
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data bufffer used in filtering */
DECLARE_ALIGNED_ARRAY(4, short, FData, 16*8); /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
if (xoffset && !yoffset)
{
@@ -200,12 +165,12 @@ void vp8_sixtap_predict8x8_armv6
{
if (yoffset & 0x1)
{
vp8_filter_block2d_first_pass_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 8, 11, HFilter);
vp8_filter_block2d_first_pass_8x8_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 8, 11, HFilter);
vp8_filter4_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 8, VFilter);
}
else
{
vp8_filter_block2d_first_pass_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 8, 13, HFilter);
vp8_filter_block2d_first_pass_8x8_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 8, 13, HFilter);
vp8_filter_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 8, VFilter);
}
}
@@ -224,10 +189,10 @@ void vp8_sixtap_predict16x16_armv6
{
const short *HFilter;
const short *VFilter;
DECLARE_ALIGNED_ARRAY(4, short, FData, 24*16); /* Temp data bufffer used in filtering */
DECLARE_ALIGNED_ARRAY(4, short, FData, 24*16); /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
if (xoffset && !yoffset)
{
@@ -242,12 +207,12 @@ void vp8_sixtap_predict16x16_armv6
{
if (yoffset & 0x1)
{
vp8_filter_block2d_first_pass_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 16, 19, HFilter);
vp8_filter_block2d_first_pass_16x16_armv6(src_ptr - src_pixels_per_line, FData + 1, src_pixels_per_line, 16, 19, HFilter);
vp8_filter4_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 16, VFilter);
}
else
{
vp8_filter_block2d_first_pass_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 16, 21, HFilter);
vp8_filter_block2d_first_pass_16x16_armv6(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 16, 21, HFilter);
vp8_filter_block2d_second_pass_armv6(FData + 2, dst_ptr, dst_pitch, 16, VFilter);
}
}

View File

@@ -9,135 +9,107 @@
*/
#include "vpx_ports/config.h"
#include <math.h>
#include "loopfilter.h"
#include "onyxc_int.h"
#include "vpx_config.h"
#include "vp8/common/loopfilter.h"
#include "vp8/common/onyxc_int.h"
#if HAVE_ARMV6
extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_vertical_edge_armv6);
extern prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_mbloop_filter_vertical_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_simple_horizontal_edge_armv6);
extern prototype_loopfilter(vp8_loop_filter_simple_vertical_edge_armv6);
#endif
extern prototype_loopfilter(vp8_loop_filter_horizontal_edge_y_neon);
extern prototype_loopfilter(vp8_loop_filter_vertical_edge_y_neon);
extern prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_y_neon);
extern prototype_loopfilter(vp8_mbloop_filter_vertical_edge_y_neon);
extern prototype_loopfilter(vp8_loop_filter_simple_horizontal_edge_neon);
extern prototype_loopfilter(vp8_loop_filter_simple_vertical_edge_neon);
#if HAVE_ARMV7
typedef void loopfilter_y_neon(unsigned char *src, int pitch,
unsigned char blimit, unsigned char limit, unsigned char thresh);
typedef void loopfilter_uv_neon(unsigned char *u, int pitch,
unsigned char blimit, unsigned char limit, unsigned char thresh,
unsigned char *v);
extern loop_filter_uvfunction vp8_loop_filter_horizontal_edge_uv_neon;
extern loop_filter_uvfunction vp8_loop_filter_vertical_edge_uv_neon;
extern loop_filter_uvfunction vp8_mbloop_filter_horizontal_edge_uv_neon;
extern loop_filter_uvfunction vp8_mbloop_filter_vertical_edge_uv_neon;
extern loopfilter_y_neon vp8_loop_filter_horizontal_edge_y_neon;
extern loopfilter_y_neon vp8_loop_filter_vertical_edge_y_neon;
extern loopfilter_y_neon vp8_mbloop_filter_horizontal_edge_y_neon;
extern loopfilter_y_neon vp8_mbloop_filter_vertical_edge_y_neon;
extern loopfilter_uv_neon vp8_loop_filter_horizontal_edge_uv_neon;
extern loopfilter_uv_neon vp8_loop_filter_vertical_edge_uv_neon;
extern loopfilter_uv_neon vp8_mbloop_filter_horizontal_edge_uv_neon;
extern loopfilter_uv_neon vp8_mbloop_filter_vertical_edge_uv_neon;
#endif
#if HAVE_ARMV6
/*ARMV6 loopfilter functions*/
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_horizontal_edge_armv6(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
vp8_mbloop_filter_horizontal_edge_armv6(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_horizontal_edge_armv6(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_vertical_edge_armv6(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
vp8_mbloop_filter_vertical_edge_armv6(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_vertical_edge_armv6(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_horizontal_edge_armv6(u_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_horizontal_edge_armv6(v_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bhs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bhs_armv6(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 4 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 8 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_armv6(y_ptr + 12 * y_stride, y_stride, blimit);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_armv6(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_vertical_edge_armv6(u_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_vertical_edge_armv6(v_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 4, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 8, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_armv6(y_ptr + 12, y_stride, blimit);
}
#endif
@@ -145,93 +117,60 @@ void vp8_loop_filter_bvs_armv6(unsigned char *y_ptr, unsigned char *u_ptr, unsig
/* NEON loopfilter functions */
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
unsigned char mblim = *lfi->mblim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_mbloop_filter_horizontal_edge_y_neon(y_ptr, y_stride, mblim, lim, hev_thr);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, v_ptr);
}
void vp8_loop_filter_mbhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_horizontal_edge_uv_neon(u_ptr, uv_stride, mblim, lim, hev_thr, v_ptr);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
unsigned char mblim = *lfi->mblim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_mbloop_filter_vertical_edge_y_neon(y_ptr, y_stride, mblim, lim, hev_thr);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, v_ptr);
}
void vp8_loop_filter_mbvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_neon(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_vertical_edge_uv_neon(u_ptr, uv_stride, mblim, lim, hev_thr, v_ptr);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
unsigned char blim = *lfi->blim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 4 * y_stride, y_stride, blim, lim, hev_thr);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 8 * y_stride, y_stride, blim, lim, hev_thr);
vp8_loop_filter_horizontal_edge_y_neon(y_ptr + 12 * y_stride, y_stride, blim, lim, hev_thr);
if (u_ptr)
vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, v_ptr + 4 * uv_stride);
}
void vp8_loop_filter_bhs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_neon(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_uv_neon(u_ptr + 4 * uv_stride, uv_stride, blim, lim, hev_thr, v_ptr + 4 * uv_stride);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
int y_stride, int uv_stride, loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
unsigned char blim = *lfi->blim;
unsigned char lim = *lfi->lim;
unsigned char hev_thr = *lfi->hev_thr;
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 4, y_stride, blim, lim, hev_thr);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 8, y_stride, blim, lim, hev_thr);
vp8_loop_filter_vertical_edge_y_neon(y_ptr + 12, y_stride, blim, lim, hev_thr);
if (u_ptr)
vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, v_ptr + 4);
}
void vp8_loop_filter_bvs_neon(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_neon(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_neon(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_neon(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_uv_neon(u_ptr + 4, uv_stride, blim, lim, hev_thr, v_ptr + 4);
}
#endif

View File

@@ -12,15 +12,17 @@
#ifndef LOOPFILTER_ARM_H
#define LOOPFILTER_ARM_H
#include "vpx_config.h"
#if HAVE_ARMV6
extern prototype_loopfilter_block(vp8_loop_filter_mbv_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bv_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_mbh_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bh_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_mbvs_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bvs_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_mbhs_armv6);
extern prototype_loopfilter_block(vp8_loop_filter_bhs_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_bvs_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_bhs_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_armv6);
extern prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_armv6);
#if !CONFIG_RUNTIME_CPU_DETECT
#undef vp8_lf_normal_mb_v
@@ -36,28 +38,29 @@ extern prototype_loopfilter_block(vp8_loop_filter_bhs_armv6);
#define vp8_lf_normal_b_h vp8_loop_filter_bh_armv6
#undef vp8_lf_simple_mb_v
#define vp8_lf_simple_mb_v vp8_loop_filter_mbvs_armv6
#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_armv6
#undef vp8_lf_simple_b_v
#define vp8_lf_simple_b_v vp8_loop_filter_bvs_armv6
#undef vp8_lf_simple_mb_h
#define vp8_lf_simple_mb_h vp8_loop_filter_mbhs_armv6
#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_armv6
#undef vp8_lf_simple_b_h
#define vp8_lf_simple_b_h vp8_loop_filter_bhs_armv6
#endif
#endif
#endif /* !CONFIG_RUNTIME_CPU_DETECT */
#endif /* HAVE_ARMV6 */
#if HAVE_ARMV7
extern prototype_loopfilter_block(vp8_loop_filter_mbv_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bv_neon);
extern prototype_loopfilter_block(vp8_loop_filter_mbh_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bh_neon);
extern prototype_loopfilter_block(vp8_loop_filter_mbvs_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bvs_neon);
extern prototype_loopfilter_block(vp8_loop_filter_mbhs_neon);
extern prototype_loopfilter_block(vp8_loop_filter_bhs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_mbvs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_bvs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_mbhs_neon);
extern prototype_simple_loopfilter(vp8_loop_filter_bhs_neon);
#if !CONFIG_RUNTIME_CPU_DETECT
#undef vp8_lf_normal_mb_v
@@ -83,7 +86,8 @@ extern prototype_loopfilter_block(vp8_loop_filter_bhs_neon);
#undef vp8_lf_simple_b_h
#define vp8_lf_simple_b_h vp8_loop_filter_bhs_neon
#endif
#endif
#endif /* !CONFIG_RUNTIME_CPU_DETECT */
#endif
#endif /* HAVE_ARMV7 */
#endif /* LOOPFILTER_ARM_H */

View File

@@ -25,7 +25,7 @@
|vp8_bilinear_predict16x16_neon| PROC
push {r4-r5, lr}
ldr r12, _bifilter16_coeff_
adr r12, bifilter16_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -350,12 +350,7 @@ filt_blk2d_spo16x16_loop_neon
ENDP
;-----------------
AREA bifilters16_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter16_coeff_
DCD bifilter16_coeff
bifilter16_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

View File

@@ -25,7 +25,7 @@
|vp8_bilinear_predict4x4_neon| PROC
push {r4, lr}
ldr r12, _bifilter4_coeff_
adr r12, bifilter4_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -123,12 +123,7 @@ skip_secondpass_filter
ENDP
;-----------------
AREA bilinearfilters4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter4_coeff_
DCD bifilter4_coeff
bifilter4_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

View File

@@ -25,7 +25,7 @@
|vp8_bilinear_predict8x4_neon| PROC
push {r4, lr}
ldr r12, _bifilter8x4_coeff_
adr r12, bifilter8x4_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -128,12 +128,7 @@ skip_secondpass_filter
ENDP
;-----------------
AREA bifilters8x4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter8x4_coeff_
DCD bifilter8x4_coeff
bifilter8x4_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

View File

@@ -25,7 +25,7 @@
|vp8_bilinear_predict8x8_neon| PROC
push {r4, lr}
ldr r12, _bifilter8_coeff_
adr r12, bifilter8_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -176,12 +176,7 @@ skip_secondpass_filter
ENDP
;-----------------
AREA bifilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_bifilter8_coeff_
DCD bifilter8_coeff
bifilter8_coeff
DCD 128, 0, 112, 16, 96, 32, 80, 48, 64, 64, 48, 80, 32, 96, 16, 112

View File

@@ -20,19 +20,16 @@
|vp8_short_inv_walsh4x4_neon| PROC
; read in all four lines of values: d0->d3
vldm.64 r0, {q0, q1}
vld1.i16 {q0-q1}, [r0@128]
; first for loop
vadd.s16 d4, d0, d3 ;a = [0] + [12]
vadd.s16 d5, d1, d2 ;b = [4] + [8]
vsub.s16 d6, d1, d2 ;c = [4] - [8]
vsub.s16 d7, d0, d3 ;d = [0] - [12]
vadd.s16 d6, d1, d2 ;b = [4] + [8]
vsub.s16 d5, d0, d3 ;d = [0] - [12]
vsub.s16 d7, d1, d2 ;c = [4] - [8]
vadd.s16 d0, d4, d5 ;a + b
vadd.s16 d1, d6, d7 ;c + d
vsub.s16 d2, d4, d5 ;a - b
vsub.s16 d3, d7, d6 ;d - c
vadd.s16 q0, q2, q3 ; a+b d+c
vsub.s16 q1, q2, q3 ; a-b d-c
vtrn.32 d0, d2 ;d0: 0 1 8 9
;d2: 2 3 10 11
@@ -47,29 +44,22 @@
; second for loop
vadd.s16 d4, d0, d3 ;a = [0] + [3]
vadd.s16 d5, d1, d2 ;b = [1] + [2]
vsub.s16 d6, d1, d2 ;c = [1] - [2]
vsub.s16 d7, d0, d3 ;d = [0] - [3]
vadd.s16 d6, d1, d2 ;b = [1] + [2]
vsub.s16 d5, d0, d3 ;d = [0] - [3]
vsub.s16 d7, d1, d2 ;c = [1] - [2]
vadd.s16 d0, d4, d5 ;e = a + b
vadd.s16 d1, d6, d7 ;f = c + d
vsub.s16 d2, d4, d5 ;g = a - b
vsub.s16 d3, d7, d6 ;h = d - c
vmov.i16 q8, #3
vmov.i16 q2, #3
vadd.i16 q0, q0, q2 ;e/f += 3
vadd.i16 q1, q1, q2 ;g/h += 3
vadd.s16 q0, q2, q3 ; a+b d+c
vsub.s16 q1, q2, q3 ; a-b d-c
vadd.i16 q0, q0, q8 ;e/f += 3
vadd.i16 q1, q1, q8 ;g/h += 3
vshr.s16 q0, q0, #3 ;e/f >> 3
vshr.s16 q1, q1, #3 ;g/h >> 3
vtrn.32 d0, d2
vtrn.32 d1, d3
vtrn.16 d0, d1
vtrn.16 d2, d3
vstmia.16 r1!, {q0}
vstmia.16 r1!, {q1}
vst4.i16 {d0,d1,d2,d3}, [r1@128]
bx lr
ENDP ; |vp8_short_inv_walsh4x4_neon|
@@ -77,19 +67,13 @@
;short vp8_short_inv_walsh4x4_1_neon(short *input, short *output)
|vp8_short_inv_walsh4x4_1_neon| PROC
; load a full line into a neon register
vld1.16 {q0}, [r0]
; extract first element and replicate
vdup.16 q1, d0[0]
; add 3 to all values
vmov.i16 q2, #3
vadd.i16 q3, q1, q2
; right shift
vshr.s16 q3, q3, #3
; write it back
vstmia.16 r1!, {q3}
vstmia.16 r1!, {q3}
ldrsh r2, [r0] ; load input[0]
add r3, r2, #3 ; add 3
add r2, r1, #16 ; base for last 8 output
asr r0, r3, #3 ; right shift 3
vdup.16 q0, r0 ; load and duplicate
vst1.16 {q0}, [r1@128] ; write back 8
vst1.16 {q0}, [r2@128] ; write back last 8
bx lr
ENDP ; |vp8_short_inv_walsh4x4_1_neon|

View File

@@ -14,109 +14,97 @@
EXPORT |vp8_loop_filter_vertical_edge_y_neon|
EXPORT |vp8_loop_filter_vertical_edge_uv_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; flimit, limit, and thresh should be positive numbers.
; All 16 elements in these variables are equal.
; void vp8_loop_filter_horizontal_edge_y_neon(unsigned char *src, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; int count)
; r0 unsigned char *src
; r1 int pitch
; r2 const signed char *flimit
; r3 const signed char *limit
; sp const signed char *thresh,
; sp+4 int count (unused)
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_loop_filter_horizontal_edge_y_neon| PROC
stmdb sp!, {lr}
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
vdup.u8 q1, r3 ; duplicate limit
sub r2, r0, r1, lsl #2 ; move src pointer down by 4 lines
ldr r12, [sp, #4] ; load thresh pointer
ldr r3, [sp, #4] ; load thresh
add r12, r2, r1
add r1, r1, r1
vld1.u8 {q3}, [r2], r1 ; p3
vld1.u8 {q4}, [r2], r1 ; p2
vld1.u8 {q5}, [r2], r1 ; p1
vld1.u8 {q6}, [r2], r1 ; p0
vld1.u8 {q7}, [r2], r1 ; q0
vld1.u8 {q8}, [r2], r1 ; q1
vld1.u8 {q9}, [r2], r1 ; q2
vld1.u8 {q10}, [r2] ; q3
vld1.s8 {d4[], d5[]}, [r12] ; thresh
sub r0, r0, r1, lsl #1
vdup.u8 q2, r3 ; duplicate thresh
vld1.u8 {q3}, [r2@128], r1 ; p3
vld1.u8 {q4}, [r12@128], r1 ; p2
vld1.u8 {q5}, [r2@128], r1 ; p1
vld1.u8 {q6}, [r12@128], r1 ; p0
vld1.u8 {q7}, [r2@128], r1 ; q0
vld1.u8 {q8}, [r12@128], r1 ; q1
vld1.u8 {q9}, [r2@128] ; q2
vld1.u8 {q10}, [r12@128] ; q3
sub r2, r2, r1, lsl #1
sub r12, r12, r1, lsl #1
bl vp8_loop_filter_neon
vst1.u8 {q5}, [r0], r1 ; store op1
vst1.u8 {q6}, [r0], r1 ; store op0
vst1.u8 {q7}, [r0], r1 ; store oq0
vst1.u8 {q8}, [r0], r1 ; store oq1
vst1.u8 {q5}, [r2@128], r1 ; store op1
vst1.u8 {q6}, [r12@128], r1 ; store op0
vst1.u8 {q7}, [r2@128], r1 ; store oq0
vst1.u8 {q8}, [r12@128], r1 ; store oq1
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_loop_filter_horizontal_edge_y_neon|
; void vp8_loop_filter_horizontal_edge_uv_neon(unsigned char *u, int pitch
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
; sp+4 unsigned char *v
|vp8_loop_filter_horizontal_edge_uv_neon| PROC
stmdb sp!, {lr}
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
vdup.u8 q1, r3 ; duplicate limit
ldr r12, [sp, #4] ; load thresh
ldr r2, [sp, #8] ; load v ptr
vdup.u8 q2, r12 ; duplicate thresh
sub r3, r0, r1, lsl #2 ; move u pointer down by 4 lines
vld1.u8 {d6}, [r3], r1 ; p3
vld1.u8 {d8}, [r3], r1 ; p2
vld1.u8 {d10}, [r3], r1 ; p1
vld1.u8 {d12}, [r3], r1 ; p0
vld1.u8 {d14}, [r3], r1 ; q0
vld1.u8 {d16}, [r3], r1 ; q1
vld1.u8 {d18}, [r3], r1 ; q2
vld1.u8 {d20}, [r3] ; q3
ldr r3, [sp, #4] ; load thresh pointer
sub r12, r2, r1, lsl #2 ; move v pointer down by 4 lines
vld1.u8 {d7}, [r12], r1 ; p3
vld1.u8 {d9}, [r12], r1 ; p2
vld1.u8 {d11}, [r12], r1 ; p1
vld1.u8 {d13}, [r12], r1 ; p0
vld1.u8 {d15}, [r12], r1 ; q0
vld1.u8 {d17}, [r12], r1 ; q1
vld1.u8 {d19}, [r12], r1 ; q2
vld1.u8 {d21}, [r12] ; q3
vld1.s8 {d4[], d5[]}, [r3] ; thresh
vld1.u8 {d6}, [r3@64], r1 ; p3
vld1.u8 {d7}, [r12@64], r1 ; p3
vld1.u8 {d8}, [r3@64], r1 ; p2
vld1.u8 {d9}, [r12@64], r1 ; p2
vld1.u8 {d10}, [r3@64], r1 ; p1
vld1.u8 {d11}, [r12@64], r1 ; p1
vld1.u8 {d12}, [r3@64], r1 ; p0
vld1.u8 {d13}, [r12@64], r1 ; p0
vld1.u8 {d14}, [r3@64], r1 ; q0
vld1.u8 {d15}, [r12@64], r1 ; q0
vld1.u8 {d16}, [r3@64], r1 ; q1
vld1.u8 {d17}, [r12@64], r1 ; q1
vld1.u8 {d18}, [r3@64], r1 ; q2
vld1.u8 {d19}, [r12@64], r1 ; q2
vld1.u8 {d20}, [r3@64] ; q3
vld1.u8 {d21}, [r12@64] ; q3
bl vp8_loop_filter_neon
sub r0, r0, r1, lsl #1
sub r2, r2, r1, lsl #1
vst1.u8 {d10}, [r0], r1 ; store u op1
vst1.u8 {d11}, [r2], r1 ; store v op1
vst1.u8 {d12}, [r0], r1 ; store u op0
vst1.u8 {d13}, [r2], r1 ; store v op0
vst1.u8 {d14}, [r0], r1 ; store u oq0
vst1.u8 {d15}, [r2], r1 ; store v oq0
vst1.u8 {d16}, [r0] ; store u oq1
vst1.u8 {d17}, [r2] ; store v oq1
vst1.u8 {d10}, [r0@64], r1 ; store u op1
vst1.u8 {d11}, [r2@64], r1 ; store v op1
vst1.u8 {d12}, [r0@64], r1 ; store u op0
vst1.u8 {d13}, [r2@64], r1 ; store v op0
vst1.u8 {d14}, [r0@64], r1 ; store u oq0
vst1.u8 {d15}, [r2@64], r1 ; store v oq0
vst1.u8 {d16}, [r0@64] ; store u oq1
vst1.u8 {d17}, [r2@64] ; store v oq1
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_loop_filter_horizontal_edge_uv_neon|
; void vp8_loop_filter_vertical_edge_y_neon(unsigned char *src, int pitch,
@@ -124,39 +112,38 @@
; const signed char *limit,
; const signed char *thresh,
; int count)
; r0 unsigned char *src,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 int count (unused)
; r0 unsigned char *src
; r1 int pitch
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_loop_filter_vertical_edge_y_neon| PROC
stmdb sp!, {lr}
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
vdup.u8 q1, r3 ; duplicate limit
sub r2, r0, #4 ; src ptr down by 4 columns
sub r0, r0, #2 ; dst ptr
ldr r12, [sp, #4] ; load thresh pointer
add r1, r1, r1
ldr r3, [sp, #4] ; load thresh
add r12, r2, r1, asr #1
vld1.u8 {d6}, [r2], r1 ; load first 8-line src data
vld1.u8 {d8}, [r2], r1
vld1.u8 {d6}, [r2], r1
vld1.u8 {d8}, [r12], r1
vld1.u8 {d10}, [r2], r1
vld1.u8 {d12}, [r2], r1
vld1.u8 {d12}, [r12], r1
vld1.u8 {d14}, [r2], r1
vld1.u8 {d16}, [r2], r1
vld1.u8 {d16}, [r12], r1
vld1.u8 {d18}, [r2], r1
vld1.u8 {d20}, [r2], r1
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.u8 {d20}, [r12], r1
vld1.u8 {d7}, [r2], r1 ; load second 8-line src data
vld1.u8 {d9}, [r2], r1
vld1.u8 {d9}, [r12], r1
vld1.u8 {d11}, [r2], r1
vld1.u8 {d13}, [r2], r1
vld1.u8 {d13}, [r12], r1
vld1.u8 {d15}, [r2], r1
vld1.u8 {d17}, [r2], r1
vld1.u8 {d19}, [r2], r1
vld1.u8 {d21}, [r2]
vld1.u8 {d17}, [r12], r1
vld1.u8 {d19}, [r2]
vld1.u8 {d21}, [r12]
;transpose to 8x16 matrix
vtrn.32 q3, q7
@@ -164,6 +151,8 @@
vtrn.32 q5, q9
vtrn.32 q6, q10
vdup.u8 q2, r3 ; duplicate thresh
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
@@ -178,28 +167,34 @@
vswp d12, d11
vswp d16, d13
sub r0, r0, #2 ; dst ptr
vswp d14, d12
vswp d16, d15
add r12, r0, r1, asr #1
;store op1, op0, oq0, oq1
vst4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r1
vst4.8 {d10[1], d11[1], d12[1], d13[1]}, [r0], r1
vst4.8 {d10[1], d11[1], d12[1], d13[1]}, [r12], r1
vst4.8 {d10[2], d11[2], d12[2], d13[2]}, [r0], r1
vst4.8 {d10[3], d11[3], d12[3], d13[3]}, [r0], r1
vst4.8 {d10[3], d11[3], d12[3], d13[3]}, [r12], r1
vst4.8 {d10[4], d11[4], d12[4], d13[4]}, [r0], r1
vst4.8 {d10[5], d11[5], d12[5], d13[5]}, [r0], r1
vst4.8 {d10[5], d11[5], d12[5], d13[5]}, [r12], r1
vst4.8 {d10[6], d11[6], d12[6], d13[6]}, [r0], r1
vst4.8 {d10[7], d11[7], d12[7], d13[7]}, [r0], r1
vst4.8 {d14[0], d15[0], d16[0], d17[0]}, [r0], r1
vst4.8 {d14[1], d15[1], d16[1], d17[1]}, [r0], r1
vst4.8 {d14[2], d15[2], d16[2], d17[2]}, [r0], r1
vst4.8 {d14[3], d15[3], d16[3], d17[3]}, [r0], r1
vst4.8 {d14[4], d15[4], d16[4], d17[4]}, [r0], r1
vst4.8 {d14[5], d15[5], d16[5], d17[5]}, [r0], r1
vst4.8 {d14[6], d15[6], d16[6], d17[6]}, [r0], r1
vst4.8 {d14[7], d15[7], d16[7], d17[7]}, [r0]
vst4.8 {d10[7], d11[7], d12[7], d13[7]}, [r12], r1
ldmia sp!, {pc}
vst4.8 {d14[0], d15[0], d16[0], d17[0]}, [r0], r1
vst4.8 {d14[1], d15[1], d16[1], d17[1]}, [r12], r1
vst4.8 {d14[2], d15[2], d16[2], d17[2]}, [r0], r1
vst4.8 {d14[3], d15[3], d16[3], d17[3]}, [r12], r1
vst4.8 {d14[4], d15[4], d16[4], d17[4]}, [r0], r1
vst4.8 {d14[5], d15[5], d16[5], d17[5]}, [r12], r1
vst4.8 {d14[6], d15[6], d16[6], d17[6]}, [r0]
vst4.8 {d14[7], d15[7], d16[7], d17[7]}, [r12]
pop {pc}
ENDP ; |vp8_loop_filter_vertical_edge_y_neon|
; void vp8_loop_filter_vertical_edge_uv_neon(unsigned char *u, int pitch
@@ -209,38 +204,36 @@
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
; sp+4 unsigned char *v
|vp8_loop_filter_vertical_edge_uv_neon| PROC
stmdb sp!, {lr}
push {lr}
vdup.u8 q0, r2 ; duplicate blimit
sub r12, r0, #4 ; move u pointer down by 4 columns
vld1.s8 {d0[], d1[]}, [r2] ; flimit
vld1.s8 {d2[], d3[]}, [r3] ; limit
ldr r2, [sp, #8] ; load v ptr
vdup.u8 q1, r3 ; duplicate limit
sub r3, r2, #4 ; move v pointer down by 4 columns
vld1.u8 {d6}, [r12], r1 ;load u data
vld1.u8 {d8}, [r12], r1
vld1.u8 {d10}, [r12], r1
vld1.u8 {d12}, [r12], r1
vld1.u8 {d14}, [r12], r1
vld1.u8 {d16}, [r12], r1
vld1.u8 {d18}, [r12], r1
vld1.u8 {d20}, [r12]
sub r3, r2, #4 ; move v pointer down by 4 columns
vld1.u8 {d7}, [r3], r1 ;load v data
vld1.u8 {d8}, [r12], r1
vld1.u8 {d9}, [r3], r1
vld1.u8 {d10}, [r12], r1
vld1.u8 {d11}, [r3], r1
vld1.u8 {d12}, [r12], r1
vld1.u8 {d13}, [r3], r1
vld1.u8 {d14}, [r12], r1
vld1.u8 {d15}, [r3], r1
vld1.u8 {d16}, [r12], r1
vld1.u8 {d17}, [r3], r1
vld1.u8 {d18}, [r12], r1
vld1.u8 {d19}, [r3], r1
vld1.u8 {d20}, [r12]
vld1.u8 {d21}, [r3]
ldr r12, [sp, #4] ; load thresh pointer
ldr r12, [sp, #4] ; load thresh
;transpose to 8x16 matrix
vtrn.32 q3, q7
@@ -248,6 +241,8 @@
vtrn.32 q5, q9
vtrn.32 q6, q10
vdup.u8 q2, r12 ; duplicate thresh
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
@@ -258,18 +253,16 @@
vtrn.8 q7, q8
vtrn.8 q9, q10
vld1.s8 {d4[], d5[]}, [r12] ; thresh
bl vp8_loop_filter_neon
sub r0, r0, #2
sub r2, r2, #2
vswp d12, d11
vswp d16, d13
vswp d14, d12
vswp d16, d15
sub r0, r0, #2
sub r2, r2, #2
;store op1, op0, oq0, oq1
vst4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r1
vst4.8 {d14[0], d15[0], d16[0], d17[0]}, [r2], r1
@@ -288,7 +281,7 @@
vst4.8 {d10[7], d11[7], d12[7], d13[7]}, [r0]
vst4.8 {d14[7], d15[7], d16[7], d17[7]}, [r2]
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_loop_filter_vertical_edge_uv_neon|
; void vp8_loop_filter_neon();
@@ -308,7 +301,6 @@
; q9 q2
; q10 q3
|vp8_loop_filter_neon| PROC
ldr r12, _lf_coeff_
; vp8_filter_mask
vabd.u8 q11, q3, q4 ; abs(p3 - p2)
@@ -317,42 +309,44 @@
vabd.u8 q14, q8, q7 ; abs(q1 - q0)
vabd.u8 q3, q9, q8 ; abs(q2 - q1)
vabd.u8 q4, q10, q9 ; abs(q3 - q2)
vabd.u8 q9, q6, q7 ; abs(p0 - q0)
vmax.u8 q11, q11, q12
vmax.u8 q12, q13, q14
vmax.u8 q3, q3, q4
vmax.u8 q15, q11, q12
vabd.u8 q9, q6, q7 ; abs(p0 - q0)
; vp8_hevmask
vcgt.u8 q13, q13, q2 ; (abs(p1 - p0) > thresh)*-1
vcgt.u8 q14, q14, q2 ; (abs(q1 - q0) > thresh)*-1
vmax.u8 q15, q15, q3
vadd.u8 q0, q0, q0 ; flimit * 2
vadd.u8 q0, q0, q1 ; flimit * 2 + limit
vcge.u8 q15, q1, q15
vmov.u8 q10, #0x80 ; 0x80
vabd.u8 q2, q5, q8 ; a = abs(p1 - q1)
vqadd.u8 q9, q9, q9 ; b = abs(p0 - q0) * 2
vshr.u8 q2, q2, #1 ; a = a / 2
vqadd.u8 q9, q9, q2 ; a = b + a
vcge.u8 q9, q0, q9 ; (a > flimit * 2 + limit) * -1
vld1.u8 {q0}, [r12]!
vcge.u8 q15, q1, q15
; vp8_filter() function
; convert to signed
veor q7, q7, q0 ; qs0
veor q6, q6, q0 ; ps0
veor q5, q5, q0 ; ps1
veor q8, q8, q0 ; qs1
veor q7, q7, q10 ; qs0
vshr.u8 q2, q2, #1 ; a = a / 2
veor q6, q6, q10 ; ps0
vld1.u8 {q10}, [r12]!
veor q5, q5, q10 ; ps1
vqadd.u8 q9, q9, q2 ; a = b + a
veor q8, q8, q10 ; qs1
vmov.u8 q10, #3 ; #3
vsubl.s8 q2, d14, d12 ; ( qs0 - ps0)
vsubl.s8 q11, d15, d13
vcge.u8 q9, q0, q9 ; (a > flimit * 2 + limit) * -1
vmovl.u8 q4, d20
vqsub.s8 q1, q5, q8 ; vp8_filter = clamp(ps1-qs1)
@@ -367,7 +361,7 @@
vaddw.s8 q2, q2, d2
vaddw.s8 q11, q11, d3
vld1.u8 {q9}, [r12]!
vmov.u8 q9, #4 ; #4
; vp8_filter = clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d2, q2
@@ -379,31 +373,25 @@
vshr.s8 q2, q2, #3 ; Filter2 >>= 3
vshr.s8 q1, q1, #3 ; Filter1 >>= 3
vqadd.s8 q11, q6, q2 ; u = clamp(ps0 + Filter2)
vqsub.s8 q10, q7, q1 ; u = clamp(qs0 - Filter1)
; outer tap adjustments: ++vp8_filter >> 1
vrshr.s8 q1, q1, #1
vbic q1, q1, q14 ; vp8_filter &= ~hev
vmov.u8 q0, #0x80 ; 0x80
vqadd.s8 q13, q5, q1 ; u = clamp(ps1 + vp8_filter)
vqsub.s8 q12, q8, q1 ; u = clamp(qs1 - vp8_filter)
veor q5, q13, q0 ; *op1 = u^0x80
veor q6, q11, q0 ; *op0 = u^0x80
veor q7, q10, q0 ; *oq0 = u^0x80
veor q5, q13, q0 ; *op1 = u^0x80
veor q8, q12, q0 ; *oq1 = u^0x80
bx lr
ENDP ; |vp8_loop_filter_horizontal_edge_y_neon|
AREA loopfilter_dat, DATA, READONLY
_lf_coeff_
DCD lf_coeff
lf_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
DCD 0x01010101, 0x01010101, 0x01010101, 0x01010101
;-----------------
END

View File

@@ -9,110 +9,109 @@
;
EXPORT |vp8_loop_filter_simple_horizontal_edge_neon|
;EXPORT |vp8_loop_filter_simple_horizontal_edge_neon|
EXPORT |vp8_loop_filter_bhs_neon|
EXPORT |vp8_loop_filter_mbhs_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;Note: flimit, limit, and thresh shpuld be positive numbers. All 16 elements in flimit
;are equal. So, in the code, only one load is needed
;for flimit. Same way applies to limit and thresh.
; r0 unsigned char *s,
; r1 int p, //pitch
; r2 const signed char *flimit,
; r3 const signed char *limit,
; stack(r4) const signed char *thresh,
; //stack(r5) int count --unused
; r0 unsigned char *s, PRESERVE
; r1 int p, PRESERVE
; q1 limit, PRESERVE
|vp8_loop_filter_simple_horizontal_edge_neon| PROC
sub r0, r0, r1, lsl #1 ; move src pointer down by 2 lines
ldr r12, _lfhy_coeff_
vld1.u8 {q5}, [r0], r1 ; p1
vld1.s8 {d2[], d3[]}, [r2] ; flimit
vld1.s8 {d26[], d27[]}, [r3] ; limit -> q13
vld1.u8 {q6}, [r0], r1 ; p0
vld1.u8 {q0}, [r12]! ; 0x80
vld1.u8 {q7}, [r0], r1 ; q0
vld1.u8 {q10}, [r12]! ; 0x03
vld1.u8 {q8}, [r0] ; q1
sub r3, r0, r1, lsl #1 ; move src pointer down by 2 lines
vld1.u8 {q7}, [r0@128], r1 ; q0
vld1.u8 {q5}, [r3@128], r1 ; p0
vld1.u8 {q8}, [r0@128] ; q1
vld1.u8 {q6}, [r3@128] ; p1
;vp8_filter_mask() function
vabd.u8 q15, q6, q7 ; abs(p0 - q0)
vabd.u8 q14, q5, q8 ; abs(p1 - q1)
vqadd.u8 q15, q15, q15 ; abs(p0 - q0) * 2
vshr.u8 q14, q14, #1 ; abs(p1 - q1) / 2
vmov.u8 q0, #0x80 ; 0x80
vmov.s16 q13, #3
vqadd.u8 q15, q15, q14 ; abs(p0 - q0) * 2 + abs(p1 - q1) / 2
;vp8_filter() function
veor q7, q7, q0 ; qs0: q0 offset to convert to a signed value
veor q6, q6, q0 ; ps0: p0 offset to convert to a signed value
veor q5, q5, q0 ; ps1: p1 offset to convert to a signed value
veor q8, q8, q0 ; qs1: q1 offset to convert to a signed value
vadd.u8 q1, q1, q1 ; flimit * 2
vadd.u8 q1, q1, q13 ; flimit * 2 + limit
vcge.u8 q15, q1, q15 ; (abs(p0 - q0)*2 + abs(p1-q1)/2 > flimit*2 + limit)*-1
vcge.u8 q15, q1, q15 ; (abs(p0 - q0)*2 + abs(p1-q1)/2 > limit)*-1
;;;;;;;;;;
;vqsub.s8 q2, q7, q6 ; ( qs0 - ps0)
vsubl.s8 q2, d14, d12 ; ( qs0 - ps0)
vsubl.s8 q3, d15, d13
vqsub.s8 q4, q5, q8 ; q4: vp8_filter = vp8_signed_char_clamp(ps1-qs1)
;vmul.i8 q2, q2, q10 ; 3 * ( qs0 - ps0)
vadd.s16 q11, q2, q2 ; 3 * ( qs0 - ps0)
vadd.s16 q12, q3, q3
vmul.s16 q2, q2, q13 ; 3 * ( qs0 - ps0)
vmul.s16 q3, q3, q13
vld1.u8 {q9}, [r12]! ; 0x04
vadd.s16 q2, q2, q11
vadd.s16 q3, q3, q12
vmov.u8 q10, #0x03 ; 0x03
vmov.u8 q9, #0x04 ; 0x04
vaddw.s8 q2, q2, d8 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q3, q3, d9
;vqadd.s8 q4, q4, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d8, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d9, q3
;;;;;;;;;;;;;
vand q4, q4, q15 ; vp8_filter &= mask
vand q14, q4, q15 ; vp8_filter &= mask
vqadd.s8 q2, q4, q10 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q4, q4, q9 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vqadd.s8 q2, q14, q10 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q3, q14, q9 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vshr.s8 q2, q2, #3 ; Filter2 >>= 3
vshr.s8 q4, q4, #3 ; Filter1 >>= 3
vshr.s8 q4, q3, #3 ; Filter1 >>= 3
sub r0, r0, r1, lsl #1
sub r0, r0, r1
;calculate output
vqadd.s8 q11, q6, q2 ; u = vp8_signed_char_clamp(ps0 + Filter2)
vqsub.s8 q10, q7, q4 ; u = vp8_signed_char_clamp(qs0 - Filter1)
add r3, r0, r1
veor q6, q11, q0 ; *op0 = u^0x80
veor q7, q10, q0 ; *oq0 = u^0x80
vst1.u8 {q6}, [r0] ; store op0
vst1.u8 {q7}, [r3] ; store oq0
vst1.u8 {q6}, [r3@128] ; store op0
vst1.u8 {q7}, [r0@128] ; store oq0
bx lr
ENDP ; |vp8_loop_filter_simple_horizontal_edge_neon|
;-----------------
AREA hloopfiltery_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 16 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_lfhy_coeff_
DCD lfhy_coeff
lfhy_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
|vp8_loop_filter_bhs_neon| PROC
push {r4, lr}
ldrb r3, [r2] ; load blim from mem
vdup.s8 q1, r3 ; duplicate blim
add r0, r0, r1, lsl #2 ; src = y_ptr + 4 * y_stride
bl vp8_loop_filter_simple_horizontal_edge_neon
; vp8_loop_filter_simple_horizontal_edge_neon preserves r0, r1 and q1
add r0, r0, r1, lsl #2 ; src = y_ptr + 8* y_stride
bl vp8_loop_filter_simple_horizontal_edge_neon
add r0, r0, r1, lsl #2 ; src = y_ptr + 12 * y_stride
pop {r4, lr}
b vp8_loop_filter_simple_horizontal_edge_neon
ENDP ;|vp8_loop_filter_bhs_neon|
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
|vp8_loop_filter_mbhs_neon| PROC
ldrb r3, [r2] ; load blim from mem
vdup.s8 q1, r3 ; duplicate mblim
b vp8_loop_filter_simple_horizontal_edge_neon
ENDP ;|vp8_loop_filter_bhs_neon|
END

View File

@@ -9,60 +9,54 @@
;
EXPORT |vp8_loop_filter_simple_vertical_edge_neon|
;EXPORT |vp8_loop_filter_simple_vertical_edge_neon|
EXPORT |vp8_loop_filter_bvs_neon|
EXPORT |vp8_loop_filter_mbvs_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
;Note: flimit, limit, and thresh should be positive numbers. All 16 elements in flimit
;are equal. So, in the code, only one load is needed
;for flimit. Same way applies to limit and thresh.
; r0 unsigned char *s,
; r1 int p, //pitch
; r2 const signed char *flimit,
; r3 const signed char *limit,
; stack(r4) const signed char *thresh,
; //stack(r5) int count --unused
; r0 unsigned char *s, PRESERVE
; r1 int p, PRESERVE
; q1 limit, PRESERVE
|vp8_loop_filter_simple_vertical_edge_neon| PROC
sub r0, r0, #2 ; move src pointer down by 2 columns
add r12, r1, r1
add r3, r0, r1
vld4.8 {d6[0], d7[0], d8[0], d9[0]}, [r0], r1
vld1.s8 {d2[], d3[]}, [r2] ; flimit
vld1.s8 {d26[], d27[]}, [r3] ; limit -> q13
vld4.8 {d6[1], d7[1], d8[1], d9[1]}, [r0], r1
ldr r12, _vlfy_coeff_
vld4.8 {d6[2], d7[2], d8[2], d9[2]}, [r0], r1
vld4.8 {d6[3], d7[3], d8[3], d9[3]}, [r0], r1
vld4.8 {d6[4], d7[4], d8[4], d9[4]}, [r0], r1
vld4.8 {d6[5], d7[5], d8[5], d9[5]}, [r0], r1
vld4.8 {d6[6], d7[6], d8[6], d9[6]}, [r0], r1
vld4.8 {d6[7], d7[7], d8[7], d9[7]}, [r0], r1
vld4.8 {d6[0], d7[0], d8[0], d9[0]}, [r0], r12
vld4.8 {d6[1], d7[1], d8[1], d9[1]}, [r3], r12
vld4.8 {d6[2], d7[2], d8[2], d9[2]}, [r0], r12
vld4.8 {d6[3], d7[3], d8[3], d9[3]}, [r3], r12
vld4.8 {d6[4], d7[4], d8[4], d9[4]}, [r0], r12
vld4.8 {d6[5], d7[5], d8[5], d9[5]}, [r3], r12
vld4.8 {d6[6], d7[6], d8[6], d9[6]}, [r0], r12
vld4.8 {d6[7], d7[7], d8[7], d9[7]}, [r3], r12
vld4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r1
vld1.u8 {q0}, [r12]! ; 0x80
vld4.8 {d10[1], d11[1], d12[1], d13[1]}, [r0], r1
vld1.u8 {q11}, [r12]! ; 0x03
vld4.8 {d10[2], d11[2], d12[2], d13[2]}, [r0], r1
vld1.u8 {q12}, [r12]! ; 0x04
vld4.8 {d10[3], d11[3], d12[3], d13[3]}, [r0], r1
vld4.8 {d10[4], d11[4], d12[4], d13[4]}, [r0], r1
vld4.8 {d10[5], d11[5], d12[5], d13[5]}, [r0], r1
vld4.8 {d10[6], d11[6], d12[6], d13[6]}, [r0], r1
vld4.8 {d10[7], d11[7], d12[7], d13[7]}, [r0], r1
vld4.8 {d10[0], d11[0], d12[0], d13[0]}, [r0], r12
vld4.8 {d10[1], d11[1], d12[1], d13[1]}, [r3], r12
vld4.8 {d10[2], d11[2], d12[2], d13[2]}, [r0], r12
vld4.8 {d10[3], d11[3], d12[3], d13[3]}, [r3], r12
vld4.8 {d10[4], d11[4], d12[4], d13[4]}, [r0], r12
vld4.8 {d10[5], d11[5], d12[5], d13[5]}, [r3], r12
vld4.8 {d10[6], d11[6], d12[6], d13[6]}, [r0], r12
vld4.8 {d10[7], d11[7], d12[7], d13[7]}, [r3]
vswp d7, d10
vswp d12, d9
;vswp q4, q5 ; p1:q3, p0:q5, q0:q4, q1:q6
;vp8_filter_mask() function
;vp8_hevmask() function
sub r0, r0, r1, lsl #4
vabd.u8 q15, q5, q4 ; abs(p0 - q0)
vabd.u8 q14, q3, q6 ; abs(p1 - q1)
vqadd.u8 q15, q15, q15 ; abs(p0 - q0) * 2
vshr.u8 q14, q14, #1 ; abs(p1 - q1) / 2
vmov.u8 q0, #0x80 ; 0x80
vmov.s16 q11, #3
vqadd.u8 q15, q15, q14 ; abs(p0 - q0) * 2 + abs(p1 - q1) / 2
veor q4, q4, q0 ; qs0: q0 offset to convert to a signed value
@@ -70,90 +64,91 @@
veor q3, q3, q0 ; ps1: p1 offset to convert to a signed value
veor q6, q6, q0 ; qs1: q1 offset to convert to a signed value
vadd.u8 q1, q1, q1 ; flimit * 2
vadd.u8 q1, q1, q13 ; flimit * 2 + limit
vcge.u8 q15, q1, q15 ; abs(p0 - q0)*2 + abs(p1-q1)/2 > flimit*2 + limit)*-1
;vp8_filter() function
;;;;;;;;;;
;vqsub.s8 q2, q5, q4 ; ( qs0 - ps0)
vsubl.s8 q2, d8, d10 ; ( qs0 - ps0)
vsubl.s8 q13, d9, d11
vqsub.s8 q1, q3, q6 ; vp8_filter = vp8_signed_char_clamp(ps1-qs1)
vqsub.s8 q14, q3, q6 ; vp8_filter = vp8_signed_char_clamp(ps1-qs1)
;vmul.i8 q2, q2, q11 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vadd.s16 q10, q2, q2 ; 3 * ( qs0 - ps0)
vadd.s16 q14, q13, q13
vadd.s16 q2, q2, q10
vadd.s16 q13, q13, q14
vmul.s16 q2, q2, q11 ; 3 * ( qs0 - ps0)
vmul.s16 q13, q13, q11
;vqadd.s8 q1, q1, q2
vaddw.s8 q2, q2, d2 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q13, q13, d3
vmov.u8 q11, #0x03 ; 0x03
vmov.u8 q12, #0x04 ; 0x04
vqmovn.s16 d2, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d3, q13
vaddw.s8 q2, q2, d28 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q13, q13, d29
vqmovn.s16 d28, q2 ; vp8_filter = vp8_signed_char_clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d29, q13
add r0, r0, #1
add r2, r0, r1
;;;;;;;;;;;
add r3, r0, r1
vand q1, q1, q15 ; vp8_filter &= mask
vand q14, q14, q15 ; vp8_filter &= mask
vqadd.s8 q2, q1, q11 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q1, q1, q12 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vqadd.s8 q2, q14, q11 ; Filter2 = vp8_signed_char_clamp(vp8_filter+3)
vqadd.s8 q3, q14, q12 ; Filter1 = vp8_signed_char_clamp(vp8_filter+4)
vshr.s8 q2, q2, #3 ; Filter2 >>= 3
vshr.s8 q1, q1, #3 ; Filter1 >>= 3
vshr.s8 q14, q3, #3 ; Filter1 >>= 3
;calculate output
vqsub.s8 q10, q4, q1 ; u = vp8_signed_char_clamp(qs0 - Filter1)
vqadd.s8 q11, q5, q2 ; u = vp8_signed_char_clamp(ps0 + Filter2)
vqsub.s8 q10, q4, q14 ; u = vp8_signed_char_clamp(qs0 - Filter1)
veor q7, q10, q0 ; *oq0 = u^0x80
veor q6, q11, q0 ; *op0 = u^0x80
add r3, r2, r1
veor q7, q10, q0 ; *oq0 = u^0x80
add r12, r1, r1
vswp d13, d14
add r12, r3, r1
;store op1, op0, oq0, oq1
vst2.8 {d12[0], d13[0]}, [r0]
vst2.8 {d12[1], d13[1]}, [r2]
vst2.8 {d12[2], d13[2]}, [r3]
vst2.8 {d12[3], d13[3]}, [r12], r1
add r0, r12, r1
vst2.8 {d12[4], d13[4]}, [r12]
vst2.8 {d12[5], d13[5]}, [r0], r1
add r2, r0, r1
vst2.8 {d12[6], d13[6]}, [r0]
vst2.8 {d12[7], d13[7]}, [r2], r1
add r3, r2, r1
vst2.8 {d14[0], d15[0]}, [r2]
vst2.8 {d14[1], d15[1]}, [r3], r1
add r12, r3, r1
vst2.8 {d14[2], d15[2]}, [r3]
vst2.8 {d14[3], d15[3]}, [r12], r1
add r0, r12, r1
vst2.8 {d14[4], d15[4]}, [r12]
vst2.8 {d14[5], d15[5]}, [r0], r1
add r2, r0, r1
vst2.8 {d14[6], d15[6]}, [r0]
vst2.8 {d14[7], d15[7]}, [r2]
vst2.8 {d12[0], d13[0]}, [r0], r12
vst2.8 {d12[1], d13[1]}, [r3], r12
vst2.8 {d12[2], d13[2]}, [r0], r12
vst2.8 {d12[3], d13[3]}, [r3], r12
vst2.8 {d12[4], d13[4]}, [r0], r12
vst2.8 {d12[5], d13[5]}, [r3], r12
vst2.8 {d12[6], d13[6]}, [r0], r12
vst2.8 {d12[7], d13[7]}, [r3], r12
vst2.8 {d14[0], d15[0]}, [r0], r12
vst2.8 {d14[1], d15[1]}, [r3], r12
vst2.8 {d14[2], d15[2]}, [r0], r12
vst2.8 {d14[3], d15[3]}, [r3], r12
vst2.8 {d14[4], d15[4]}, [r0], r12
vst2.8 {d14[5], d15[5]}, [r3], r12
vst2.8 {d14[6], d15[6]}, [r0], r12
vst2.8 {d14[7], d15[7]}, [r3]
bx lr
ENDP ; |vp8_loop_filter_simple_vertical_edge_neon|
;-----------------
AREA vloopfiltery_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 16 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_vlfy_coeff_
DCD vlfy_coeff
vlfy_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
|vp8_loop_filter_bvs_neon| PROC
push {r4, lr}
ldrb r3, [r2] ; load blim from mem
mov r4, r0
add r0, r0, #4
vdup.s8 q1, r3 ; duplicate blim
bl vp8_loop_filter_simple_vertical_edge_neon
; vp8_loop_filter_simple_vertical_edge_neon preserves r1 and q1
add r0, r4, #8
bl vp8_loop_filter_simple_vertical_edge_neon
add r0, r4, #12
pop {r4, lr}
b vp8_loop_filter_simple_vertical_edge_neon
ENDP ;|vp8_loop_filter_bvs_neon|
; r0 unsigned char *y
; r1 int ystride
; r2 const unsigned char *blimit
|vp8_loop_filter_mbvs_neon| PROC
ldrb r3, [r2] ; load mblim from mem
vdup.s8 q1, r3 ; duplicate mblim
b vp8_loop_filter_simple_vertical_edge_neon
ENDP ;|vp8_loop_filter_bvs_neon|
END

View File

@@ -14,155 +14,143 @@
EXPORT |vp8_mbloop_filter_vertical_edge_y_neon|
EXPORT |vp8_mbloop_filter_vertical_edge_uv_neon|
ARM
REQUIRE8
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
; flimit, limit, and thresh should be positive numbers.
; All 16 elements in these variables are equal.
; void vp8_mbloop_filter_horizontal_edge_y_neon(unsigned char *src, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; int count)
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh)
; r0 unsigned char *src,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 int count (unused)
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_mbloop_filter_horizontal_edge_y_neon| PROC
stmdb sp!, {lr}
sub r0, r0, r1, lsl #2 ; move src pointer down by 4 lines
ldr r12, [sp, #4] ; load thresh pointer
push {lr}
add r1, r1, r1 ; double stride
ldr r12, [sp, #4] ; load thresh
sub r0, r0, r1, lsl #1 ; move src pointer down by 4 lines
vdup.u8 q2, r12 ; thresh
add r12, r0, r1, lsr #1 ; move src pointer up by 1 line
vld1.u8 {q3}, [r0], r1 ; p3
vld1.s8 {d2[], d3[]}, [r3] ; limit
vld1.u8 {q4}, [r0], r1 ; p2
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.u8 {q5}, [r0], r1 ; p1
vld1.u8 {q6}, [r0], r1 ; p0
vld1.u8 {q7}, [r0], r1 ; q0
vld1.u8 {q8}, [r0], r1 ; q1
vld1.u8 {q9}, [r0], r1 ; q2
vld1.u8 {q10}, [r0], r1 ; q3
vld1.u8 {q3}, [r0@128], r1 ; p3
vld1.u8 {q4}, [r12@128], r1 ; p2
vld1.u8 {q5}, [r0@128], r1 ; p1
vld1.u8 {q6}, [r12@128], r1 ; p0
vld1.u8 {q7}, [r0@128], r1 ; q0
vld1.u8 {q8}, [r12@128], r1 ; q1
vld1.u8 {q9}, [r0@128], r1 ; q2
vld1.u8 {q10}, [r12@128], r1 ; q3
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #3
add r0, r0, r1
add r2, r0, r1
add r3, r2, r1
sub r12, r12, r1, lsl #2
add r0, r12, r1, lsr #1
vst1.u8 {q4}, [r0] ; store op2
vst1.u8 {q5}, [r2] ; store op1
vst1.u8 {q6}, [r3], r1 ; store op0
add r12, r3, r1
vst1.u8 {q7}, [r3] ; store oq0
vst1.u8 {q8}, [r12], r1 ; store oq1
vst1.u8 {q9}, [r12] ; store oq2
vst1.u8 {q4}, [r12@128],r1 ; store op2
vst1.u8 {q5}, [r0@128],r1 ; store op1
vst1.u8 {q6}, [r12@128], r1 ; store op0
vst1.u8 {q7}, [r0@128],r1 ; store oq0
vst1.u8 {q8}, [r12@128] ; store oq1
vst1.u8 {q9}, [r0@128] ; store oq2
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_mbloop_filter_horizontal_edge_y_neon|
; void vp8_mbloop_filter_horizontal_edge_uv_neon(unsigned char *u, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
; sp+4 unsigned char *v
|vp8_mbloop_filter_horizontal_edge_uv_neon| PROC
stmdb sp!, {lr}
push {lr}
ldr r12, [sp, #4] ; load thresh
sub r0, r0, r1, lsl #2 ; move u pointer down by 4 lines
vld1.s8 {d2[], d3[]}, [r3] ; limit
ldr r3, [sp, #8] ; load v ptr
ldr r12, [sp, #4] ; load thresh pointer
sub r3, r3, r1, lsl #2 ; move v pointer down by 4 lines
vdup.u8 q2, r12 ; thresh
ldr r12, [sp, #8] ; load v ptr
sub r12, r12, r1, lsl #2 ; move v pointer down by 4 lines
vld1.u8 {d6}, [r0], r1 ; p3
vld1.u8 {d7}, [r3], r1 ; p3
vld1.u8 {d8}, [r0], r1 ; p2
vld1.u8 {d9}, [r3], r1 ; p2
vld1.u8 {d10}, [r0], r1 ; p1
vld1.u8 {d11}, [r3], r1 ; p1
vld1.u8 {d12}, [r0], r1 ; p0
vld1.u8 {d13}, [r3], r1 ; p0
vld1.u8 {d14}, [r0], r1 ; q0
vld1.u8 {d15}, [r3], r1 ; q0
vld1.u8 {d16}, [r0], r1 ; q1
vld1.u8 {d17}, [r3], r1 ; q1
vld1.u8 {d18}, [r0], r1 ; q2
vld1.u8 {d19}, [r3], r1 ; q2
vld1.u8 {d20}, [r0], r1 ; q3
vld1.u8 {d21}, [r3], r1 ; q3
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.u8 {d6}, [r0@64], r1 ; p3
vld1.u8 {d7}, [r12@64], r1 ; p3
vld1.u8 {d8}, [r0@64], r1 ; p2
vld1.u8 {d9}, [r12@64], r1 ; p2
vld1.u8 {d10}, [r0@64], r1 ; p1
vld1.u8 {d11}, [r12@64], r1 ; p1
vld1.u8 {d12}, [r0@64], r1 ; p0
vld1.u8 {d13}, [r12@64], r1 ; p0
vld1.u8 {d14}, [r0@64], r1 ; q0
vld1.u8 {d15}, [r12@64], r1 ; q0
vld1.u8 {d16}, [r0@64], r1 ; q1
vld1.u8 {d17}, [r12@64], r1 ; q1
vld1.u8 {d18}, [r0@64], r1 ; q2
vld1.u8 {d19}, [r12@64], r1 ; q2
vld1.u8 {d20}, [r0@64], r1 ; q3
vld1.u8 {d21}, [r12@64], r1 ; q3
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #3
sub r3, r3, r1, lsl #3
sub r12, r12, r1, lsl #3
add r0, r0, r1
add r3, r3, r1
add r12, r12, r1
vst1.u8 {d8}, [r0], r1 ; store u op2
vst1.u8 {d9}, [r3], r1 ; store v op2
vst1.u8 {d10}, [r0], r1 ; store u op1
vst1.u8 {d11}, [r3], r1 ; store v op1
vst1.u8 {d12}, [r0], r1 ; store u op0
vst1.u8 {d13}, [r3], r1 ; store v op0
vst1.u8 {d14}, [r0], r1 ; store u oq0
vst1.u8 {d15}, [r3], r1 ; store v oq0
vst1.u8 {d16}, [r0], r1 ; store u oq1
vst1.u8 {d17}, [r3], r1 ; store v oq1
vst1.u8 {d18}, [r0], r1 ; store u oq2
vst1.u8 {d19}, [r3], r1 ; store v oq2
vst1.u8 {d8}, [r0@64], r1 ; store u op2
vst1.u8 {d9}, [r12@64], r1 ; store v op2
vst1.u8 {d10}, [r0@64], r1 ; store u op1
vst1.u8 {d11}, [r12@64], r1 ; store v op1
vst1.u8 {d12}, [r0@64], r1 ; store u op0
vst1.u8 {d13}, [r12@64], r1 ; store v op0
vst1.u8 {d14}, [r0@64], r1 ; store u oq0
vst1.u8 {d15}, [r12@64], r1 ; store v oq0
vst1.u8 {d16}, [r0@64], r1 ; store u oq1
vst1.u8 {d17}, [r12@64], r1 ; store v oq1
vst1.u8 {d18}, [r0@64], r1 ; store u oq2
vst1.u8 {d19}, [r12@64], r1 ; store v oq2
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_mbloop_filter_horizontal_edge_uv_neon|
; void vp8_mbloop_filter_vertical_edge_y_neon(unsigned char *src, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; int count)
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh)
; r0 unsigned char *src,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 int count (unused)
; r2 unsigned char blimit
; r3 unsigned char limit
; sp unsigned char thresh,
|vp8_mbloop_filter_vertical_edge_y_neon| PROC
stmdb sp!, {lr}
push {lr}
ldr r12, [sp, #4] ; load thresh
sub r0, r0, #4 ; move src pointer down by 4 columns
vdup.s8 q2, r12 ; thresh
add r12, r0, r1, lsl #3 ; move src pointer down by 8 lines
vld1.u8 {d6}, [r0], r1 ; load first 8-line src data
ldr r12, [sp, #4] ; load thresh pointer
vld1.u8 {d7}, [r12], r1 ; load second 8-line src data
vld1.u8 {d8}, [r0], r1
sub sp, sp, #32
vld1.u8 {d9}, [r12], r1
vld1.u8 {d10}, [r0], r1
vld1.u8 {d11}, [r12], r1
vld1.u8 {d12}, [r0], r1
vld1.u8 {d13}, [r12], r1
vld1.u8 {d14}, [r0], r1
vld1.u8 {d15}, [r12], r1
vld1.u8 {d16}, [r0], r1
vld1.u8 {d17}, [r12], r1
vld1.u8 {d18}, [r0], r1
vld1.u8 {d19}, [r12], r1
vld1.u8 {d20}, [r0], r1
vld1.u8 {d7}, [r0], r1 ; load second 8-line src data
vld1.u8 {d9}, [r0], r1
vld1.u8 {d11}, [r0], r1
vld1.u8 {d13}, [r0], r1
vld1.u8 {d15}, [r0], r1
vld1.u8 {d17}, [r0], r1
vld1.u8 {d19}, [r0], r1
vld1.u8 {d21}, [r0], r1
vld1.u8 {d21}, [r12], r1
;transpose to 8x16 matrix
vtrn.32 q3, q7
@@ -180,133 +168,11 @@
vtrn.8 q7, q8
vtrn.8 q9, q10
vld1.s8 {d4[], d5[]}, [r12] ; thresh
vld1.s8 {d2[], d3[]}, [r3] ; limit
mov r12, sp
vst1.u8 {q3}, [r12]!
vst1.u8 {q10}, [r12]!
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #4
add r2, r0, r1
add r3, r2, r1
vld1.u8 {q3}, [sp]!
vld1.u8 {q10}, [sp]!
;transpose to 16x8 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
add r12, r3, r1
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
;store op2, op1, op0, oq0, oq1, oq2
vst1.8 {d6}, [r0]
vst1.8 {d8}, [r2]
vst1.8 {d10}, [r3]
vst1.8 {d12}, [r12], r1
add r0, r12, r1
vst1.8 {d14}, [r12]
vst1.8 {d16}, [r0], r1
add r2, r0, r1
vst1.8 {d18}, [r0]
vst1.8 {d20}, [r2], r1
add r3, r2, r1
vst1.8 {d7}, [r2]
vst1.8 {d9}, [r3], r1
add r12, r3, r1
vst1.8 {d11}, [r3]
vst1.8 {d13}, [r12], r1
add r0, r12, r1
vst1.8 {d15}, [r12]
vst1.8 {d17}, [r0], r1
add r2, r0, r1
vst1.8 {d19}, [r0]
vst1.8 {d21}, [r2]
ldmia sp!, {pc}
ENDP ; |vp8_mbloop_filter_vertical_edge_y_neon|
; void vp8_mbloop_filter_vertical_edge_uv_neon(unsigned char *u, int pitch,
; const signed char *flimit,
; const signed char *limit,
; const signed char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 unsigned char *v
|vp8_mbloop_filter_vertical_edge_uv_neon| PROC
stmdb sp!, {lr}
sub r0, r0, #4 ; move src pointer down by 4 columns
vld1.s8 {d2[], d3[]}, [r3] ; limit
ldr r3, [sp, #8] ; load v ptr
ldr r12, [sp, #4] ; load thresh pointer
sub r3, r3, #4 ; move v pointer down by 4 columns
vld1.u8 {d6}, [r0], r1 ;load u data
vld1.u8 {d7}, [r3], r1 ;load v data
vld1.u8 {d8}, [r0], r1
vld1.u8 {d9}, [r3], r1
vld1.u8 {d10}, [r0], r1
vld1.u8 {d11}, [r3], r1
vld1.u8 {d12}, [r0], r1
vld1.u8 {d13}, [r3], r1
vld1.u8 {d14}, [r0], r1
vld1.u8 {d15}, [r3], r1
vld1.u8 {d16}, [r0], r1
vld1.u8 {d17}, [r3], r1
vld1.u8 {d18}, [r0], r1
vld1.u8 {d19}, [r3], r1
vld1.u8 {d20}, [r0], r1
vld1.u8 {d21}, [r3], r1
;transpose to 8x16 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
sub sp, sp, #32
vld1.s8 {d4[], d5[]}, [r12] ; thresh
mov r12, sp
vst1.u8 {q3}, [r12]!
vst1.u8 {q10}, [r12]!
bl vp8_mbloop_filter_neon
sub r0, r0, r1, lsl #3
sub r3, r3, r1, lsl #3
vld1.u8 {q3}, [sp]!
vld1.u8 {q10}, [sp]!
bl vp8_mbloop_filter_neon
sub r12, r12, r1, lsl #3
;transpose to 16x8 matrix
vtrn.32 q3, q7
@@ -326,23 +192,118 @@
;store op2, op1, op0, oq0, oq1, oq2
vst1.8 {d6}, [r0], r1
vst1.8 {d7}, [r3], r1
vst1.8 {d7}, [r12], r1
vst1.8 {d8}, [r0], r1
vst1.8 {d9}, [r3], r1
vst1.8 {d9}, [r12], r1
vst1.8 {d10}, [r0], r1
vst1.8 {d11}, [r3], r1
vst1.8 {d11}, [r12], r1
vst1.8 {d12}, [r0], r1
vst1.8 {d13}, [r3], r1
vst1.8 {d13}, [r12], r1
vst1.8 {d14}, [r0], r1
vst1.8 {d15}, [r3], r1
vst1.8 {d15}, [r12], r1
vst1.8 {d16}, [r0], r1
vst1.8 {d17}, [r3], r1
vst1.8 {d17}, [r12], r1
vst1.8 {d18}, [r0], r1
vst1.8 {d19}, [r3], r1
vst1.8 {d20}, [r0], r1
vst1.8 {d21}, [r3], r1
vst1.8 {d19}, [r12], r1
vst1.8 {d20}, [r0]
vst1.8 {d21}, [r12]
ldmia sp!, {pc}
pop {pc}
ENDP ; |vp8_mbloop_filter_vertical_edge_y_neon|
; void vp8_mbloop_filter_vertical_edge_uv_neon(unsigned char *u, int pitch,
; const unsigned char *blimit,
; const unsigned char *limit,
; const unsigned char *thresh,
; unsigned char *v)
; r0 unsigned char *u,
; r1 int pitch,
; r2 const signed char *flimit,
; r3 const signed char *limit,
; sp const signed char *thresh,
; sp+4 unsigned char *v
|vp8_mbloop_filter_vertical_edge_uv_neon| PROC
push {lr}
ldr r12, [sp, #4] ; load thresh
sub r0, r0, #4 ; move u pointer down by 4 columns
vdup.u8 q2, r12 ; thresh
ldr r12, [sp, #8] ; load v ptr
sub r12, r12, #4 ; move v pointer down by 4 columns
vld1.u8 {d6}, [r0], r1 ;load u data
vld1.u8 {d7}, [r12], r1 ;load v data
vld1.u8 {d8}, [r0], r1
vld1.u8 {d9}, [r12], r1
vld1.u8 {d10}, [r0], r1
vld1.u8 {d11}, [r12], r1
vld1.u8 {d12}, [r0], r1
vld1.u8 {d13}, [r12], r1
vld1.u8 {d14}, [r0], r1
vld1.u8 {d15}, [r12], r1
vld1.u8 {d16}, [r0], r1
vld1.u8 {d17}, [r12], r1
vld1.u8 {d18}, [r0], r1
vld1.u8 {d19}, [r12], r1
vld1.u8 {d20}, [r0], r1
vld1.u8 {d21}, [r12], r1
;transpose to 8x16 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
sub r0, r0, r1, lsl #3
bl vp8_mbloop_filter_neon
sub r12, r12, r1, lsl #3
;transpose to 16x8 matrix
vtrn.32 q3, q7
vtrn.32 q4, q8
vtrn.32 q5, q9
vtrn.32 q6, q10
vtrn.16 q3, q5
vtrn.16 q4, q6
vtrn.16 q7, q9
vtrn.16 q8, q10
vtrn.8 q3, q4
vtrn.8 q5, q6
vtrn.8 q7, q8
vtrn.8 q9, q10
;store op2, op1, op0, oq0, oq1, oq2
vst1.8 {d6}, [r0], r1
vst1.8 {d7}, [r12], r1
vst1.8 {d8}, [r0], r1
vst1.8 {d9}, [r12], r1
vst1.8 {d10}, [r0], r1
vst1.8 {d11}, [r12], r1
vst1.8 {d12}, [r0], r1
vst1.8 {d13}, [r12], r1
vst1.8 {d14}, [r0], r1
vst1.8 {d15}, [r12], r1
vst1.8 {d16}, [r0], r1
vst1.8 {d17}, [r12], r1
vst1.8 {d18}, [r0], r1
vst1.8 {d19}, [r12], r1
vst1.8 {d20}, [r0]
vst1.8 {d21}, [r12]
pop {pc}
ENDP ; |vp8_mbloop_filter_vertical_edge_uv_neon|
; void vp8_mbloop_filter_neon()
@@ -350,41 +311,33 @@
; functions do the necessary load, transpose (if necessary), preserve (if
; necessary) and store.
; TODO:
; The vertical filter writes p3/q3 back out because two 4 element writes are
; much simpler than ordering and writing two 3 element sets (or three 2 elements
; sets, or whichever other combinations are possible).
; If we can preserve q3 and q10, the vertical filter will be able to avoid
; storing those values on the stack and reading them back after the filter.
; r0,r1 PRESERVE
; r2 flimit
; r3 PRESERVE
; q1 limit
; r2 mblimit
; r3 limit
; q2 thresh
; q3 p3
; q3 p3 PRESERVE
; q4 p2
; q5 p1
; q6 p0
; q7 q0
; q8 q1
; q9 q2
; q10 q3
; q10 q3 PRESERVE
|vp8_mbloop_filter_neon| PROC
ldr r12, _mblf_coeff_
; vp8_filter_mask
vabd.u8 q11, q3, q4 ; abs(p3 - p2)
vabd.u8 q12, q4, q5 ; abs(p2 - p1)
vabd.u8 q13, q5, q6 ; abs(p1 - p0)
vabd.u8 q14, q8, q7 ; abs(q1 - q0)
vabd.u8 q3, q9, q8 ; abs(q2 - q1)
vabd.u8 q1, q9, q8 ; abs(q2 - q1)
vabd.u8 q0, q10, q9 ; abs(q3 - q2)
vmax.u8 q11, q11, q12
vmax.u8 q12, q13, q14
vmax.u8 q3, q3, q0
vmax.u8 q1, q1, q0
vmax.u8 q15, q11, q12
vabd.u8 q12, q6, q7 ; abs(p0 - q0)
@@ -392,51 +345,53 @@
; vp8_hevmask
vcgt.u8 q13, q13, q2 ; (abs(p1 - p0) > thresh) * -1
vcgt.u8 q14, q14, q2 ; (abs(q1 - q0) > thresh) * -1
vmax.u8 q15, q15, q3
vmax.u8 q15, q15, q1
vld1.s8 {d4[], d5[]}, [r2] ; flimit
vdup.u8 q1, r3 ; limit
vdup.u8 q2, r2 ; mblimit
vld1.u8 {q0}, [r12]!
vmov.u8 q0, #0x80 ; 0x80
vadd.u8 q2, q2, q2 ; flimit * 2
vadd.u8 q2, q2, q1 ; flimit * 2 + limit
vcge.u8 q15, q1, q15
vabd.u8 q1, q5, q8 ; a = abs(p1 - q1)
vqadd.u8 q12, q12, q12 ; b = abs(p0 - q0) * 2
vshr.u8 q1, q1, #1 ; a = a / 2
vqadd.u8 q12, q12, q1 ; a = b + a
vcge.u8 q12, q2, q12 ; (a > flimit * 2 + limit) * -1
vmov.u16 q11, #3 ; #3
; vp8_filter
; convert to signed
veor q7, q7, q0 ; qs0
vshr.u8 q1, q1, #1 ; a = a / 2
veor q6, q6, q0 ; ps0
veor q5, q5, q0 ; ps1
vqadd.u8 q12, q12, q1 ; a = b + a
veor q8, q8, q0 ; qs1
veor q4, q4, q0 ; ps2
veor q9, q9, q0 ; qs2
vorr q14, q13, q14 ; vp8_hevmask
vcge.u8 q12, q2, q12 ; (a > flimit * 2 + limit) * -1
vsubl.s8 q2, d14, d12 ; qs0 - ps0
vsubl.s8 q13, d15, d13
vqsub.s8 q1, q5, q8 ; vp8_filter = clamp(ps1-qs1)
vadd.s16 q10, q2, q2 ; 3 * (qs0 - ps0)
vadd.s16 q11, q13, q13
vmul.i16 q2, q2, q11 ; 3 * ( qs0 - ps0)
vand q15, q15, q12 ; vp8_filter_mask
vadd.s16 q2, q2, q10
vadd.s16 q13, q13, q11
vmul.i16 q13, q13, q11
vld1.u8 {q12}, [r12]! ; #3
vmov.u8 q12, #3 ; #3
vaddw.s8 q2, q2, d2 ; vp8_filter + 3 * ( qs0 - ps0)
vaddw.s8 q13, q13, d3
vld1.u8 {q11}, [r12]! ; #4
vmov.u8 q11, #4 ; #4
; vp8_filter = clamp(vp8_filter + 3 * ( qs0 - ps0))
vqmovn.s16 d2, q2
@@ -444,27 +399,23 @@
vand q1, q1, q15 ; vp8_filter &= mask
vld1.u8 {q15}, [r12]! ; #63
;
vand q13, q1, q14 ; Filter2 &= hev
vmov.u16 q15, #63 ; #63
vld1.u8 {d7}, [r12]! ; #9
vand q13, q1, q14 ; Filter2 &= hev
vqadd.s8 q2, q13, q11 ; Filter1 = clamp(Filter2+4)
vqadd.s8 q13, q13, q12 ; Filter2 = clamp(Filter2+3)
vld1.u8 {d6}, [r12]! ; #18
vmov q0, q15
vshr.s8 q2, q2, #3 ; Filter1 >>= 3
vshr.s8 q13, q13, #3 ; Filter2 >>= 3
vmov q10, q15
vmov q11, q15
vmov q12, q15
vqsub.s8 q7, q7, q2 ; qs0 = clamp(qs0 - Filter1)
vld1.u8 {d5}, [r12]! ; #27
vqadd.s8 q6, q6, q13 ; ps0 = clamp(ps0 + Filter2)
vbic q1, q1, q14 ; vp8_filter &= ~hev
@@ -472,48 +423,47 @@
; roughly 1/7th difference across boundary
; roughly 2/7th difference across boundary
; roughly 3/7th difference across boundary
vmov q11, q15
vmov.u8 d5, #9 ; #9
vmov.u8 d4, #18 ; #18
vmov q13, q15
vmov q14, q15
vmlal.s8 q10, d2, d7 ; Filter2 * 9
vmlal.s8 q11, d3, d7
vmlal.s8 q12, d2, d6 ; Filter2 * 18
vmlal.s8 q13, d3, d6
vmlal.s8 q14, d2, d5 ; Filter2 * 27
vmlal.s8 q0, d2, d5 ; 63 + Filter2 * 9
vmlal.s8 q11, d3, d5
vmov.u8 d5, #27 ; #27
vmlal.s8 q12, d2, d4 ; 63 + Filter2 * 18
vmlal.s8 q13, d3, d4
vmlal.s8 q14, d2, d5 ; 63 + Filter2 * 27
vmlal.s8 q15, d3, d5
vqshrn.s16 d20, q10, #7 ; u = clamp((63 + Filter2 * 9)>>7)
vqshrn.s16 d21, q11, #7
vqshrn.s16 d0, q0, #7 ; u = clamp((63 + Filter2 * 9)>>7)
vqshrn.s16 d1, q11, #7
vqshrn.s16 d24, q12, #7 ; u = clamp((63 + Filter2 * 18)>>7)
vqshrn.s16 d25, q13, #7
vqshrn.s16 d28, q14, #7 ; u = clamp((63 + Filter2 * 27)>>7)
vqshrn.s16 d29, q15, #7
vqsub.s8 q11, q9, q10 ; s = clamp(qs2 - u)
vqadd.s8 q10, q4, q10 ; s = clamp(ps2 + u)
vmov.u8 q1, #0x80 ; 0x80
vqsub.s8 q11, q9, q0 ; s = clamp(qs2 - u)
vqadd.s8 q0, q4, q0 ; s = clamp(ps2 + u)
vqsub.s8 q13, q8, q12 ; s = clamp(qs1 - u)
vqadd.s8 q12, q5, q12 ; s = clamp(ps1 + u)
vqsub.s8 q15, q7, q14 ; s = clamp(qs0 - u)
vqadd.s8 q14, q6, q14 ; s = clamp(ps0 + u)
veor q9, q11, q0 ; *oq2 = s^0x80
veor q4, q10, q0 ; *op2 = s^0x80
veor q8, q13, q0 ; *oq1 = s^0x80
veor q5, q12, q0 ; *op2 = s^0x80
veor q7, q15, q0 ; *oq0 = s^0x80
veor q6, q14, q0 ; *op0 = s^0x80
veor q9, q11, q1 ; *oq2 = s^0x80
veor q4, q0, q1 ; *op2 = s^0x80
veor q8, q13, q1 ; *oq1 = s^0x80
veor q5, q12, q1 ; *op2 = s^0x80
veor q7, q15, q1 ; *oq0 = s^0x80
veor q6, q14, q1 ; *op0 = s^0x80
bx lr
ENDP ; |vp8_mbloop_filter_neon|
AREA mbloopfilter_dat, DATA, READONLY
_mblf_coeff_
DCD mblf_coeff
mblf_coeff
DCD 0x80808080, 0x80808080, 0x80808080, 0x80808080
DCD 0x03030303, 0x03030303, 0x03030303, 0x03030303
DCD 0x04040404, 0x04040404, 0x04040404, 0x04040404
DCD 0x003f003f, 0x003f003f, 0x003f003f, 0x003f003f
DCD 0x09090909, 0x09090909, 0x12121212, 0x12121212
DCD 0x1b1b1b1b, 0x1b1b1b1b
;-----------------
END

View File

@@ -10,8 +10,8 @@
#include "vpx_ports/config.h"
#include "recon.h"
#include "blockd.h"
#include "vp8/common/recon.h"
#include "vp8/common/blockd.h"
extern void vp8_recon16x16mb_neon(unsigned char *pred_ptr, short *diff_ptr, unsigned char *dst_ptr, int ystride, unsigned char *udst_ptr, unsigned char *vdst_ptr);

View File

@@ -31,7 +31,7 @@
;result of the multiplication that is needed in IDCT.
|vp8_short_idct4x4llm_neon| PROC
ldr r12, _idct_coeff_
adr r12, idct_coeff
vld1.16 {q1, q2}, [r0]
vld1.16 {d0}, [r12]
@@ -113,12 +113,7 @@
ENDP
;-----------------
AREA idct4x4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_idct_coeff_
DCD idct_coeff
idct_coeff
DCD 0x4e7b4e7b, 0x8a8c8a8c

View File

@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter16_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -33,7 +44,7 @@
|vp8_sixtap_predict16x16_neon| PROC
push {r4-r5, lr}
ldr r12, _filter16_coeff_
adr r12, filter16_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -476,20 +487,4 @@ secondpass_only_inner_loop_neon
ENDP
;-----------------
AREA subpelfilters16_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter16_coeff_
DCD filter16_coeff
filter16_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END

View File

@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter4_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -25,7 +36,7 @@
|vp8_sixtap_predict_neon| PROC
push {r4, lr}
ldr r12, _filter4_coeff_
adr r12, filter4_coeff
ldr r4, [sp, #8] ;load parameters from stack
ldr lr, [sp, #12] ;load parameters from stack
@@ -407,20 +418,5 @@ secondpass_filter4x4_only
ENDP
;-----------------
AREA subpelfilters4_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter4_coeff_
DCD filter4_coeff
filter4_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END

View File

@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -25,7 +36,7 @@
|vp8_sixtap_predict8x4_neon| PROC
push {r4-r5, lr}
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -458,20 +469,5 @@ secondpass_filter8x4_only
ENDP
;-----------------
AREA subpelfilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_
DCD filter8_coeff
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END

View File

@@ -15,6 +15,17 @@
PRESERVE8
AREA ||.text||, CODE, READONLY, ALIGN=2
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
; r0 unsigned char *src_ptr,
; r1 int src_pixels_per_line,
; r2 int xoffset,
@@ -25,7 +36,7 @@
|vp8_sixtap_predict8x8_neon| PROC
push {r4-r5, lr}
ldr r12, _filter8_coeff_
adr r12, filter8_coeff
ldr r4, [sp, #12] ;load parameters from stack
ldr r5, [sp, #16] ;load parameters from stack
@@ -509,20 +520,5 @@ filt_blk2d_spo8x8_loop_neon
ENDP
;-----------------
AREA subpelfilters8_dat, DATA, READWRITE ;read/write by default
;Data section with name data_area is specified. DCD reserves space in memory for 48 data.
;One word each is reserved. Label filter_coeff can be used to access the data.
;Data address: filter_coeff, filter_coeff+4, filter_coeff+8 ...
_filter8_coeff_
DCD filter8_coeff
filter8_coeff
DCD 0, 0, 128, 0, 0, 0, 0, 0
DCD 0, -6, 123, 12, -1, 0, 0, 0
DCD 2, -11, 108, 36, -8, 1, 0, 0
DCD 0, -9, 93, 50, -6, 0, 0, 0
DCD 3, -16, 77, 77, -16, 3, 0, 0
DCD 0, -6, 50, 93, -9, 0, 0, 0
DCD 1, -8, 36, 108, -11, 2, 0, 0
DCD 0, -1, 12, 123, -6, 0, 0, 0
END

View File

@@ -53,6 +53,9 @@ extern prototype_copy_block(vp8_copy_mem16x16_neon);
extern prototype_recon_macroblock(vp8_recon_mb_neon);
extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_neon);
extern prototype_build_intra_predictors(vp8_build_intra_predictors_mby_s_neon);
#if !CONFIG_RUNTIME_CPU_DETECT
#undef vp8_recon_recon
#define vp8_recon_recon vp8_recon_b_neon
@@ -74,6 +77,13 @@ extern prototype_recon_macroblock(vp8_recon_mb_neon);
#undef vp8_recon_recon_mb
#define vp8_recon_recon_mb vp8_recon_mb_neon
#undef vp8_recon_build_intra_predictors_mby
#define vp8_recon_build_intra_predictors_mby vp8_build_intra_predictors_mby_neon
#undef vp8_recon_build_intra_predictors_mby_s
#define vp8_recon_build_intra_predictors_mby_s vp8_build_intra_predictors_mby_s_neon
#endif
#endif

View File

@@ -10,10 +10,10 @@
#include "vpx_ports/config.h"
#include "blockd.h"
#include "reconintra.h"
#include "vp8/common/blockd.h"
#include "vp8/common/reconintra.h"
#include "vpx_mem/vpx_mem.h"
#include "recon.h"
#include "vp8/common/recon.h"
#if HAVE_ARMV7
extern void vp8_build_intra_predictors_mby_neon_func(

View File

@@ -1,87 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_ports/config.h"
#include <stddef.h>
#if CONFIG_VP8_ENCODER
#include "vpx_scale/yv12config.h"
#endif
#if CONFIG_VP8_DECODER
#include "onyxd_int.h"
#endif
#define DEFINE(sym, val) int sym = val;
/*
#define BLANK() asm volatile("\n->" : : )
*/
/*
* int main(void)
* {
*/
#if CONFIG_VP8_DECODER || CONFIG_VP8_ENCODER
DEFINE(yv12_buffer_config_y_width, offsetof(YV12_BUFFER_CONFIG, y_width));
DEFINE(yv12_buffer_config_y_height, offsetof(YV12_BUFFER_CONFIG, y_height));
DEFINE(yv12_buffer_config_y_stride, offsetof(YV12_BUFFER_CONFIG, y_stride));
DEFINE(yv12_buffer_config_uv_width, offsetof(YV12_BUFFER_CONFIG, uv_width));
DEFINE(yv12_buffer_config_uv_height, offsetof(YV12_BUFFER_CONFIG, uv_height));
DEFINE(yv12_buffer_config_uv_stride, offsetof(YV12_BUFFER_CONFIG, uv_stride));
DEFINE(yv12_buffer_config_y_buffer, offsetof(YV12_BUFFER_CONFIG, y_buffer));
DEFINE(yv12_buffer_config_u_buffer, offsetof(YV12_BUFFER_CONFIG, u_buffer));
DEFINE(yv12_buffer_config_v_buffer, offsetof(YV12_BUFFER_CONFIG, v_buffer));
DEFINE(yv12_buffer_config_border, offsetof(YV12_BUFFER_CONFIG, border));
#endif
#if CONFIG_VP8_DECODER
DEFINE(mb_diff, offsetof(MACROBLOCKD, diff));
DEFINE(mb_predictor, offsetof(MACROBLOCKD, predictor));
DEFINE(mb_dst_y_stride, offsetof(MACROBLOCKD, dst.y_stride));
DEFINE(mb_dst_y_buffer, offsetof(MACROBLOCKD, dst.y_buffer));
DEFINE(mb_dst_u_buffer, offsetof(MACROBLOCKD, dst.u_buffer));
DEFINE(mb_dst_v_buffer, offsetof(MACROBLOCKD, dst.v_buffer));
DEFINE(mb_up_available, offsetof(MACROBLOCKD, up_available));
DEFINE(mb_left_available, offsetof(MACROBLOCKD, left_available));
DEFINE(detok_scan, offsetof(DETOK, scan));
DEFINE(detok_ptr_block2leftabove, offsetof(DETOK, ptr_block2leftabove));
DEFINE(detok_coef_tree_ptr, offsetof(DETOK, vp8_coef_tree_ptr));
DEFINE(detok_teb_base_ptr, offsetof(DETOK, teb_base_ptr));
DEFINE(detok_norm_ptr, offsetof(DETOK, norm_ptr));
DEFINE(detok_ptr_coef_bands_x, offsetof(DETOK, ptr_coef_bands_x));
DEFINE(detok_A, offsetof(DETOK, A));
DEFINE(detok_L, offsetof(DETOK, L));
DEFINE(detok_qcoeff_start_ptr, offsetof(DETOK, qcoeff_start_ptr));
DEFINE(detok_current_bc, offsetof(DETOK, current_bc));
DEFINE(detok_coef_probs, offsetof(DETOK, coef_probs));
DEFINE(detok_eob, offsetof(DETOK, eob));
DEFINE(bool_decoder_user_buffer_end, offsetof(BOOL_DECODER, user_buffer_end));
DEFINE(bool_decoder_user_buffer, offsetof(BOOL_DECODER, user_buffer));
DEFINE(bool_decoder_value, offsetof(BOOL_DECODER, value));
DEFINE(bool_decoder_count, offsetof(BOOL_DECODER, count));
DEFINE(bool_decoder_range, offsetof(BOOL_DECODER, range));
DEFINE(tokenextrabits_min_val, offsetof(TOKENEXTRABITS, min_val));
DEFINE(tokenextrabits_length, offsetof(TOKENEXTRABITS, Length));
#endif
//add asserts for any offset that is not supported by assembly code
//add asserts for any size that is not supported by assembly code
/*
* return 0;
* }
*/

View File

@@ -0,0 +1,40 @@
/*
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "vpx_config.h"
#include "vpx/vpx_codec.h"
#include "vpx_ports/asm_offsets.h"
#include "vpx_scale/yv12config.h"
BEGIN
/* vpx_scale */
DEFINE(yv12_buffer_config_y_width, offsetof(YV12_BUFFER_CONFIG, y_width));
DEFINE(yv12_buffer_config_y_height, offsetof(YV12_BUFFER_CONFIG, y_height));
DEFINE(yv12_buffer_config_y_stride, offsetof(YV12_BUFFER_CONFIG, y_stride));
DEFINE(yv12_buffer_config_uv_width, offsetof(YV12_BUFFER_CONFIG, uv_width));
DEFINE(yv12_buffer_config_uv_height, offsetof(YV12_BUFFER_CONFIG, uv_height));
DEFINE(yv12_buffer_config_uv_stride, offsetof(YV12_BUFFER_CONFIG, uv_stride));
DEFINE(yv12_buffer_config_y_buffer, offsetof(YV12_BUFFER_CONFIG, y_buffer));
DEFINE(yv12_buffer_config_u_buffer, offsetof(YV12_BUFFER_CONFIG, u_buffer));
DEFINE(yv12_buffer_config_v_buffer, offsetof(YV12_BUFFER_CONFIG, v_buffer));
DEFINE(yv12_buffer_config_border, offsetof(YV12_BUFFER_CONFIG, border));
DEFINE(VP8BORDERINPIXELS_VAL, VP8BORDERINPIXELS);
END
/* add asserts for any offset that is not supported by assembly code */
/* add asserts for any size that is not supported by assembly code */
#if HAVE_ARMV7
/* vp8_yv12_extend_frame_borders_neon makes several assumptions based on this */
ct_assert(VP8BORDERINPIXELS_VAL, VP8BORDERINPIXELS == 32)
#endif

View File

@@ -12,8 +12,6 @@
#include "blockd.h"
#include "vpx_mem/vpx_mem.h"
const int vp8_block2type[25] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 1};
const unsigned char vp8_block2left[25] =
{
0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8

View File

@@ -28,11 +28,6 @@ void vpx_log(const char *format, ...);
#define DCPREDSIMTHRESH 0
#define DCPREDCNTTHRESH 3
#define Y1CONTEXT 0
#define UCONTEXT 1
#define VCONTEXT 2
#define Y2CONTEXT 3
#define MB_FEATURE_TREE_PROBS 3
#define MAX_MB_SEGMENTS 4
@@ -48,6 +43,11 @@ typedef struct
int r, c;
} POS;
#define PLANE_TYPE_Y_NO_DC 0
#define PLANE_TYPE_Y2 1
#define PLANE_TYPE_UV 2
#define PLANE_TYPE_Y_WITH_DC 3
typedef char ENTROPY_CONTEXT;
typedef struct
@@ -58,8 +58,6 @@ typedef struct
ENTROPY_CONTEXT y2;
} ENTROPY_CONTEXT_PLANES;
extern const int vp8_block2type[25];
extern const unsigned char vp8_block2left[25];
extern const unsigned char vp8_block2above[25];
@@ -139,16 +137,11 @@ typedef enum
modes for the Y blocks to the left and above us; for interframes, there
is a single probability table. */
typedef struct
union b_mode_info
{
B_PREDICTION_MODE mode;
union
{
int as_int;
MV as_mv;
} mv;
} B_MODE_INFO;
B_PREDICTION_MODE as_mode;
int_mv mv;
};
typedef enum
{
@@ -163,38 +156,26 @@ typedef struct
{
MB_PREDICTION_MODE mode, uv_mode;
MV_REFERENCE_FRAME ref_frame;
union
{
int as_int;
MV as_mv;
} mv;
int_mv mv;
unsigned char partitioning;
unsigned char mb_skip_coeff; /* does this mb has coefficients at all, 1=no coefficients, 0=need decode tokens */
unsigned char dc_diff;
unsigned char need_to_clamp_mvs;
unsigned char segment_id; /* Which set of segmentation parameters should be used for this MB */
unsigned char force_no_skip; /* encoder only */
} MB_MODE_INFO;
typedef struct
{
MB_MODE_INFO mbmi;
B_MODE_INFO bmi[16];
union b_mode_info bmi[16];
} MODE_INFO;
typedef struct
{
short *qcoeff;
short *dqcoeff;
unsigned char *predictor;
short *diff;
short *reference;
short *dequant;
/* 16 Y blocks, 4 U blocks, 4 V blocks each with 16 entries */
@@ -208,15 +189,13 @@ typedef struct
int eob;
B_MODE_INFO bmi;
union b_mode_info bmi;
} BLOCKD;
typedef struct
typedef struct MacroBlockD
{
DECLARE_ALIGNED(16, short, diff[400]); /* from idct diff */
DECLARE_ALIGNED(16, unsigned char, predictor[384]);
/* not used DECLARE_ALIGNED(16, short, reference[384]); */
DECLARE_ALIGNED(16, short, qcoeff[400]);
DECLARE_ALIGNED(16, short, dqcoeff[400]);
DECLARE_ALIGNED(16, char, eobs[25]);
@@ -273,6 +252,9 @@ typedef struct
int mb_to_top_edge;
int mb_to_bottom_edge;
int ref_frame_cost[MAX_REF_FRAMES];
unsigned int frames_since_golden;
unsigned int frames_till_alt_ref_frame;
vp8_subpix_fn_t subpixel_predict;
@@ -282,6 +264,16 @@ typedef struct
void *current_bc;
int corrupted;
#if ARCH_X86 || ARCH_X86_64
/* This is an intermediate buffer currently used in sub-pixel motion search
* to keep a copy of the reference area. This buffer can be used for other
* purpose.
*/
DECLARE_ALIGNED(32, unsigned char, y_buf[22*32]);
#endif
#if CONFIG_RUNTIME_CPU_DETECT
struct VP8_COMMON_RTCD *rtcd;
#endif
@@ -291,4 +283,20 @@ typedef struct
extern void vp8_build_block_doffsets(MACROBLOCKD *x);
extern void vp8_setup_block_dptrs(MACROBLOCKD *x);
static void update_blockd_bmi(MACROBLOCKD *xd)
{
int i;
int is_4x4;
is_4x4 = (xd->mode_info_context->mbmi.mode == SPLITMV) ||
(xd->mode_info_context->mbmi.mode == B_PRED);
if (is_4x4)
{
for (i = 0; i < 16; i++)
{
xd->block[i].bmi = xd->mode_info_context->bmi[i];
}
}
}
#endif /* __INC_BLOCKD_H */

View File

@@ -1,570 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef bool_coder_h
#define bool_coder_h 1
/* Arithmetic bool coder with largish probability range.
Timothy S Murphy 6 August 2004 */
/* So as not to force users to drag in too much of my idiosyncratic C++ world,
I avoid fancy storage management. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
typedef unsigned char vp8bc_index_t; // probability index
/* There are a couple of slight variants in the details of finite-precision
arithmetic coding. May be safely ignored by most users. */
enum vp8bc_rounding
{
vp8bc_down = 0, // just like VP8
vp8bc_down_full = 1, // handles minimum probability correctly
vp8bc_up = 2
};
#if _MSC_VER
/* Note that msvc by default does not inline _anything_ (regardless of the
setting of inline_depth) and that a command-line option (-Ob1 or -Ob2)
is required to inline even the smallest functions. */
# pragma inline_depth( 255) // I mean it when I inline something
# pragma warning( disable : 4099) // No class vs. struct harassment
# pragma warning( disable : 4250) // dominance complaints
# pragma warning( disable : 4284) // operator-> in templates
# pragma warning( disable : 4800) // bool conversion
// don't let prefix ++,-- stand in for postfix, disaster would ensue
# pragma warning( error : 4620 4621)
#endif // _MSC_VER
#if __cplusplus
// Sometimes one wishes to be definite about integer lengths.
struct int_types
{
typedef const bool cbool;
typedef const signed char cchar;
typedef const short cshort;
typedef const int cint;
typedef const int clong;
typedef const double cdouble;
typedef const size_t csize_t;
typedef unsigned char uchar; // 8 bits
typedef const uchar cuchar;
typedef short int16;
typedef unsigned short uint16;
typedef const int16 cint16;
typedef const uint16 cuint16;
typedef int int32;
typedef unsigned int uint32;
typedef const int32 cint32;
typedef const uint32 cuint32;
typedef unsigned int uint;
typedef unsigned int ulong;
typedef const uint cuint;
typedef const ulong culong;
// All structs consume space, may as well have a vptr.
virtual ~int_types();
};
struct bool_coder_spec;
struct bool_coder;
struct bool_writer;
struct bool_reader;
struct bool_coder_namespace : int_types
{
typedef vp8bc_index_t Index;
typedef bool_coder_spec Spec;
typedef const Spec c_spec;
enum Rounding
{
Down = vp8bc_down,
down_full = vp8bc_down_full,
Up = vp8bc_up
};
};
// Archivable specification of a bool coder includes rounding spec
// and probability mapping table. The latter replaces a uchar j
// (0 <= j < 256) with an arbitrary uint16 tbl[j] = p.
// p/65536 is then the probability of a zero.
struct bool_coder_spec : bool_coder_namespace
{
friend struct bool_coder;
friend struct bool_writer;
friend struct bool_reader;
friend struct bool_coder_spec_float;
friend struct bool_coder_spec_explicit_table;
friend struct bool_coder_spec_exponential_table;
friend struct BPsrc;
private:
uint w; // precision
Rounding r;
uint ebits, mbits, ebias;
uint32 mmask;
Index max_index, half_index;
uint32 mantissa(Index i) const
{
assert(i < half_index);
return (1 << mbits) + (i & mmask);
}
uint exponent(Index i) const
{
assert(i < half_index);
return ebias - (i >> mbits);
}
uint16 Ptbl[256]; // kinda clunky, but so is storage management.
/* Cost in bits of encoding a zero at every probability, scaled by 2^20.
Assumes that index is at most 8 bits wide. */
uint32 Ctbl[256];
uint32 split(Index i, uint32 R) const // 1 <= split <= max( 1, R-1)
{
if (!ebias)
return 1 + (((R - 1) * Ptbl[i]) >> 16);
if (i >= half_index)
return R - split(max_index - i, R);
return 1 + (((R - 1) * mantissa(i)) >> exponent(i));
}
uint32 max_range() const
{
return (1 << w) - (r == down_full ? 0 : 1);
}
uint32 min_range() const
{
return (1 << (w - 1)) + (r == down_full ? 1 : 0);
}
uint32 Rinc() const
{
return r == Up ? 1 : 0;
}
void check_prec() const;
bool float_init(uint Ebits, uint Mbits);
void cost_init();
bool_coder_spec(
uint prec, Rounding rr, uint Ebits = 0, uint Mbits = 0
)
: w(prec), r(rr)
{
float_init(Ebits, Mbits);
}
public:
// Read complete spec from file.
bool_coder_spec(FILE *);
// Write spec to file.
void dump(FILE *) const;
// return probability index best approximating prob.
Index operator()(double prob) const;
// probability corresponding to index
double operator()(Index i) const;
Index complement(Index i) const
{
return max_index - i;
}
Index max_index() const
{
return max_index;
}
Index half_index() const
{
return half_index;
}
uint32 cost_zero(Index i) const
{
return Ctbl[i];
}
uint32 cost_one(Index i) const
{
return Ctbl[ max_index - i];
}
uint32 cost_bit(Index i, bool b) const
{
return Ctbl[b? max_index-i:i];
}
};
/* Pseudo floating-point probability specification.
At least one of Ebits and Mbits must be nonzero.
Since all arithmetic is done at 32 bits, Ebits is at most 5.
Total significant bits in index is Ebits + Mbits + 1.
Below the halfway point (i.e. when the top significant bit is 0),
the index is (e << Mbits) + m.
The exponent e is between 0 and (2**Ebits) - 1,
the mantissa m is between 0 and (2**Mbits) - 1.
Prepending an implicit 1 to the mantissa, the probability is then
(2**Mbits + m) >> (e - 2**Ebits - 1 - Mbits),
which has (1/2)**(2**Ebits + 1) as a minimum
and (1/2) * [1 - 2**(Mbits + 1)] as a maximum.
When the index is above the halfway point, the probability is the
complement of the probability associated to the complement of the index.
Note that the probability increases with the index and that, because of
the symmetry, we cannot encode probability exactly 1/2; though we
can get as close to 1/2 as we like, provided we have enough Mbits.
The latter is of course not a problem in practice, one never has
exact probabilities and entropy errors are second order, that is, the
"overcoding" of a zero will be largely compensated for by the
"undercoding" of a one (or vice-versa).
Compared to arithmetic probability specs (a la VP8), this will do better
at very high and low probabilities and worse at probabilities near 1/2,
as well as facilitating the usage of wider or narrower probability indices.
*/
struct bool_coder_spec_float : bool_coder_spec
{
bool_coder_spec_float(
uint Ebits = 3, uint Mbits = 4, Rounding rr = down_full, uint prec = 12
)
: bool_coder_spec(prec, rr, Ebits, Mbits)
{
cost_init();
}
};
struct bool_coder_spec_explicit_table : bool_coder_spec
{
bool_coder_spec_explicit_table(
cuint16 probability_table[256] = 0, // default is tbl[i] = i << 8.
Rounding = down_full,
uint precision = 16
);
};
// Contruct table via multiplicative interpolation between
// p[128] = 1/2 and p[0] = (1/2)^x.
// Since we are working with 16-bit precision, x is at most 16.
// For probabilities to increase with i, we must have x > 1.
// For 0 <= i <= 128, p[i] = (1/2)^{ 1 + [1 - (i/128)]*[x-1] }.
// Finally, p[128+i] = 1 - p[128 - i].
struct bool_coder_spec_exponential_table : bool_coder_spec
{
bool_coder_spec_exponential_table(uint x, Rounding = down_full, uint prec = 16);
};
// Commonalities between writer and reader.
struct bool_coder : bool_coder_namespace
{
friend struct bool_writer;
friend struct bool_reader;
friend struct BPsrc;
private:
uint32 Low, Range;
cuint32 min_range;
cuint32 rinc;
c_spec spec;
void _reset()
{
Low = 0;
Range = spec.max_range();
}
bool_coder(c_spec &s)
: min_range(s.min_range()),
rinc(s.Rinc()),
spec(s)
{
_reset();
}
uint32 half() const
{
return 1 + ((Range - 1) >> 1);
}
public:
c_spec &Spec() const
{
return spec;
}
};
struct bool_writer : bool_coder
{
friend struct BPsrc;
private:
uchar *Bstart, *Bend, *B;
int bit_lag;
bool is_toast;
void carry();
void reset()
{
_reset();
bit_lag = 32 - spec.w;
is_toast = 0;
}
void raw(bool value, uint32 split);
public:
bool_writer(c_spec &, uchar *Dest, size_t Len);
virtual ~bool_writer();
void operator()(Index p, bool v)
{
raw(v, spec.split(p, Range));
}
uchar *buf() const
{
return Bstart;
}
size_t bytes_written() const
{
return B - Bstart;
}
// Call when done with input, flushes internal state.
// DO NOT write any more data after calling this.
bool_writer &flush();
void write_bits(int n, uint val)
{
if (n)
{
uint m = 1 << (n - 1);
do
{
raw((bool)(val & m), half());
}
while (m >>= 1);
}
}
# if 0
// We are agnostic about storage management.
// By default, overflows throw an assert but user can
// override to provide an expanding buffer using ...
virtual void overflow(uint Len) const;
// ... this function copies already-written data into new buffer
// and retains new buffer location.
void new_buffer(uchar *dest, uint Len);
// Note that storage management is the user's responsibility.
# endif
};
// This could be adjusted to use a little less lookahead.
struct bool_reader : bool_coder
{
friend struct BPsrc;
private:
cuchar *const Bstart; // for debugging
cuchar *B;
cuchar *const Bend;
cuint shf;
uint bct;
bool raw(uint32 split);
public:
bool_reader(c_spec &s, cuchar *src, size_t Len);
bool operator()(Index p)
{
return raw(spec.split(p, Range));
}
uint read_bits(int num_bits)
{
uint v = 0;
while (--num_bits >= 0)
v += v + (raw(half()) ? 1 : 0);
return v;
}
};
extern "C" {
#endif /* __cplusplus */
/* C interface */
typedef struct bool_coder_spec bool_coder_spec;
typedef struct bool_writer bool_writer;
typedef struct bool_reader bool_reader;
typedef const bool_coder_spec c_bool_coder_spec;
typedef const bool_writer c_bool_writer;
typedef const bool_reader c_bool_reader;
/* Optionally override default precision when constructing coder_specs.
Just pass a zero pointer if you don't care.
Precision is at most 16 bits for table specs, at most 23 otherwise. */
struct vp8bc_prec
{
enum vp8bc_rounding r; /* see top header file for def */
unsigned int prec; /* range precision in bits */
};
typedef const struct vp8bc_prec vp8bc_c_prec;
/* bool_coder_spec contains mapping of uchars to actual probabilities
(16 bit uints) as well as (usually immaterial) selection of
exact finite-precision algorithm used (for now, the latter can only
be overridden using the C++ interface).
See comments above the corresponding C++ constructors for discussion,
especially of exponential probability table generation. */
bool_coder_spec *vp8bc_vp8spec(); // just like vp8
bool_coder_spec *vp8bc_literal_spec(
const unsigned short prob_map[256], // 0 is like vp8 w/more precision
vp8bc_c_prec*
);
bool_coder_spec *vp8bc_float_spec(
unsigned int exponent_bits, unsigned int mantissa_bits, vp8bc_c_prec*
);
bool_coder_spec *vp8bc_exponential_spec(unsigned int min_exp, vp8bc_c_prec *);
bool_coder_spec *vp8bc_spec_from_file(FILE *);
void vp8bc_destroy_spec(c_bool_coder_spec *);
void vp8bc_spec_to_file(c_bool_coder_spec *, FILE *);
/* Nearest index to supplied probability of zero, 0 <= prob <= 1. */
vp8bc_index_t vp8bc_index(c_bool_coder_spec *, double prob);
vp8bc_index_t vp8bc_index_from_counts(
c_bool_coder_spec *p, unsigned int zero_ct, unsigned int one_ct
);
/* In case you want to look */
double vp8bc_probability(c_bool_coder_spec *, vp8bc_index_t);
/* Opposite index */
vp8bc_index_t vp8bc_complement(c_bool_coder_spec *, vp8bc_index_t);
/* Cost in bits of encoding a zero at given probability, scaled by 2^20.
(assumes that an int holds at least 32 bits). */
unsigned int vp8bc_cost_zero(c_bool_coder_spec *, vp8bc_index_t);
unsigned int vp8bc_cost_one(c_bool_coder_spec *, vp8bc_index_t);
unsigned int vp8bc_cost_bit(c_bool_coder_spec *, vp8bc_index_t, int);
/* bool_writer interface */
/* Length = 0 disables checking for writes beyond buffer end. */
bool_writer *vp8bc_create_writer(
c_bool_coder_spec *, unsigned char *Destination, size_t Length
);
/* Flushes out any buffered data and returns total # of bytes written. */
size_t vp8bc_destroy_writer(bool_writer *);
void vp8bc_write_bool(bool_writer *, int boolean_val, vp8bc_index_t false_prob);
void vp8bc_write_bits(
bool_writer *, unsigned int integer_value, int number_of_bits
);
c_bool_coder_spec *vp8bc_writer_spec(c_bool_writer *);
/* bool_reader interface */
/* Length = 0 disables checking for reads beyond buffer end. */
bool_reader *vp8bc_create_reader(
c_bool_coder_spec *, const unsigned char *Source, size_t Length
);
void vp8bc_destroy_reader(bool_reader *);
int vp8bc_read_bool(bool_reader *, vp8bc_index_t false_prob);
unsigned int vp8bc_read_bits(bool_reader *, int number_of_bits);
c_bool_coder_spec *vp8bc_reader_spec(c_bool_reader *);
#if __cplusplus
}
#endif
#endif /* bool_coder_h */

View File

@@ -1,93 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef CODEC_COMMON_INTERFACE_H
#define CODEC_COMMON_INTERFACE_H
#define __export
#define _export
#define dll_export __declspec( dllexport )
#define dll_import __declspec( dllimport )
// Playback ERROR Codes.
#define NO_DECODER_ERROR 0
#define REMOTE_DECODER_ERROR -1
#define DFR_BAD_DCT_COEFF -100
#define DFR_ZERO_LENGTH_FRAME -101
#define DFR_FRAME_SIZE_INVALID -102
#define DFR_OUTPUT_BUFFER_OVERFLOW -103
#define DFR_INVALID_FRAME_HEADER -104
#define FR_INVALID_MODE_TOKEN -110
#define ETR_ALLOCATION_ERROR -200
#define ETR_INVALID_ROOT_PTR -201
#define SYNCH_ERROR -400
#define BUFFER_UNDERFLOW_ERROR -500
#define PB_IB_OVERFLOW_ERROR -501
// External error triggers
#define PB_HEADER_CHECKSUM_ERROR -601
#define PB_DATA_CHECKSUM_ERROR -602
// DCT Error Codes
#define DDCT_EXPANSION_ERROR -700
#define DDCT_INVALID_TOKEN_ERROR -701
// exception_errors
#define GEN_EXCEPTIONS -800
#define EX_UNQUAL_ERROR -801
// Unrecoverable error codes
#define FATAL_PLAYBACK_ERROR -1000
#define GEN_ERROR_CREATING_CDC -1001
#define GEN_THREAD_CREATION_ERROR -1002
#define DFR_CREATE_BMP_FAILED -1003
// YUV buffer configuration structure
typedef struct
{
int y_width;
int y_height;
int y_stride;
int uv_width;
int uv_height;
int uv_stride;
unsigned char *y_buffer;
unsigned char *u_buffer;
unsigned char *v_buffer;
} YUV_BUFFER_CONFIG;
typedef enum
{
C_SET_KEY_FRAME,
C_SET_FIXED_Q,
C_SET_FIRSTPASS_FILE,
C_SET_EXPERIMENTAL_MIN,
C_SET_EXPERIMENTAL_MAX = C_SET_EXPERIMENTAL_MIN + 255,
C_SET_CHECKPROTECT,
C_SET_TESTMODE,
C_SET_INTERNAL_SIZE,
C_SET_RECOVERY_FRAME,
C_SET_REFERENCEFRAME,
C_SET_GOLDENFRAME
#ifndef VP50_COMP_INTERFACE
// Specialist test facilities.
// C_VCAP_PARAMS, // DO NOT USE FOR NOW WITH VFW CODEC
#endif
} C_SETTING;
typedef unsigned long C_SET_VALUE;
#endif

View File

@@ -12,7 +12,7 @@
/* Update probabilities for the nodes in the token entropy tree.
Generated file included by entropy.c */
const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens-1] =
const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [ENTROPY_NODES] =
{
{
{

View File

@@ -97,7 +97,7 @@ void vp8_print_modes_and_motion_vectors(MODE_INFO *mi, int rows, int cols, int f
bindex = (b_row & 3) * 4 + (b_col & 3);
if (mi[mb_index].mbmi.mode == B_PRED)
fprintf(mvs, "%2d ", mi[mb_index].bmi[bindex].mode);
fprintf(mvs, "%2d ", mi[mb_index].bmi[bindex].as_mode);
else
fprintf(mvs, "xx ");

View File

@@ -0,0 +1,225 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#include "defaultcoefcounts.h"
/* Generated file, included by entropy.c */
const unsigned int vp8_default_coef_counts[BLOCK_TYPES]
[COEF_BANDS]
[PREV_COEF_CONTEXTS]
[MAX_ENTROPY_TOKENS] =
{
{
/* Block Type ( 0 ) */
{
/* Coeff Band ( 0 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
{
/* Coeff Band ( 1 ) */
{30190, 26544, 225, 24, 4, 0, 0, 0, 0, 0, 0, 4171593,},
{26846, 25157, 1241, 130, 26, 6, 1, 0, 0, 0, 0, 149987,},
{10484, 9538, 1006, 160, 36, 18, 0, 0, 0, 0, 0, 15104,},
},
{
/* Coeff Band ( 2 ) */
{25842, 40456, 1126, 83, 11, 2, 0, 0, 0, 0, 0, 0,},
{9338, 8010, 512, 73, 7, 3, 2, 0, 0, 0, 0, 43294,},
{1047, 751, 149, 31, 13, 6, 1, 0, 0, 0, 0, 879,},
},
{
/* Coeff Band ( 3 ) */
{26136, 9826, 252, 13, 0, 0, 0, 0, 0, 0, 0, 0,},
{8134, 5574, 191, 14, 2, 0, 0, 0, 0, 0, 0, 35302,},
{ 605, 677, 116, 9, 1, 0, 0, 0, 0, 0, 0, 611,},
},
{
/* Coeff Band ( 4 ) */
{10263, 15463, 283, 17, 0, 0, 0, 0, 0, 0, 0, 0,},
{2773, 2191, 128, 9, 2, 2, 0, 0, 0, 0, 0, 10073,},
{ 134, 125, 32, 4, 0, 2, 0, 0, 0, 0, 0, 50,},
},
{
/* Coeff Band ( 5 ) */
{10483, 2663, 23, 1, 0, 0, 0, 0, 0, 0, 0, 0,},
{2137, 1251, 27, 1, 1, 0, 0, 0, 0, 0, 0, 14362,},
{ 116, 156, 14, 2, 1, 0, 0, 0, 0, 0, 0, 190,},
},
{
/* Coeff Band ( 6 ) */
{40977, 27614, 412, 28, 0, 0, 0, 0, 0, 0, 0, 0,},
{6113, 5213, 261, 22, 3, 0, 0, 0, 0, 0, 0, 26164,},
{ 382, 312, 50, 14, 2, 0, 0, 0, 0, 0, 0, 345,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 319,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8,},
},
},
{
/* Block Type ( 1 ) */
{
/* Coeff Band ( 0 ) */
{3268, 19382, 1043, 250, 93, 82, 49, 26, 17, 8, 25, 82289,},
{8758, 32110, 5436, 1832, 827, 668, 420, 153, 24, 0, 3, 52914,},
{9337, 23725, 8487, 3954, 2107, 1836, 1069, 399, 59, 0, 0, 18620,},
},
{
/* Coeff Band ( 1 ) */
{12419, 8420, 452, 62, 9, 1, 0, 0, 0, 0, 0, 0,},
{11715, 8705, 693, 92, 15, 7, 2, 0, 0, 0, 0, 53988,},
{7603, 8585, 2306, 778, 270, 145, 39, 5, 0, 0, 0, 9136,},
},
{
/* Coeff Band ( 2 ) */
{15938, 14335, 1207, 184, 55, 13, 4, 1, 0, 0, 0, 0,},
{7415, 6829, 1138, 244, 71, 26, 7, 0, 0, 0, 0, 9980,},
{1580, 1824, 655, 241, 89, 46, 10, 2, 0, 0, 0, 429,},
},
{
/* Coeff Band ( 3 ) */
{19453, 5260, 201, 19, 0, 0, 0, 0, 0, 0, 0, 0,},
{9173, 3758, 213, 22, 1, 1, 0, 0, 0, 0, 0, 9820,},
{1689, 1277, 276, 51, 17, 4, 0, 0, 0, 0, 0, 679,},
},
{
/* Coeff Band ( 4 ) */
{12076, 10667, 620, 85, 19, 9, 5, 0, 0, 0, 0, 0,},
{4665, 3625, 423, 55, 19, 9, 0, 0, 0, 0, 0, 5127,},
{ 415, 440, 143, 34, 20, 7, 2, 0, 0, 0, 0, 101,},
},
{
/* Coeff Band ( 5 ) */
{12183, 4846, 115, 11, 1, 0, 0, 0, 0, 0, 0, 0,},
{4226, 3149, 177, 21, 2, 0, 0, 0, 0, 0, 0, 7157,},
{ 375, 621, 189, 51, 11, 4, 1, 0, 0, 0, 0, 198,},
},
{
/* Coeff Band ( 6 ) */
{61658, 37743, 1203, 94, 10, 3, 0, 0, 0, 0, 0, 0,},
{15514, 11563, 903, 111, 14, 5, 0, 0, 0, 0, 0, 25195,},
{ 929, 1077, 291, 78, 14, 7, 1, 0, 0, 0, 0, 507,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 990, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 412, 13, 0, 0, 0, 0, 0, 0, 0, 0, 1641,},
{ 0, 18, 7, 1, 0, 0, 0, 0, 0, 0, 0, 30,},
},
},
{
/* Block Type ( 2 ) */
{
/* Coeff Band ( 0 ) */
{ 953, 24519, 628, 120, 28, 12, 4, 0, 0, 0, 0, 2248798,},
{1525, 25654, 2647, 617, 239, 143, 42, 5, 0, 0, 0, 66837,},
{1180, 11011, 3001, 1237, 532, 448, 239, 54, 5, 0, 0, 7122,},
},
{
/* Coeff Band ( 1 ) */
{1356, 2220, 67, 10, 4, 1, 0, 0, 0, 0, 0, 0,},
{1450, 2544, 102, 18, 4, 3, 0, 0, 0, 0, 0, 57063,},
{1182, 2110, 470, 130, 41, 21, 0, 0, 0, 0, 0, 6047,},
},
{
/* Coeff Band ( 2 ) */
{ 370, 3378, 200, 30, 5, 4, 1, 0, 0, 0, 0, 0,},
{ 293, 1006, 131, 29, 11, 0, 0, 0, 0, 0, 0, 5404,},
{ 114, 387, 98, 23, 4, 8, 1, 0, 0, 0, 0, 236,},
},
{
/* Coeff Band ( 3 ) */
{ 579, 194, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 395, 213, 5, 1, 0, 0, 0, 0, 0, 0, 0, 4157,},
{ 119, 122, 4, 0, 0, 0, 0, 0, 0, 0, 0, 300,},
},
{
/* Coeff Band ( 4 ) */
{ 38, 557, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 21, 114, 12, 1, 0, 0, 0, 0, 0, 0, 0, 427,},
{ 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7,},
},
{
/* Coeff Band ( 5 ) */
{ 52, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 18, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 652,},
{ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30,},
},
{
/* Coeff Band ( 6 ) */
{ 640, 569, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 25, 77, 2, 0, 0, 0, 0, 0, 0, 0, 0, 517,},
{ 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
},
{
/* Block Type ( 3 ) */
{
/* Coeff Band ( 0 ) */
{2506, 20161, 2707, 767, 261, 178, 107, 30, 14, 3, 0, 100694,},
{8806, 36478, 8817, 3268, 1280, 850, 401, 114, 42, 0, 0, 58572,},
{11003, 27214, 11798, 5716, 2482, 2072, 1048, 175, 32, 0, 0, 19284,},
},
{
/* Coeff Band ( 1 ) */
{9738, 11313, 959, 205, 70, 18, 11, 1, 0, 0, 0, 0,},
{12628, 15085, 1507, 273, 52, 19, 9, 0, 0, 0, 0, 54280,},
{10701, 15846, 5561, 1926, 813, 570, 249, 36, 0, 0, 0, 6460,},
},
{
/* Coeff Band ( 2 ) */
{6781, 22539, 2784, 634, 182, 123, 20, 4, 0, 0, 0, 0,},
{6263, 11544, 2649, 790, 259, 168, 27, 5, 0, 0, 0, 20539,},
{3109, 4075, 2031, 896, 457, 386, 158, 29, 0, 0, 0, 1138,},
},
{
/* Coeff Band ( 3 ) */
{11515, 4079, 465, 73, 5, 14, 2, 0, 0, 0, 0, 0,},
{9361, 5834, 650, 96, 24, 8, 4, 0, 0, 0, 0, 22181,},
{4343, 3974, 1360, 415, 132, 96, 14, 1, 0, 0, 0, 1267,},
},
{
/* Coeff Band ( 4 ) */
{4787, 9297, 823, 168, 44, 12, 4, 0, 0, 0, 0, 0,},
{3619, 4472, 719, 198, 60, 31, 3, 0, 0, 0, 0, 8401,},
{1157, 1175, 483, 182, 88, 31, 8, 0, 0, 0, 0, 268,},
},
{
/* Coeff Band ( 5 ) */
{8299, 1226, 32, 5, 1, 0, 0, 0, 0, 0, 0, 0,},
{3502, 1568, 57, 4, 1, 1, 0, 0, 0, 0, 0, 9811,},
{1055, 1070, 166, 29, 6, 1, 0, 0, 0, 0, 0, 527,},
},
{
/* Coeff Band ( 6 ) */
{27414, 27927, 1989, 347, 69, 26, 0, 0, 0, 0, 0, 0,},
{5876, 10074, 1574, 341, 91, 24, 4, 0, 0, 0, 0, 21954,},
{1571, 2171, 778, 324, 124, 65, 16, 0, 0, 0, 0, 979,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 459,},
{ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13,},
},
},
};

View File

@@ -8,214 +8,14 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef __DEFAULTCOEFCOUNTS_H
#define __DEFAULTCOEFCOUNTS_H
/* Generated file, included by entropy.c */
#include "entropy.h"
static const unsigned int default_coef_counts [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens] =
{
extern const unsigned int vp8_default_coef_counts[BLOCK_TYPES]
[COEF_BANDS]
[PREV_COEF_CONTEXTS]
[MAX_ENTROPY_TOKENS];
{
/* Block Type ( 0 ) */
{
/* Coeff Band ( 0 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
{
/* Coeff Band ( 1 ) */
{30190, 26544, 225, 24, 4, 0, 0, 0, 0, 0, 0, 4171593,},
{26846, 25157, 1241, 130, 26, 6, 1, 0, 0, 0, 0, 149987,},
{10484, 9538, 1006, 160, 36, 18, 0, 0, 0, 0, 0, 15104,},
},
{
/* Coeff Band ( 2 ) */
{25842, 40456, 1126, 83, 11, 2, 0, 0, 0, 0, 0, 0,},
{9338, 8010, 512, 73, 7, 3, 2, 0, 0, 0, 0, 43294,},
{1047, 751, 149, 31, 13, 6, 1, 0, 0, 0, 0, 879,},
},
{
/* Coeff Band ( 3 ) */
{26136, 9826, 252, 13, 0, 0, 0, 0, 0, 0, 0, 0,},
{8134, 5574, 191, 14, 2, 0, 0, 0, 0, 0, 0, 35302,},
{ 605, 677, 116, 9, 1, 0, 0, 0, 0, 0, 0, 611,},
},
{
/* Coeff Band ( 4 ) */
{10263, 15463, 283, 17, 0, 0, 0, 0, 0, 0, 0, 0,},
{2773, 2191, 128, 9, 2, 2, 0, 0, 0, 0, 0, 10073,},
{ 134, 125, 32, 4, 0, 2, 0, 0, 0, 0, 0, 50,},
},
{
/* Coeff Band ( 5 ) */
{10483, 2663, 23, 1, 0, 0, 0, 0, 0, 0, 0, 0,},
{2137, 1251, 27, 1, 1, 0, 0, 0, 0, 0, 0, 14362,},
{ 116, 156, 14, 2, 1, 0, 0, 0, 0, 0, 0, 190,},
},
{
/* Coeff Band ( 6 ) */
{40977, 27614, 412, 28, 0, 0, 0, 0, 0, 0, 0, 0,},
{6113, 5213, 261, 22, 3, 0, 0, 0, 0, 0, 0, 26164,},
{ 382, 312, 50, 14, 2, 0, 0, 0, 0, 0, 0, 345,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 319,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8,},
},
},
{
/* Block Type ( 1 ) */
{
/* Coeff Band ( 0 ) */
{3268, 19382, 1043, 250, 93, 82, 49, 26, 17, 8, 25, 82289,},
{8758, 32110, 5436, 1832, 827, 668, 420, 153, 24, 0, 3, 52914,},
{9337, 23725, 8487, 3954, 2107, 1836, 1069, 399, 59, 0, 0, 18620,},
},
{
/* Coeff Band ( 1 ) */
{12419, 8420, 452, 62, 9, 1, 0, 0, 0, 0, 0, 0,},
{11715, 8705, 693, 92, 15, 7, 2, 0, 0, 0, 0, 53988,},
{7603, 8585, 2306, 778, 270, 145, 39, 5, 0, 0, 0, 9136,},
},
{
/* Coeff Band ( 2 ) */
{15938, 14335, 1207, 184, 55, 13, 4, 1, 0, 0, 0, 0,},
{7415, 6829, 1138, 244, 71, 26, 7, 0, 0, 0, 0, 9980,},
{1580, 1824, 655, 241, 89, 46, 10, 2, 0, 0, 0, 429,},
},
{
/* Coeff Band ( 3 ) */
{19453, 5260, 201, 19, 0, 0, 0, 0, 0, 0, 0, 0,},
{9173, 3758, 213, 22, 1, 1, 0, 0, 0, 0, 0, 9820,},
{1689, 1277, 276, 51, 17, 4, 0, 0, 0, 0, 0, 679,},
},
{
/* Coeff Band ( 4 ) */
{12076, 10667, 620, 85, 19, 9, 5, 0, 0, 0, 0, 0,},
{4665, 3625, 423, 55, 19, 9, 0, 0, 0, 0, 0, 5127,},
{ 415, 440, 143, 34, 20, 7, 2, 0, 0, 0, 0, 101,},
},
{
/* Coeff Band ( 5 ) */
{12183, 4846, 115, 11, 1, 0, 0, 0, 0, 0, 0, 0,},
{4226, 3149, 177, 21, 2, 0, 0, 0, 0, 0, 0, 7157,},
{ 375, 621, 189, 51, 11, 4, 1, 0, 0, 0, 0, 198,},
},
{
/* Coeff Band ( 6 ) */
{61658, 37743, 1203, 94, 10, 3, 0, 0, 0, 0, 0, 0,},
{15514, 11563, 903, 111, 14, 5, 0, 0, 0, 0, 0, 25195,},
{ 929, 1077, 291, 78, 14, 7, 1, 0, 0, 0, 0, 507,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 990, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 412, 13, 0, 0, 0, 0, 0, 0, 0, 0, 1641,},
{ 0, 18, 7, 1, 0, 0, 0, 0, 0, 0, 0, 30,},
},
},
{
/* Block Type ( 2 ) */
{
/* Coeff Band ( 0 ) */
{ 953, 24519, 628, 120, 28, 12, 4, 0, 0, 0, 0, 2248798,},
{1525, 25654, 2647, 617, 239, 143, 42, 5, 0, 0, 0, 66837,},
{1180, 11011, 3001, 1237, 532, 448, 239, 54, 5, 0, 0, 7122,},
},
{
/* Coeff Band ( 1 ) */
{1356, 2220, 67, 10, 4, 1, 0, 0, 0, 0, 0, 0,},
{1450, 2544, 102, 18, 4, 3, 0, 0, 0, 0, 0, 57063,},
{1182, 2110, 470, 130, 41, 21, 0, 0, 0, 0, 0, 6047,},
},
{
/* Coeff Band ( 2 ) */
{ 370, 3378, 200, 30, 5, 4, 1, 0, 0, 0, 0, 0,},
{ 293, 1006, 131, 29, 11, 0, 0, 0, 0, 0, 0, 5404,},
{ 114, 387, 98, 23, 4, 8, 1, 0, 0, 0, 0, 236,},
},
{
/* Coeff Band ( 3 ) */
{ 579, 194, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 395, 213, 5, 1, 0, 0, 0, 0, 0, 0, 0, 4157,},
{ 119, 122, 4, 0, 0, 0, 0, 0, 0, 0, 0, 300,},
},
{
/* Coeff Band ( 4 ) */
{ 38, 557, 19, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 21, 114, 12, 1, 0, 0, 0, 0, 0, 0, 0, 427,},
{ 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7,},
},
{
/* Coeff Band ( 5 ) */
{ 52, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 18, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 652,},
{ 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30,},
},
{
/* Coeff Band ( 6 ) */
{ 640, 569, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 25, 77, 2, 0, 0, 0, 0, 0, 0, 0, 0, 517,},
{ 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
},
},
{
/* Block Type ( 3 ) */
{
/* Coeff Band ( 0 ) */
{2506, 20161, 2707, 767, 261, 178, 107, 30, 14, 3, 0, 100694,},
{8806, 36478, 8817, 3268, 1280, 850, 401, 114, 42, 0, 0, 58572,},
{11003, 27214, 11798, 5716, 2482, 2072, 1048, 175, 32, 0, 0, 19284,},
},
{
/* Coeff Band ( 1 ) */
{9738, 11313, 959, 205, 70, 18, 11, 1, 0, 0, 0, 0,},
{12628, 15085, 1507, 273, 52, 19, 9, 0, 0, 0, 0, 54280,},
{10701, 15846, 5561, 1926, 813, 570, 249, 36, 0, 0, 0, 6460,},
},
{
/* Coeff Band ( 2 ) */
{6781, 22539, 2784, 634, 182, 123, 20, 4, 0, 0, 0, 0,},
{6263, 11544, 2649, 790, 259, 168, 27, 5, 0, 0, 0, 20539,},
{3109, 4075, 2031, 896, 457, 386, 158, 29, 0, 0, 0, 1138,},
},
{
/* Coeff Band ( 3 ) */
{11515, 4079, 465, 73, 5, 14, 2, 0, 0, 0, 0, 0,},
{9361, 5834, 650, 96, 24, 8, 4, 0, 0, 0, 0, 22181,},
{4343, 3974, 1360, 415, 132, 96, 14, 1, 0, 0, 0, 1267,},
},
{
/* Coeff Band ( 4 ) */
{4787, 9297, 823, 168, 44, 12, 4, 0, 0, 0, 0, 0,},
{3619, 4472, 719, 198, 60, 31, 3, 0, 0, 0, 0, 8401,},
{1157, 1175, 483, 182, 88, 31, 8, 0, 0, 0, 0, 268,},
},
{
/* Coeff Band ( 5 ) */
{8299, 1226, 32, 5, 1, 0, 0, 0, 0, 0, 0, 0,},
{3502, 1568, 57, 4, 1, 1, 0, 0, 0, 0, 0, 9811,},
{1055, 1070, 166, 29, 6, 1, 0, 0, 0, 0, 0, 527,},
},
{
/* Coeff Band ( 6 ) */
{27414, 27927, 1989, 347, 69, 26, 0, 0, 0, 0, 0, 0,},
{5876, 10074, 1574, 341, 91, 24, 4, 0, 0, 0, 0, 21954,},
{1571, 2171, 778, 324, 124, 65, 16, 0, 0, 0, 0, 979,},
},
{
/* Coeff Band ( 7 ) */
{ 0, 29, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,},
{ 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 459,},
{ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13,},
},
},
};
#endif //__DEFAULTCOEFCOUNTS_H

View File

@@ -26,8 +26,32 @@ typedef vp8_prob Prob;
#include "coefupdateprobs.h"
DECLARE_ALIGNED(16, cuchar, vp8_coef_bands[16]) = { 0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7};
DECLARE_ALIGNED(16, cuchar, vp8_prev_token_class[MAX_ENTROPY_TOKENS]) = { 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0};
DECLARE_ALIGNED(16, const unsigned char, vp8_norm[256]) =
{
0, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
DECLARE_ALIGNED(16, cuchar, vp8_coef_bands[16]) =
{ 0, 1, 2, 3, 6, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7};
DECLARE_ALIGNED(16, cuchar, vp8_prev_token_class[MAX_ENTROPY_TOKENS]) =
{ 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0};
DECLARE_ALIGNED(16, const int, vp8_default_zig_zag1d[16]) =
{
0, 1, 4, 8,
@@ -36,6 +60,14 @@ DECLARE_ALIGNED(16, const int, vp8_default_zig_zag1d[16]) =
7, 11, 14, 15,
};
DECLARE_ALIGNED(16, const short, vp8_default_inv_zig_zag[16]) =
{
1, 2, 6, 7,
3, 5, 8, 13,
4, 9, 12, 14,
10, 11, 15, 16
};
DECLARE_ALIGNED(16, short, vp8_default_zig_zag_mask[16]);
const int vp8_mb_feature_data_bits[MB_LVL_MAX] = {7, 6};
@@ -57,7 +89,7 @@ const vp8_tree_index vp8_coef_tree[ 22] = /* corresponding _CONTEXT_NODEs */
-DCT_VAL_CATEGORY5, -DCT_VAL_CATEGORY6 /* 10 = CAT_FIVE */
};
struct vp8_token_struct vp8_coef_encodings[vp8_coef_tokens];
struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];
/* Trees for extra bits. Probabilities are constant and
do not depend on previously encoded bits */
@@ -106,23 +138,20 @@ static void init_bit_trees()
init_bit_tree(cat6, 11);
}
static vp8bc_index_t bcc1[1], bcc2[2], bcc3[3], bcc4[4], bcc5[5], bcc6[11];
vp8_extra_bit_struct vp8_extra_bits[12] =
{
{ 0, 0, 0, 0, 0},
{ 0, 0, 0, 0, 1},
{ 0, 0, 0, 0, 2},
{ 0, 0, 0, 0, 3},
{ 0, 0, 0, 0, 4},
{ cat1, Pcat1, bcc1, 1, 5},
{ cat2, Pcat2, bcc2, 2, 7},
{ cat3, Pcat3, bcc3, 3, 11},
{ cat4, Pcat4, bcc4, 4, 19},
{ cat5, Pcat5, bcc5, 5, 35},
{ cat6, Pcat6, bcc6, 11, 67},
{ 0, 0, 0, 0, 0}
{ 0, 0, 0, 0},
{ 0, 0, 0, 1},
{ 0, 0, 0, 2},
{ 0, 0, 0, 3},
{ 0, 0, 0, 4},
{ cat1, Pcat1, 1, 5},
{ cat2, Pcat2, 2, 7},
{ cat3, Pcat3, 3, 11},
{ cat4, Pcat4, 4, 19},
{ cat5, Pcat5, 5, 35},
{ cat6, Pcat6, 11, 67},
{ 0, 0, 0, 0}
};
#include "defaultcoefcounts.h"
@@ -140,10 +169,12 @@ void vp8_default_coef_probs(VP8_COMMON *pc)
do
{
unsigned int branch_ct [vp8_coef_tokens-1] [2];
unsigned int branch_ct [ENTROPY_NODES] [2];
vp8_tree_probs_from_distribution(
vp8_coef_tokens, vp8_coef_encodings, vp8_coef_tree,
pc->fc.coef_probs [h][i][k], branch_ct, default_coef_counts [h][i][k],
MAX_ENTROPY_TOKENS, vp8_coef_encodings, vp8_coef_tree,
pc->fc.coef_probs[h][i][k],
branch_ct,
vp8_default_coef_counts[h][i][k],
256, 1);
}

View File

@@ -24,25 +24,23 @@
#define FOUR_TOKEN 4 /* 4 Extra Bits 0+1 */
#define DCT_VAL_CATEGORY1 5 /* 5-6 Extra Bits 1+1 */
#define DCT_VAL_CATEGORY2 6 /* 7-10 Extra Bits 2+1 */
#define DCT_VAL_CATEGORY3 7 /* 11-26 Extra Bits 4+1 */
#define DCT_VAL_CATEGORY4 8 /* 11-26 Extra Bits 5+1 */
#define DCT_VAL_CATEGORY5 9 /* 27-58 Extra Bits 5+1 */
#define DCT_VAL_CATEGORY6 10 /* 59+ Extra Bits 11+1 */
#define DCT_VAL_CATEGORY3 7 /* 11-18 Extra Bits 3+1 */
#define DCT_VAL_CATEGORY4 8 /* 19-34 Extra Bits 4+1 */
#define DCT_VAL_CATEGORY5 9 /* 35-66 Extra Bits 5+1 */
#define DCT_VAL_CATEGORY6 10 /* 67+ Extra Bits 11+1 */
#define DCT_EOB_TOKEN 11 /* EOB Extra Bits 0+0 */
#define vp8_coef_tokens 12
#define MAX_ENTROPY_TOKENS vp8_coef_tokens
#define MAX_ENTROPY_TOKENS 12
#define ENTROPY_NODES 11
extern const vp8_tree_index vp8_coef_tree[];
extern struct vp8_token_struct vp8_coef_encodings[vp8_coef_tokens];
extern struct vp8_token_struct vp8_coef_encodings[MAX_ENTROPY_TOKENS];
typedef struct
{
vp8_tree_p tree;
const vp8_prob *prob;
vp8bc_index_t *prob_bc;
int Len;
int base_val;
} vp8_extra_bit_struct;
@@ -86,15 +84,16 @@ extern DECLARE_ALIGNED(16, const unsigned char, vp8_coef_bands[16]);
/*# define DC_TOKEN_CONTEXTS 3*/ /* 00, 0!0, !0!0 */
# define PREV_COEF_CONTEXTS 3
extern DECLARE_ALIGNED(16, const unsigned char, vp8_prev_token_class[vp8_coef_tokens]);
extern DECLARE_ALIGNED(16, const unsigned char, vp8_prev_token_class[MAX_ENTROPY_TOKENS]);
extern const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [vp8_coef_tokens-1];
extern const vp8_prob vp8_coef_update_probs [BLOCK_TYPES] [COEF_BANDS] [PREV_COEF_CONTEXTS] [ENTROPY_NODES];
struct VP8Common;
void vp8_default_coef_probs(struct VP8Common *);
extern DECLARE_ALIGNED(16, const int, vp8_default_zig_zag1d[16]);
extern DECLARE_ALIGNED(16, const short, vp8_default_inv_zig_zag[16]);
extern short vp8_default_zig_zag_mask[16];
extern const int vp8_mb_feature_data_bits[MB_LVL_MAX];

View File

@@ -33,11 +33,11 @@ typedef enum
SUBMVREF_LEFT_ABOVE_ZED
} sumvfref_t;
int vp8_mv_cont(const MV *l, const MV *a)
int vp8_mv_cont(const int_mv *l, const int_mv *a)
{
int lez = (l->row == 0 && l->col == 0);
int aez = (a->row == 0 && a->col == 0);
int lea = (l->row == a->row && l->col == a->col);
int lez = (l->as_int == 0);
int aez = (a->as_int == 0);
int lea = (l->as_int == a->as_int);
if (lea && lez)
return SUBMVREF_LEFT_ABOVE_ZED;

View File

@@ -25,7 +25,7 @@ extern const int vp8_mbsplit_count [VP8_NUMMBSPLITS]; /* # of subsets */
extern const vp8_prob vp8_mbsplit_probs [VP8_NUMMBSPLITS-1];
extern int vp8_mv_cont(const MV *l, const MV *a);
extern int vp8_mv_cont(const int_mv *l, const int_mv *a);
#define SUBMVREF_COUNT 5
extern const vp8_prob vp8_sub_mv_ref_prob2 [SUBMVREF_COUNT][VP8_SUBMVREFS-1];

View File

@@ -18,6 +18,8 @@ enum
{
mv_max = 1023, /* max absolute value of a MV component */
MVvals = (2 * mv_max) + 1, /* # possible values "" */
mvfp_max = 255, /* max absolute value of a full pixel MV component */
MVfpvals = (2 * mvfp_max) +1, /* # possible full pixel MV values */
mvlong_width = 10, /* Large MVs have 9 bit magnitudes */
mvnum_short = 8, /* magnitudes 0 through 7 */

View File

@@ -13,10 +13,12 @@
#include "vpx_mem/vpx_mem.h"
static void extend_plane_borders
static void copy_and_extend_plane
(
unsigned char *s, /* source */
int sp, /* pitch */
int sp, /* source pitch */
unsigned char *d, /* destination */
int dp, /* destination pitch */
int h, /* height */
int w, /* width */
int et, /* extend top border */
@@ -25,7 +27,6 @@ static void extend_plane_borders
int er /* extend right border */
)
{
int i;
unsigned char *src_ptr1, *src_ptr2;
unsigned char *dest_ptr1, *dest_ptr2;
@@ -34,68 +35,73 @@ static void extend_plane_borders
/* copy the left and right most columns out */
src_ptr1 = s;
src_ptr2 = s + w - 1;
dest_ptr1 = s - el;
dest_ptr2 = s + w;
dest_ptr1 = d - el;
dest_ptr2 = d + w;
for (i = 0; i < h - 0 + 1; i++)
for (i = 0; i < h; i++)
{
/* Some linkers will complain if we call vpx_memset with el set to a
* constant 0.
*/
if (el)
vpx_memset(dest_ptr1, src_ptr1[0], el);
vpx_memcpy(dest_ptr1 + el, src_ptr1, w);
vpx_memset(dest_ptr2, src_ptr2[0], er);
src_ptr1 += sp;
src_ptr2 += sp;
dest_ptr1 += sp;
dest_ptr2 += sp;
dest_ptr1 += dp;
dest_ptr2 += dp;
}
/* Now copy the top and bottom source lines into each line of the respective borders */
src_ptr1 = s - el;
src_ptr2 = s + sp * (h - 1) - el;
dest_ptr1 = s + sp * (-et) - el;
dest_ptr2 = s + sp * (h) - el;
linesize = el + er + w + 1;
/* Now copy the top and bottom lines into each line of the respective
* borders
*/
src_ptr1 = d - el;
src_ptr2 = d + dp * (h - 1) - el;
dest_ptr1 = d + dp * (-et) - el;
dest_ptr2 = d + dp * (h) - el;
linesize = el + er + w;
for (i = 0; i < (int)et; i++)
for (i = 0; i < et; i++)
{
vpx_memcpy(dest_ptr1, src_ptr1, linesize);
dest_ptr1 += sp;
dest_ptr1 += dp;
}
for (i = 0; i < (int)eb; i++)
for (i = 0; i < eb; i++)
{
vpx_memcpy(dest_ptr2, src_ptr2, linesize);
dest_ptr2 += sp;
dest_ptr2 += dp;
}
}
void vp8_extend_to_multiple_of16(YV12_BUFFER_CONFIG *ybf, int width, int height)
void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
YV12_BUFFER_CONFIG *dst)
{
int er = 0xf & (16 - (width & 0xf));
int eb = 0xf & (16 - (height & 0xf));
int et = dst->border;
int el = dst->border;
int eb = dst->border + dst->y_height - src->y_height;
int er = dst->border + dst->y_width - src->y_width;
/* check for non multiples of 16 */
if (er != 0 || eb != 0)
{
extend_plane_borders(ybf->y_buffer, ybf->y_stride, height, width, 0, 0, eb, er);
copy_and_extend_plane(src->y_buffer, src->y_stride,
dst->y_buffer, dst->y_stride,
src->y_height, src->y_width,
et, el, eb, er);
/* adjust for uv */
height = (height + 1) >> 1;
width = (width + 1) >> 1;
er = 0x7 & (8 - (width & 0x7));
eb = 0x7 & (8 - (height & 0x7));
et = dst->border >> 1;
el = dst->border >> 1;
eb = (dst->border >> 1) + dst->uv_height - src->uv_height;
er = (dst->border >> 1) + dst->uv_width - src->uv_width;
if (er || eb)
{
extend_plane_borders(ybf->u_buffer, ybf->uv_stride, height, width, 0, 0, eb, er);
extend_plane_borders(ybf->v_buffer, ybf->uv_stride, height, width, 0, 0, eb, er);
}
}
copy_and_extend_plane(src->u_buffer, src->uv_stride,
dst->u_buffer, dst->uv_stride,
src->uv_height, src->uv_width,
et, el, eb, er);
copy_and_extend_plane(src->v_buffer, src->uv_stride,
dst->v_buffer, dst->uv_stride,
src->uv_height, src->uv_width,
et, el, eb, er);
}
/* note the extension is only for the last row, for intra prediction purpose */
void vp8_extend_mb_row(YV12_BUFFER_CONFIG *ybf, unsigned char *YPtr, unsigned char *UPtr, unsigned char *VPtr)
{

View File

@@ -14,8 +14,8 @@
#include "vpx_scale/yv12config.h"
void Extend(YV12_BUFFER_CONFIG *ybf);
void vp8_extend_mb_row(YV12_BUFFER_CONFIG *ybf, unsigned char *YPtr, unsigned char *UPtr, unsigned char *VPtr);
void vp8_extend_to_multiple_of16(YV12_BUFFER_CONFIG *ybf, int width, int height);
void vp8_copy_and_extend_frame(YV12_BUFFER_CONFIG *src,
YV12_BUFFER_CONFIG *dst);
#endif

View File

@@ -10,13 +10,10 @@
#include <stdlib.h>
#include "filter.h"
#include "vpx_ports/mem.h"
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
static const int bilinear_filters[8][2] =
DECLARE_ALIGNED(16, const short, vp8_bilinear_filters[8][2]) =
{
{ 128, 0 },
{ 112, 16 },
@@ -28,8 +25,7 @@ static const int bilinear_filters[8][2] =
{ 16, 112 }
};
static const short sub_pel_filters[8][6] =
DECLARE_ALIGNED(16, const short, vp8_sub_pel_filters[8][6]) =
{
{ 0, 0, 128, 0, 0, 0 }, /* note that 1/8 pel positions are just as per alpha -0.5 bicubic */
@@ -40,12 +36,9 @@ static const short sub_pel_filters[8][6] =
{ 0, -6, 50, 93, -9, 0 },
{ 1, -8, 36, 108, -11, 2 }, /* New 1/4 pel 6 tap filter */
{ 0, -1, 12, 123, -6, 0 },
};
void vp8_filter_block2d_first_pass
static void filter_block2d_first_pass
(
unsigned char *src_ptr,
int *output_ptr,
@@ -89,7 +82,7 @@ void vp8_filter_block2d_first_pass
}
}
void vp8_filter_block2d_second_pass
static void filter_block2d_second_pass
(
int *src_ptr,
unsigned char *output_ptr,
@@ -136,7 +129,7 @@ void vp8_filter_block2d_second_pass
}
void vp8_filter_block2d
static void filter_block2d
(
unsigned char *src_ptr,
unsigned char *output_ptr,
@@ -146,42 +139,16 @@ void vp8_filter_block2d
const short *VFilter
)
{
int FData[9*4]; /* Temp data bufffer used in filtering */
int FData[9*4]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 4, HFilter);
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 4, HFilter);
/* then filter verticaly... */
vp8_filter_block2d_second_pass(FData + 8, output_ptr, output_pitch, 4, 4, 4, 4, VFilter);
filter_block2d_second_pass(FData + 8, output_ptr, output_pitch, 4, 4, 4, 4, VFilter);
}
void vp8_block_variation_c
(
unsigned char *src_ptr,
int src_pixels_per_line,
int *HVar,
int *VVar
)
{
int i, j;
unsigned char *Ptr = src_ptr;
for (i = 0; i < 4; i++)
{
for (j = 0; j < 4; j++)
{
*HVar += abs((int)Ptr[j] - (int)Ptr[j+1]);
*VVar += abs((int)Ptr[j] - (int)Ptr[j+src_pixels_per_line]);
}
Ptr += src_pixels_per_line;
}
}
void vp8_sixtap_predict_c
(
unsigned char *src_ptr,
@@ -195,10 +162,10 @@ void vp8_sixtap_predict_c
const short *HFilter;
const short *VFilter;
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
vp8_filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
filter_block2d(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter);
}
void vp8_sixtap_predict8x8_c
(
@@ -212,17 +179,17 @@ void vp8_sixtap_predict8x8_c
{
const short *HFilter;
const short *VFilter;
int FData[13*16]; /* Temp data bufffer used in filtering */
int FData[13*16]; /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 13, 8, HFilter);
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 13, 8, HFilter);
/* then filter verticaly... */
vp8_filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 8, 8, VFilter);
filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 8, 8, VFilter);
}
@@ -238,17 +205,17 @@ void vp8_sixtap_predict8x4_c
{
const short *HFilter;
const short *VFilter;
int FData[13*16]; /* Temp data bufffer used in filtering */
int FData[13*16]; /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 8, HFilter);
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 9, 8, HFilter);
/* then filter verticaly... */
vp8_filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 4, 8, VFilter);
filter_block2d_second_pass(FData + 16, dst_ptr, dst_pitch, 8, 8, 4, 8, VFilter);
}
@@ -264,17 +231,17 @@ void vp8_sixtap_predict16x16_c
{
const short *HFilter;
const short *VFilter;
int FData[21*24]; /* Temp data bufffer used in filtering */
int FData[21*24]; /* Temp data buffer used in filtering */
HFilter = sub_pel_filters[xoffset]; /* 6 tap */
VFilter = sub_pel_filters[yoffset]; /* 6 tap */
HFilter = vp8_sub_pel_filters[xoffset]; /* 6 tap */
VFilter = vp8_sub_pel_filters[yoffset]; /* 6 tap */
/* First filter 1-D horizontally... */
vp8_filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 21, 16, HFilter);
filter_block2d_first_pass(src_ptr - (2 * src_pixels_per_line), FData, src_pixels_per_line, 1, 21, 16, HFilter);
/* then filter verticaly... */
vp8_filter_block2d_second_pass(FData + 32, dst_ptr, dst_pitch, 16, 16, 16, 16, VFilter);
filter_block2d_second_pass(FData + 32, dst_ptr, dst_pitch, 16, 16, 16, 16, VFilter);
}
@@ -284,56 +251,49 @@ void vp8_sixtap_predict16x16_c
* ROUTINE : filter_block2d_bil_first_pass
*
* INPUTS : UINT8 *src_ptr : Pointer to source block.
* UINT32 src_pixels_per_line : Stride of input block.
* UINT32 pixel_step : Offset between filter input samples (see notes).
* UINT32 output_height : Input block height.
* UINT32 output_width : Input block width.
* UINT32 src_stride : Stride of source block.
* UINT32 height : Block height.
* UINT32 width : Block width.
* INT32 *vp8_filter : Array of 2 bi-linear filter taps.
*
* OUTPUTS : INT32 *output_ptr : Pointer to filtered block.
* OUTPUTS : INT32 *dst_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block in
* either horizontal or vertical direction to produce the
* filtered output block. Used to implement first-pass
* of 2-D separable filter.
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block
* in the horizontal direction to produce the filtered output
* block. Used to implement first-pass of 2-D separable filter.
*
* SPECIAL NOTES : Produces INT32 output to retain precision for next pass.
* Two filter taps should sum to VP8_FILTER_WEIGHT.
* pixel_step defines whether the filter is applied
* horizontally (pixel_step=1) or vertically (pixel_step=stride).
* It defines the offset required to move from one input
* to the next.
*
****************************************************************************/
void vp8_filter_block2d_bil_first_pass
static void filter_block2d_bil_first_pass
(
unsigned char *src_ptr,
unsigned short *output_ptr,
unsigned int src_pixels_per_line,
int pixel_step,
unsigned int output_height,
unsigned int output_width,
const int *vp8_filter
unsigned short *dst_ptr,
unsigned int src_stride,
unsigned int height,
unsigned int width,
const short *vp8_filter
)
{
unsigned int i, j;
for (i = 0; i < output_height; i++)
for (i = 0; i < height; i++)
{
for (j = 0; j < output_width; j++)
for (j = 0; j < width; j++)
{
/* Apply bilinear filter */
output_ptr[j] = (((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[pixel_step] * vp8_filter[1]) +
dst_ptr[j] = (((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[1] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2)) >> VP8_FILTER_SHIFT;
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_width;
src_ptr += src_stride - width;
dst_ptr += width;
}
}
@@ -342,59 +302,50 @@ void vp8_filter_block2d_bil_first_pass
* ROUTINE : filter_block2d_bil_second_pass
*
* INPUTS : INT32 *src_ptr : Pointer to source block.
* UINT32 src_pixels_per_line : Stride of input block.
* UINT32 pixel_step : Offset between filter input samples (see notes).
* UINT32 output_height : Input block height.
* UINT32 output_width : Input block width.
* UINT32 dst_pitch : Destination block pitch.
* UINT32 height : Block height.
* UINT32 width : Block width.
* INT32 *vp8_filter : Array of 2 bi-linear filter taps.
*
* OUTPUTS : UINT16 *output_ptr : Pointer to filtered block.
* OUTPUTS : UINT16 *dst_ptr : Pointer to filtered block.
*
* RETURNS : void
*
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block in
* either horizontal or vertical direction to produce the
* filtered output block. Used to implement second-pass
* of 2-D separable filter.
* FUNCTION : Applies a 1-D 2-tap bi-linear filter to the source block
* in the vertical direction to produce the filtered output
* block. Used to implement second-pass of 2-D separable filter.
*
* SPECIAL NOTES : Requires 32-bit input as produced by filter_block2d_bil_first_pass.
* Two filter taps should sum to VP8_FILTER_WEIGHT.
* pixel_step defines whether the filter is applied
* horizontally (pixel_step=1) or vertically (pixel_step=stride).
* It defines the offset required to move from one input
* to the next.
*
****************************************************************************/
void vp8_filter_block2d_bil_second_pass
static void filter_block2d_bil_second_pass
(
unsigned short *src_ptr,
unsigned char *output_ptr,
int output_pitch,
unsigned int src_pixels_per_line,
unsigned int pixel_step,
unsigned int output_height,
unsigned int output_width,
const int *vp8_filter
unsigned char *dst_ptr,
int dst_pitch,
unsigned int height,
unsigned int width,
const short *vp8_filter
)
{
unsigned int i, j;
int Temp;
for (i = 0; i < output_height; i++)
for (i = 0; i < height; i++)
{
for (j = 0; j < output_width; j++)
for (j = 0; j < width; j++)
{
/* Apply filter */
Temp = ((int)src_ptr[0] * vp8_filter[0]) +
((int)src_ptr[pixel_step] * vp8_filter[1]) +
((int)src_ptr[width] * vp8_filter[1]) +
(VP8_FILTER_WEIGHT / 2);
output_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
dst_ptr[j] = (unsigned int)(Temp >> VP8_FILTER_SHIFT);
src_ptr++;
}
/* Next row... */
src_ptr += src_pixels_per_line - output_width;
output_ptr += output_pitch;
dst_ptr += dst_pitch;
}
}
@@ -404,11 +355,14 @@ void vp8_filter_block2d_bil_second_pass
* ROUTINE : filter_block2d_bil
*
* INPUTS : UINT8 *src_ptr : Pointer to source block.
* UINT32 src_pixels_per_line : Stride of input block.
* UINT32 src_pitch : Stride of source block.
* UINT32 dst_pitch : Stride of destination block.
* INT32 *HFilter : Array of 2 horizontal filter taps.
* INT32 *VFilter : Array of 2 vertical filter taps.
* INT32 Width : Block width
* INT32 Height : Block height
*
* OUTPUTS : UINT16 *output_ptr : Pointer to filtered block.
* OUTPUTS : UINT16 *dst_ptr : Pointer to filtered block.
*
* RETURNS : void
*
@@ -419,26 +373,26 @@ void vp8_filter_block2d_bil_second_pass
* SPECIAL NOTES : The largest block size can be handled here is 16x16
*
****************************************************************************/
void vp8_filter_block2d_bil
static void filter_block2d_bil
(
unsigned char *src_ptr,
unsigned char *output_ptr,
unsigned int src_pixels_per_line,
unsigned char *dst_ptr,
unsigned int src_pitch,
unsigned int dst_pitch,
const int *HFilter,
const int *VFilter,
const short *HFilter,
const short *VFilter,
int Width,
int Height
)
{
unsigned short FData[17*16]; /* Temp data bufffer used in filtering */
unsigned short FData[17*16]; /* Temp data buffer used in filtering */
/* First filter 1-D horizontally... */
vp8_filter_block2d_bil_first_pass(src_ptr, FData, src_pixels_per_line, 1, Height + 1, Width, HFilter);
filter_block2d_bil_first_pass(src_ptr, FData, src_pitch, Height + 1, Width, HFilter);
/* then 1-D vertically... */
vp8_filter_block2d_bil_second_pass(FData, output_ptr, dst_pitch, Width, Width, Height, Width, VFilter);
filter_block2d_bil_second_pass(FData, dst_ptr, dst_pitch, Height, Width, VFilter);
}
@@ -452,11 +406,11 @@ void vp8_bilinear_predict4x4_c
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
#if 0
{
int i;
@@ -464,19 +418,19 @@ void vp8_bilinear_predict4x4_c
unsigned char temp2[16];
bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
vp8_filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
for (i = 0; i < 16; i++)
{
if (temp1[i] != temp2[i])
{
bilinear_predict4x4_mmx(src_ptr, src_pixels_per_line, xoffset, yoffset, temp1, 4);
vp8_filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
filter_block2d_bil(src_ptr, temp2, src_pixels_per_line, 4, HFilter, VFilter, 4, 4);
}
}
}
#endif
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 4, 4);
}
@@ -490,13 +444,13 @@ void vp8_bilinear_predict8x8_c
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 8);
}
@@ -510,13 +464,13 @@ void vp8_bilinear_predict8x4_c
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 8, 4);
}
@@ -530,11 +484,11 @@ void vp8_bilinear_predict16x16_c
int dst_pitch
)
{
const int *HFilter;
const int *VFilter;
const short *HFilter;
const short *VFilter;
HFilter = bilinear_filters[xoffset];
VFilter = bilinear_filters[yoffset];
HFilter = vp8_bilinear_filters[xoffset];
VFilter = vp8_bilinear_filters[yoffset];
vp8_filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
filter_block2d_bil(src_ptr, dst_ptr, src_pixels_per_line, dst_pitch, HFilter, VFilter, 16, 16);
}

View File

@@ -1,5 +1,5 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
* Copyright (c) 2011 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
@@ -9,11 +9,14 @@
*/
#ifndef __INC_PARTIALGFUPDATE_H
#define __INC_PARTIALGFUPDATE_H
#ifndef FILTER_H
#define FILTER_H
#include "onyxc_int.h"
#define BLOCK_HEIGHT_WIDTH 4
#define VP8_FILTER_WEIGHT 128
#define VP8_FILTER_SHIFT 7
extern void update_gf_selective(ONYX_COMMON *cm, MACROBLOCKD *x);
extern const short vp8_bilinear_filters[8][2];
extern const short vp8_sub_pel_filters[8][6];
#endif
#endif //FILTER_H

View File

@@ -11,54 +11,23 @@
#include "findnearmv.h"
#define FINDNEAR_SEARCH_SITES 3
const unsigned char vp8_mbsplit_offset[4][16] = {
{ 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{ 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{ 0, 2, 8, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
};
/* Predict motion vectors using those from already-decoded nearby blocks.
Note that we only consider one 4x4 subblock from each candidate 16x16
macroblock. */
typedef union
{
unsigned int as_int;
MV as_mv;
} int_mv; /* facilitates rapid equality tests */
static void mv_bias(const MODE_INFO *x, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
{
MV xmv;
xmv = x->mbmi.mv.as_mv;
if (ref_frame_sign_bias[x->mbmi.ref_frame] != ref_frame_sign_bias[refframe])
{
xmv.row *= -1;
xmv.col *= -1;
}
mvp->as_mv = xmv;
}
void vp8_clamp_mv(MV *mv, const MACROBLOCKD *xd)
{
if (mv->col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
mv->col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
else if (mv->col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
mv->col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
if (mv->row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
mv->row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
else if (mv->row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
mv->row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
}
void vp8_find_near_mvs
(
MACROBLOCKD *xd,
const MODE_INFO *here,
MV *nearest,
MV *nearby,
MV *best_mv,
int_mv *nearest,
int_mv *nearby,
int_mv *best_mv,
int cnt[4],
int refframe,
int *ref_frame_sign_bias
@@ -82,7 +51,7 @@ void vp8_find_near_mvs
if (above->mbmi.mv.as_int)
{
(++mv)->as_int = above->mbmi.mv.as_int;
mv_bias(above, refframe, mv, ref_frame_sign_bias);
mv_bias(ref_frame_sign_bias[above->mbmi.ref_frame], refframe, mv, ref_frame_sign_bias);
++cntx;
}
@@ -97,7 +66,7 @@ void vp8_find_near_mvs
int_mv this_mv;
this_mv.as_int = left->mbmi.mv.as_int;
mv_bias(left, refframe, &this_mv, ref_frame_sign_bias);
mv_bias(ref_frame_sign_bias[left->mbmi.ref_frame], refframe, &this_mv, ref_frame_sign_bias);
if (this_mv.as_int != mv->as_int)
{
@@ -119,7 +88,7 @@ void vp8_find_near_mvs
int_mv this_mv;
this_mv.as_int = aboveleft->mbmi.mv.as_int;
mv_bias(aboveleft, refframe, &this_mv, ref_frame_sign_bias);
mv_bias(ref_frame_sign_bias[aboveleft->mbmi.ref_frame], refframe, &this_mv, ref_frame_sign_bias);
if (this_mv.as_int != mv->as_int)
{
@@ -162,13 +131,14 @@ void vp8_find_near_mvs
near_mvs[CNT_INTRA] = near_mvs[CNT_NEAREST];
/* Set up return values */
*best_mv = near_mvs[0].as_mv;
*nearest = near_mvs[CNT_NEAREST].as_mv;
*nearby = near_mvs[CNT_NEAR].as_mv;
best_mv->as_int = near_mvs[0].as_int;
nearest->as_int = near_mvs[CNT_NEAREST].as_int;
nearby->as_int = near_mvs[CNT_NEAR].as_int;
vp8_clamp_mv(nearest, xd);
vp8_clamp_mv(nearby, xd);
vp8_clamp_mv(best_mv, xd); /*TODO: move this up before the copy*/
//TODO: move clamp outside findnearmv
vp8_clamp_mv2(nearest, xd);
vp8_clamp_mv2(nearby, xd);
vp8_clamp_mv2(best_mv, xd);
}
vp8_prob *vp8_mv_ref_probs(
@@ -183,26 +153,3 @@ vp8_prob *vp8_mv_ref_probs(
return p;
}
const B_MODE_INFO *vp8_left_bmi(const MODE_INFO *cur_mb, int b)
{
if (!(b & 3))
{
/* On L edge, get from MB to left of us */
--cur_mb;
b += 4;
}
return cur_mb->bmi + b - 1;
}
const B_MODE_INFO *vp8_above_bmi(const MODE_INFO *cur_mb, int b, int mi_stride)
{
if (!(b >> 2))
{
/* On top edge, get from MB above us */
cur_mb -= mi_stride;
b += 16;
}
return cur_mb->bmi + b - 4;
}

View File

@@ -17,11 +17,65 @@
#include "modecont.h"
#include "treecoder.h"
static void mv_bias(int refmb_ref_frame_sign_bias, int refframe, int_mv *mvp, const int *ref_frame_sign_bias)
{
MV xmv;
xmv = mvp->as_mv;
if (refmb_ref_frame_sign_bias != ref_frame_sign_bias[refframe])
{
xmv.row *= -1;
xmv.col *= -1;
}
mvp->as_mv = xmv;
}
#define LEFT_TOP_MARGIN (16 << 3)
#define RIGHT_BOTTOM_MARGIN (16 << 3)
static void vp8_clamp_mv2(int_mv *mv, const MACROBLOCKD *xd)
{
if (mv->as_mv.col < (xd->mb_to_left_edge - LEFT_TOP_MARGIN))
mv->as_mv.col = xd->mb_to_left_edge - LEFT_TOP_MARGIN;
else if (mv->as_mv.col > xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN)
mv->as_mv.col = xd->mb_to_right_edge + RIGHT_BOTTOM_MARGIN;
if (mv->as_mv.row < (xd->mb_to_top_edge - LEFT_TOP_MARGIN))
mv->as_mv.row = xd->mb_to_top_edge - LEFT_TOP_MARGIN;
else if (mv->as_mv.row > xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN)
mv->as_mv.row = xd->mb_to_bottom_edge + RIGHT_BOTTOM_MARGIN;
}
static void vp8_clamp_mv(int_mv *mv, int mb_to_left_edge, int mb_to_right_edge,
int mb_to_top_edge, int mb_to_bottom_edge)
{
mv->as_mv.col = (mv->as_mv.col < mb_to_left_edge) ?
mb_to_left_edge : mv->as_mv.col;
mv->as_mv.col = (mv->as_mv.col > mb_to_right_edge) ?
mb_to_right_edge : mv->as_mv.col;
mv->as_mv.row = (mv->as_mv.row < mb_to_top_edge) ?
mb_to_top_edge : mv->as_mv.row;
mv->as_mv.row = (mv->as_mv.row > mb_to_bottom_edge) ?
mb_to_bottom_edge : mv->as_mv.row;
}
static unsigned int vp8_check_mv_bounds(int_mv *mv, int mb_to_left_edge,
int mb_to_right_edge, int mb_to_top_edge,
int mb_to_bottom_edge)
{
unsigned int need_to_clamp;
need_to_clamp = (mv->as_mv.col < mb_to_left_edge) ? 1 : 0;
need_to_clamp |= (mv->as_mv.col > mb_to_right_edge) ? 1 : 0;
need_to_clamp |= (mv->as_mv.row < mb_to_top_edge) ? 1 : 0;
need_to_clamp |= (mv->as_mv.row > mb_to_bottom_edge) ? 1 : 0;
return need_to_clamp;
}
void vp8_find_near_mvs
(
MACROBLOCKD *xd,
const MODE_INFO *here,
MV *nearest, MV *nearby, MV *best,
int_mv *nearest, int_mv *nearby, int_mv *best,
int near_mv_ref_cts[4],
int refframe,
int *ref_frame_sign_bias
@@ -31,12 +85,89 @@ vp8_prob *vp8_mv_ref_probs(
vp8_prob p[VP8_MVREFS-1], const int near_mv_ref_ct[4]
);
const B_MODE_INFO *vp8_left_bmi(const MODE_INFO *cur_mb, int b);
extern const unsigned char vp8_mbsplit_offset[4][16];
const B_MODE_INFO *vp8_above_bmi(const MODE_INFO *cur_mb, int b, int mi_stride);
#define LEFT_TOP_MARGIN (16 << 3)
#define RIGHT_BOTTOM_MARGIN (16 << 3)
static int left_block_mv(const MODE_INFO *cur_mb, int b)
{
if (!(b & 3))
{
/* On L edge, get from MB to left of us */
--cur_mb;
if(cur_mb->mbmi.mode != SPLITMV)
return cur_mb->mbmi.mv.as_int;
b += 4;
}
return (cur_mb->bmi + b - 1)->mv.as_int;
}
static int above_block_mv(const MODE_INFO *cur_mb, int b, int mi_stride)
{
if (!(b >> 2))
{
/* On top edge, get from MB above us */
cur_mb -= mi_stride;
if(cur_mb->mbmi.mode != SPLITMV)
return cur_mb->mbmi.mv.as_int;
b += 16;
}
return (cur_mb->bmi + b - 4)->mv.as_int;
}
static B_PREDICTION_MODE left_block_mode(const MODE_INFO *cur_mb, int b)
{
if (!(b & 3))
{
/* On L edge, get from MB to left of us */
--cur_mb;
switch (cur_mb->mbmi.mode)
{
case B_PRED:
return (cur_mb->bmi + b + 3)->as_mode;
case DC_PRED:
return B_DC_PRED;
case V_PRED:
return B_VE_PRED;
case H_PRED:
return B_HE_PRED;
case TM_PRED:
return B_TM_PRED;
default:
return B_DC_PRED;
}
}
return (cur_mb->bmi + b - 1)->as_mode;
}
static B_PREDICTION_MODE above_block_mode(const MODE_INFO *cur_mb, int b, int mi_stride)
{
if (!(b >> 2))
{
/* On top edge, get from MB above us */
cur_mb -= mi_stride;
switch (cur_mb->mbmi.mode)
{
case B_PRED:
return (cur_mb->bmi + b + 12)->as_mode;
case DC_PRED:
return B_DC_PRED;
case V_PRED:
return B_VE_PRED;
case H_PRED:
return B_HE_PRED;
case TM_PRED:
return B_TM_PRED;
default:
return B_DC_PRED;
}
}
return (cur_mb->bmi + b - 4)->as_mode;
}
#endif

View File

@@ -1,121 +0,0 @@
/*
* Copyright (c) 2010 The WebM project authors. All Rights Reserved.
*
* Use of this source code is governed by a BSD-style license
* that can be found in the LICENSE file in the root of the source
* tree. An additional intellectual property rights grant can be found
* in the file PATENTS. All contributing project authors may
* be found in the AUTHORS file in the root of the source tree.
*/
#ifndef FOURCC_HPP
#define FOURCC_HPP
#include <iosfwd>
#include <cstring>
#if defined(__POWERPC__) || defined(__APPLE__) || defined(__MERKS__)
using namespace std;
#endif
class four_cc
{
public:
four_cc();
four_cc(const char*);
explicit four_cc(unsigned long);
bool operator==(const four_cc&) const;
bool operator!=(const four_cc&) const;
bool operator==(const char*) const;
bool operator!=(const char*) const;
operator unsigned long() const;
unsigned long as_long() const;
four_cc& operator=(unsigned long);
char operator[](int) const;
std::ostream& put(std::ostream&) const;
bool printable() const;
private:
union
{
char code[4];
unsigned long code_as_long;
};
};
inline four_cc::four_cc()
{
}
inline four_cc::four_cc(unsigned long x)
: code_as_long(x)
{
}
inline four_cc::four_cc(const char* str)
{
memcpy(code, str, 4);
}
inline bool four_cc::operator==(const four_cc& rhs) const
{
return code_as_long == rhs.code_as_long;
}
inline bool four_cc::operator!=(const four_cc& rhs) const
{
return !operator==(rhs);
}
inline bool four_cc::operator==(const char* rhs) const
{
return (memcmp(code, rhs, 4) == 0);
}
inline bool four_cc::operator!=(const char* rhs) const
{
return !operator==(rhs);
}
inline four_cc::operator unsigned long() const
{
return code_as_long;
}
inline unsigned long four_cc::as_long() const
{
return code_as_long;
}
inline char four_cc::operator[](int i) const
{
return code[i];
}
inline four_cc& four_cc::operator=(unsigned long val)
{
code_as_long = val;
return *this;
}
inline std::ostream& operator<<(std::ostream& os, const four_cc& rhs)
{
return rhs.put(os);
}
#endif

View File

@@ -10,21 +10,60 @@
#include "vpx_ports/config.h"
#include "g_common.h"
#include "subpixel.h"
#include "loopfilter.h"
#include "recon.h"
#include "idct.h"
#include "onyxc_int.h"
#include "vp8/common/g_common.h"
#include "vp8/common/subpixel.h"
#include "vp8/common/loopfilter.h"
#include "vp8/common/recon.h"
#include "vp8/common/idct.h"
#include "vp8/common/onyxc_int.h"
#if CONFIG_MULTITHREAD
#if HAVE_UNISTD_H
#include <unistd.h>
#elif defined(_WIN32)
#include <windows.h>
typedef void (WINAPI *PGNSI)(LPSYSTEM_INFO);
#endif
#endif
extern void vp8_arch_x86_common_init(VP8_COMMON *ctx);
extern void vp8_arch_arm_common_init(VP8_COMMON *ctx);
void (*vp8_build_intra_predictors_mby_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby(MACROBLOCKD *x);
#if CONFIG_MULTITHREAD
static int get_cpu_count()
{
int core_count = 16;
void (*vp8_build_intra_predictors_mby_s_ptr)(MACROBLOCKD *x);
extern void vp8_build_intra_predictors_mby_s(MACROBLOCKD *x);
#if HAVE_UNISTD_H
#if defined(_SC_NPROCESSORS_ONLN)
core_count = sysconf(_SC_NPROCESSORS_ONLN);
#elif defined(_SC_NPROC_ONLN)
core_count = sysconf(_SC_NPROC_ONLN);
#endif
#elif defined(_WIN32)
{
PGNSI pGNSI;
SYSTEM_INFO sysinfo;
/* Call GetNativeSystemInfo if supported or
* GetSystemInfo otherwise. */
pGNSI = (PGNSI) GetProcAddress(
GetModuleHandle(TEXT("kernel32.dll")), "GetNativeSystemInfo");
if (pGNSI != NULL)
pGNSI(&sysinfo);
else
GetSystemInfo(&sysinfo);
core_count = sysinfo.dwNumberOfProcessors;
}
#else
/* other platforms */
#endif
return core_count > 0 ? core_count : 1;
}
#endif
void vp8_machine_specific_config(VP8_COMMON *ctx)
{
@@ -45,6 +84,16 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
rtcd->recon.recon4 = vp8_recon4b_c;
rtcd->recon.recon_mb = vp8_recon_mb_c;
rtcd->recon.recon_mby = vp8_recon_mby_c;
rtcd->recon.build_intra_predictors_mby =
vp8_build_intra_predictors_mby;
rtcd->recon.build_intra_predictors_mby_s =
vp8_build_intra_predictors_mby_s;
rtcd->recon.build_intra_predictors_mbuv =
vp8_build_intra_predictors_mbuv;
rtcd->recon.build_intra_predictors_mbuv_s =
vp8_build_intra_predictors_mbuv_s;
rtcd->recon.intra4x4_predict =
vp8_intra4x4_predict;
rtcd->subpix.sixtap16x16 = vp8_sixtap_predict16x16_c;
rtcd->subpix.sixtap8x8 = vp8_sixtap_predict8x8_c;
@@ -59,23 +108,22 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
rtcd->loopfilter.normal_b_v = vp8_loop_filter_bv_c;
rtcd->loopfilter.normal_mb_h = vp8_loop_filter_mbh_c;
rtcd->loopfilter.normal_b_h = vp8_loop_filter_bh_c;
rtcd->loopfilter.simple_mb_v = vp8_loop_filter_mbvs_c;
rtcd->loopfilter.simple_mb_v = vp8_loop_filter_simple_vertical_edge_c;
rtcd->loopfilter.simple_b_v = vp8_loop_filter_bvs_c;
rtcd->loopfilter.simple_mb_h = vp8_loop_filter_mbhs_c;
rtcd->loopfilter.simple_mb_h = vp8_loop_filter_simple_horizontal_edge_c;
rtcd->loopfilter.simple_b_h = vp8_loop_filter_bhs_c;
#if CONFIG_POSTPROC || (CONFIG_VP8_ENCODER && CONFIG_PSNR)
#if CONFIG_POSTPROC || (CONFIG_VP8_ENCODER && CONFIG_INTERNAL_STATS)
rtcd->postproc.down = vp8_mbpost_proc_down_c;
rtcd->postproc.across = vp8_mbpost_proc_across_ip_c;
rtcd->postproc.downacross = vp8_post_proc_down_and_across_c;
rtcd->postproc.addnoise = vp8_plane_add_noise_c;
rtcd->postproc.blend_mb = vp8_blend_mb_c;
rtcd->postproc.blend_mb_inner = vp8_blend_mb_inner_c;
rtcd->postproc.blend_mb_outer = vp8_blend_mb_outer_c;
rtcd->postproc.blend_b = vp8_blend_b_c;
#endif
#endif
/* Pure C: */
vp8_build_intra_predictors_mby_ptr = vp8_build_intra_predictors_mby;
vp8_build_intra_predictors_mby_s_ptr = vp8_build_intra_predictors_mby_s;
#if ARCH_X86 || ARCH_X86_64
vp8_arch_x86_common_init(ctx);
@@ -85,4 +133,7 @@ void vp8_machine_specific_config(VP8_COMMON *ctx)
vp8_arch_arm_common_init(ctx);
#endif
#if CONFIG_MULTITHREAD
ctx->processor_core_count = get_cpu_count();
#endif /* CONFIG_MULTITHREAD */
}

View File

@@ -9,162 +9,149 @@
*/
#include "vpx_ports/config.h"
#include "vpx_config.h"
#include "loopfilter.h"
#include "onyxc_int.h"
#include "vpx_mem/vpx_mem.h"
typedef unsigned char uc;
prototype_loopfilter(vp8_loop_filter_horizontal_edge_c);
prototype_loopfilter(vp8_loop_filter_vertical_edge_c);
prototype_loopfilter(vp8_mbloop_filter_horizontal_edge_c);
prototype_loopfilter(vp8_mbloop_filter_vertical_edge_c);
prototype_loopfilter(vp8_loop_filter_simple_horizontal_edge_c);
prototype_loopfilter(vp8_loop_filter_simple_vertical_edge_c);
prototype_simple_loopfilter(vp8_loop_filter_simple_horizontal_edge_c);
prototype_simple_loopfilter(vp8_loop_filter_simple_vertical_edge_c);
/* Horizontal MB filtering */
void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_mbh_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_horizontal_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
vp8_mbloop_filter_horizontal_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_horizontal_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Vertical MB Filtering */
void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_mbv_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_vertical_edge_c(y_ptr, y_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
vp8_mbloop_filter_vertical_edge_c(u_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->uvmbflim, lfi->uvlim, lfi->uvmbthr, 1);
}
void vp8_loop_filter_mbvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_c(y_ptr, y_stride, lfi->mbflim, lfi->lim, lfi->mbthr, 2);
vp8_mbloop_filter_vertical_edge_c(v_ptr, uv_stride, lfi->mblim, lfi->lim, lfi->hev_thr, 1);
}
/* Horizontal B Filtering */
void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bh_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_horizontal_edge_c(u_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_horizontal_edge_c(v_ptr + 4 * uv_stride, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bhs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bhs_c(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 4 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 8 * y_stride, y_stride, blimit);
vp8_loop_filter_simple_horizontal_edge_c(y_ptr + 12 * y_stride, y_stride, blimit);
}
/* Vertical B Filtering */
void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bv_c(unsigned char *y_ptr, unsigned char *u_ptr,
unsigned char *v_ptr, int y_stride, int uv_stride,
loop_filter_info *lfi)
{
(void) simpler_lpf;
vp8_loop_filter_vertical_edge_c(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 4, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 8, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
vp8_loop_filter_vertical_edge_c(y_ptr + 12, y_stride, lfi->blim, lfi->lim, lfi->hev_thr, 2);
if (u_ptr)
vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_vertical_edge_c(u_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
if (v_ptr)
vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->uvflim, lfi->uvlim, lfi->uvthr, 1);
vp8_loop_filter_vertical_edge_c(v_ptr + 4, uv_stride, lfi->blim, lfi->lim, lfi->hev_thr, 1);
}
void vp8_loop_filter_bvs_c(unsigned char *y_ptr, unsigned char *u_ptr, unsigned char *v_ptr,
int y_stride, int uv_stride, loop_filter_info *lfi, int simpler_lpf)
void vp8_loop_filter_bvs_c(unsigned char *y_ptr, int y_stride,
const unsigned char *blimit)
{
(void) u_ptr;
(void) v_ptr;
(void) uv_stride;
(void) simpler_lpf;
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 4, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 8, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 12, y_stride, lfi->flim, lfi->lim, lfi->thr, 2);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 4, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 8, y_stride, blimit);
vp8_loop_filter_simple_vertical_edge_c(y_ptr + 12, y_stride, blimit);
}
void vp8_init_loop_filter(VP8_COMMON *cm)
static void lf_init_lut(loop_filter_info_n *lfi)
{
loop_filter_info *lfi = cm->lf_info;
LOOPFILTERTYPE lft = cm->filter_type;
int sharpness_lvl = cm->sharpness_level;
int frame_type = cm->frame_type;
int i, j;
int filt_lvl;
int block_inside_limit = 0;
int HEVThresh;
const int yhedge_boost = 2;
const int uvhedge_boost = 2;
for (filt_lvl = 0; filt_lvl <= MAX_LOOP_FILTER; filt_lvl++)
{
if (filt_lvl >= 40)
{
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 2;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 3;
}
else if (filt_lvl >= 20)
{
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 1;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 2;
}
else if (filt_lvl >= 15)
{
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 1;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 1;
}
else
{
lfi->hev_thr_lut[KEY_FRAME][filt_lvl] = 0;
lfi->hev_thr_lut[INTER_FRAME][filt_lvl] = 0;
}
}
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
lfi->mode_lf_lut[DC_PRED] = 1;
lfi->mode_lf_lut[V_PRED] = 1;
lfi->mode_lf_lut[H_PRED] = 1;
lfi->mode_lf_lut[TM_PRED] = 1;
lfi->mode_lf_lut[B_PRED] = 0;
lfi->mode_lf_lut[ZEROMV] = 1;
lfi->mode_lf_lut[NEARESTMV] = 2;
lfi->mode_lf_lut[NEARMV] = 2;
lfi->mode_lf_lut[NEWMV] = 2;
lfi->mode_lf_lut[SPLITMV] = 3;
}
void vp8_loop_filter_update_sharpness(loop_filter_info_n *lfi,
int sharpness_lvl)
{
int i;
/* For each possible value for the loop filter fill out limits */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
int block_inside_limit = 0;
/* Set loop filter paramaeters that control sharpness. */
block_inside_limit = filt_lvl >> (sharpness_lvl > 0);
@@ -179,181 +166,143 @@ void vp8_init_loop_filter(VP8_COMMON *cm)
if (block_inside_limit < 1)
block_inside_limit = 1;
for (j = 0; j < 16; j++)
{
lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl + yhedge_boost;
lfi[i].mbthr[j] = HEVThresh;
lfi[i].flim[j] = filt_lvl;
lfi[i].thr[j] = HEVThresh;
lfi[i].uvlim[j] = block_inside_limit;
lfi[i].uvmbflim[j] = filt_lvl + uvhedge_boost;
lfi[i].uvmbthr[j] = HEVThresh;
lfi[i].uvflim[j] = filt_lvl;
lfi[i].uvthr[j] = HEVThresh;
}
}
/* Set up the function pointers depending on the type of loop filtering selected */
if (lft == NORMAL_LOOPFILTER)
{
cm->lf_mbv = LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v);
cm->lf_bv = LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v);
cm->lf_mbh = LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h);
cm->lf_bh = LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h);
}
else
{
cm->lf_mbv = LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v);
cm->lf_bv = LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v);
cm->lf_mbh = LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h);
cm->lf_bh = LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h);
vpx_memset(lfi->lim[i], block_inside_limit, SIMD_WIDTH);
vpx_memset(lfi->blim[i], (2 * filt_lvl + block_inside_limit),
SIMD_WIDTH);
vpx_memset(lfi->mblim[i], (2 * (filt_lvl + 2) + block_inside_limit),
SIMD_WIDTH);
}
}
/* Put vp8_init_loop_filter() in vp8dx_create_decompressor(). Only call vp8_frame_init_loop_filter() while decoding
* each frame. Check last_frame_type to skip the function most of times.
void vp8_loop_filter_init(VP8_COMMON *cm)
{
loop_filter_info_n *lfi = &cm->lf_info;
int i;
/* init limits for given sharpness*/
vp8_loop_filter_update_sharpness(lfi, cm->sharpness_level);
cm->last_sharpness_level = cm->sharpness_level;
/* init LUT for lvl and hev thr picking */
lf_init_lut(lfi);
/* init hev threshold const vectors */
for(i = 0; i < 4 ; i++)
{
vpx_memset(lfi->hev_thr[i], i, SIMD_WIDTH);
}
}
void vp8_loop_filter_frame_init(VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl)
{
int seg, /* segment number */
ref, /* index in ref_lf_deltas */
mode; /* index in mode_lf_deltas */
loop_filter_info_n *lfi = &cm->lf_info;
/* update limits if sharpness has changed */
if(cm->last_sharpness_level != cm->sharpness_level)
{
vp8_loop_filter_update_sharpness(lfi, cm->sharpness_level);
cm->last_sharpness_level = cm->sharpness_level;
}
for(seg = 0; seg < MAX_MB_SEGMENTS; seg++)
{
int lvl_seg = default_filt_lvl;
int lvl_ref, lvl_mode;
/* Note the baseline filter values for each segment */
if (mbd->segmentation_enabled)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
{
lvl_seg = mbd->segment_feature_data[MB_LVL_ALT_LF][seg];
}
else /* Delta Value */
{
lvl_seg += mbd->segment_feature_data[MB_LVL_ALT_LF][seg];
lvl_seg = (lvl_seg > 0) ? ((lvl_seg > 63) ? 63: lvl_seg) : 0;
}
}
if (!mbd->mode_ref_lf_delta_enabled)
{
/* we could get rid of this if we assume that deltas are set to
* zero when not in use; encoder always uses deltas
*/
void vp8_frame_init_loop_filter(loop_filter_info *lfi, int frame_type)
{
int HEVThresh;
int i, j;
/* For each possible value for the loop filter fill out a "loop_filter_info" entry. */
for (i = 0; i <= MAX_LOOP_FILTER; i++)
{
int filt_lvl = i;
if (frame_type == KEY_FRAME)
{
if (filt_lvl >= 40)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
}
else
{
if (filt_lvl >= 40)
HEVThresh = 3;
else if (filt_lvl >= 20)
HEVThresh = 2;
else if (filt_lvl >= 15)
HEVThresh = 1;
else
HEVThresh = 0;
vpx_memset(lfi->lvl[seg][0], lvl_seg, 4 * 4 );
continue;
}
for (j = 0; j < 16; j++)
{
/*lfi[i].lim[j] = block_inside_limit;
lfi[i].mbflim[j] = filt_lvl+yhedge_boost;*/
lfi[i].mbthr[j] = HEVThresh;
/*lfi[i].flim[j] = filt_lvl;*/
lfi[i].thr[j] = HEVThresh;
/*lfi[i].uvlim[j] = block_inside_limit;
lfi[i].uvmbflim[j] = filt_lvl+uvhedge_boost;*/
lfi[i].uvmbthr[j] = HEVThresh;
/*lfi[i].uvflim[j] = filt_lvl;*/
lfi[i].uvthr[j] = HEVThresh;
}
}
}
lvl_ref = lvl_seg;
/* INTRA_FRAME */
ref = INTRA_FRAME;
void vp8_adjust_mb_lf_value(MACROBLOCKD *mbd, int *filter_level)
{
MB_MODE_INFO *mbmi = &mbd->mode_info_context->mbmi;
if (mbd->mode_ref_lf_delta_enabled)
{
/* Apply delta for reference frame */
*filter_level += mbd->ref_lf_deltas[mbmi->ref_frame];
lvl_ref += mbd->ref_lf_deltas[ref];
/* Apply delta for mode */
if (mbmi->ref_frame == INTRA_FRAME)
{
/* Apply delta for Intra modes */
mode = 0; /* B_PRED */
/* Only the split mode BPRED has a further special case */
if (mbmi->mode == B_PRED)
*filter_level += mbd->mode_lf_deltas[0];
}
else
lvl_mode = lvl_ref + mbd->mode_lf_deltas[mode];
lvl_mode = (lvl_mode > 0) ? (lvl_mode > 63 ? 63 : lvl_mode) : 0; /* clamp */
lfi->lvl[seg][ref][mode] = lvl_mode;
mode = 1; /* all the rest of Intra modes */
lvl_mode = (lvl_ref > 0) ? (lvl_ref > 63 ? 63 : lvl_ref) : 0; /* clamp */
lfi->lvl[seg][ref][mode] = lvl_mode;
/* LAST, GOLDEN, ALT */
for(ref = 1; ref < MAX_REF_FRAMES; ref++)
{
/* Zero motion mode */
if (mbmi->mode == ZEROMV)
*filter_level += mbd->mode_lf_deltas[1];
int lvl_ref = lvl_seg;
/* Split MB motion mode */
else if (mbmi->mode == SPLITMV)
*filter_level += mbd->mode_lf_deltas[3];
/* Apply delta for reference frame */
lvl_ref += mbd->ref_lf_deltas[ref];
/* All other inter motion modes (Nearest, Near, New) */
else
*filter_level += mbd->mode_lf_deltas[2];
/* Apply delta for Inter modes */
for (mode = 1; mode < 4; mode++)
{
lvl_mode = lvl_ref + mbd->mode_lf_deltas[mode];
lvl_mode = (lvl_mode > 0) ? (lvl_mode > 63 ? 63 : lvl_mode) : 0; /* clamp */
lfi->lvl[seg][ref][mode] = lvl_mode;
}
}
/* Range check */
if (*filter_level > MAX_LOOP_FILTER)
*filter_level = MAX_LOOP_FILTER;
else if (*filter_level < 0)
*filter_level = 0;
}
}
void vp8_loop_filter_frame
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl
MACROBLOCKD *mbd
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
loop_filter_info *lfi = cm->lf_info;
loop_filter_info_n *lfi_n = &cm->lf_info;
loop_filter_info lfi;
FRAME_TYPE frame_type = cm->frame_type;
int mb_row;
int mb_col;
int baseline_filter_level[MAX_MB_SEGMENTS];
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
int i;
unsigned char *y_ptr, *u_ptr, *v_ptr;
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
/* Point at base of Mb MODE_INFO list */
const MODE_INFO *mode_info_context = cm->mi;
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter(lfi, frame_type);
vp8_loop_filter_frame_init(cm, mbd, cm->filter_level);
/* Set up the buffer pointers */
y_ptr = post->y_buffer;
@@ -365,101 +314,108 @@ void vp8_loop_filter_frame
{
for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
mode_info_context->mbmi.mode != SPLITMV &&
mode_info_context->mbmi.mb_skip_coeff);
filter_level = baseline_filter_level[Segment];
const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
const int seg = mode_info_context->mbmi.segment_id;
const int ref_frame = mode_info_context->mbmi.ref_frame;
/* Distance of Mb to the various image edges.
* These specified to 8th pel as they are always compared to values that are in 1/8th pel units
* Apply any context driven MB level adjustment
*/
vp8_adjust_mb_lf_value(mbd, &filter_level);
filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
if (filter_level)
{
if (mb_col > 0)
cm->lf_mbv(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
if (cm->filter_type == NORMAL_LOOPFILTER)
{
const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
lfi.mblim = lfi_n->mblim[filter_level];
lfi.blim = lfi_n->blim[filter_level];
lfi.lim = lfi_n->lim[filter_level];
lfi.hev_thr = lfi_n->hev_thr[hev_index];
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bv(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
/* don't apply across umv border */
if (mb_row > 0)
cm->lf_mbh(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bh(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi[filter_level], cm->simpler_lpf);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
(y_ptr, u_ptr, v_ptr, post->y_stride, post->uv_stride, &lfi);
}
else
{
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
/* don't apply across umv border */
if (mb_row > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
}
}
y_ptr += 16;
u_ptr += 8;
v_ptr += 8;
mbd->mode_info_context++; /* step to next MB */
mode_info_context++; /* step to next MB */
}
y_ptr += post->y_stride * 16 - post->y_width;
u_ptr += post->uv_stride * 8 - post->uv_width;
v_ptr += post->uv_stride * 8 - post->uv_width;
mbd->mode_info_context++; /* Skip border mb */
mode_info_context++; /* Skip border mb */
}
}
void vp8_loop_filter_frame_yonly
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl,
int sharpness_lvl
int default_filt_lvl
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
int i;
unsigned char *y_ptr;
int mb_row;
int mb_col;
loop_filter_info *lfi = cm->lf_info;
int baseline_filter_level[MAX_MB_SEGMENTS];
loop_filter_info_n *lfi_n = &cm->lf_info;
loop_filter_info lfi;
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
FRAME_TYPE frame_type = cm->frame_type;
(void) sharpness_lvl;
/* Point at base of Mb MODE_INFO list */
const MODE_INFO *mode_info_context = cm->mi;
/*MODE_INFO * this_mb_mode_info = cm->mi;*/ /* Point at base of Mb MODE_INFO list */
mbd->mode_info_context = cm->mi; /* Point at base of Mb MODE_INFO list */
/* Note the baseline filter values for each segment */
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
#if 0
if(default_filt_lvl == 0) /* no filter applied */
return;
#endif
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter(lfi, frame_type);
vp8_loop_filter_frame_init( cm, mbd, default_filt_lvl);
/* Set up the buffer pointers */
y_ptr = post->y_buffer;
@@ -469,72 +425,106 @@ void vp8_loop_filter_frame_yonly
{
for (mb_col = 0; mb_col < cm->mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
filter_level = baseline_filter_level[Segment];
int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
mode_info_context->mbmi.mode != SPLITMV &&
mode_info_context->mbmi.mb_skip_coeff);
/* Apply any context driven MB level adjustment */
vp8_adjust_mb_lf_value(mbd, &filter_level);
const int mode_index = lfi_n->mode_lf_lut[mode_info_context->mbmi.mode];
const int seg = mode_info_context->mbmi.segment_id;
const int ref_frame = mode_info_context->mbmi.ref_frame;
filter_level = lfi_n->lvl[seg][ref_frame][mode_index];
if (filter_level)
{
if (mb_col > 0)
cm->lf_mbv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (cm->filter_type == NORMAL_LOOPFILTER)
{
const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
lfi.mblim = lfi_n->mblim[filter_level];
lfi.blim = lfi_n->blim[filter_level];
lfi.lim = lfi_n->lim[filter_level];
lfi.hev_thr = lfi_n->hev_thr[hev_index];
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
/* don't apply across umv border */
if (mb_row > 0)
cm->lf_mbh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
}
else
{
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
/* don't apply across umv border */
if (mb_row > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
}
}
y_ptr += 16;
mbd->mode_info_context ++; /* step to next MB */
mode_info_context ++; /* step to next MB */
}
y_ptr += post->y_stride * 16 - post->y_width;
mbd->mode_info_context ++; /* Skip border mb */
mode_info_context ++; /* Skip border mb */
}
}
void vp8_loop_filter_partial_frame
(
VP8_COMMON *cm,
MACROBLOCKD *mbd,
int default_filt_lvl,
int sharpness_lvl,
int Fraction
int default_filt_lvl
)
{
YV12_BUFFER_CONFIG *post = cm->frame_to_show;
int i;
unsigned char *y_ptr;
int mb_row;
int mb_col;
/*int mb_rows = post->y_height >> 4;*/
int mb_cols = post->y_width >> 4;
int linestocopy;
int linestocopy, i;
loop_filter_info_n *lfi_n = &cm->lf_info;
loop_filter_info lfi;
loop_filter_info *lfi = cm->lf_info;
int baseline_filter_level[MAX_MB_SEGMENTS];
int filter_level;
int alt_flt_enabled = mbd->segmentation_enabled;
FRAME_TYPE frame_type = cm->frame_type;
(void) sharpness_lvl;
const MODE_INFO *mode_info_context;
/*MODE_INFO * this_mb_mode_info = cm->mi + (post->y_height>>5) * (mb_cols + 1);*/ /* Point at base of Mb MODE_INFO list */
mbd->mode_info_context = cm->mi + (post->y_height >> 5) * (mb_cols + 1); /* Point at base of Mb MODE_INFO list */
int lvl_seg[MAX_MB_SEGMENTS];
linestocopy = (post->y_height >> (4 + Fraction));
mode_info_context = cm->mi + (post->y_height >> 5) * (mb_cols + 1);
/* 3 is a magic number. 4 is probably magic too */
linestocopy = (post->y_height >> (4 + 3));
if (linestocopy < 1)
linestocopy = 1;
@@ -542,32 +532,27 @@ void vp8_loop_filter_partial_frame
linestocopy <<= 4;
/* Note the baseline filter values for each segment */
/* See vp8_loop_filter_frame_init. Rather than call that for each change
* to default_filt_lvl, copy the relevant calculation here.
*/
if (alt_flt_enabled)
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
{
/* Abs value */
{ /* Abs value */
if (mbd->mb_segement_abs_delta == SEGMENT_ABSDATA)
baseline_filter_level[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
{
lvl_seg[i] = mbd->segment_feature_data[MB_LVL_ALT_LF][i];
}
/* Delta Value */
else
{
baseline_filter_level[i] = default_filt_lvl + mbd->segment_feature_data[MB_LVL_ALT_LF][i];
baseline_filter_level[i] = (baseline_filter_level[i] >= 0) ? ((baseline_filter_level[i] <= MAX_LOOP_FILTER) ? baseline_filter_level[i] : MAX_LOOP_FILTER) : 0; /* Clamp to valid range */
lvl_seg[i] = default_filt_lvl
+ mbd->segment_feature_data[MB_LVL_ALT_LF][i];
lvl_seg[i] = (lvl_seg[i] > 0) ?
((lvl_seg[i] > 63) ? 63: lvl_seg[i]) : 0;
}
}
}
else
{
for (i = 0; i < MAX_MB_SEGMENTS; i++)
baseline_filter_level[i] = default_filt_lvl;
}
/* Initialize the loop filter for this frame. */
if ((cm->last_filter_type != cm->filter_type) || (cm->last_sharpness_level != cm->sharpness_level))
vp8_init_loop_filter(cm);
else if (frame_type != cm->last_frame_type)
vp8_frame_init_loop_filter(lfi, frame_type);
/* Set up the buffer pointers */
y_ptr = post->y_buffer + (post->y_height >> 5) * 16 * post->y_stride;
@@ -577,28 +562,64 @@ void vp8_loop_filter_partial_frame
{
for (mb_col = 0; mb_col < mb_cols; mb_col++)
{
int Segment = (alt_flt_enabled) ? mbd->mode_info_context->mbmi.segment_id : 0;
filter_level = baseline_filter_level[Segment];
int skip_lf = (mode_info_context->mbmi.mode != B_PRED &&
mode_info_context->mbmi.mode != SPLITMV &&
mode_info_context->mbmi.mb_skip_coeff);
if (alt_flt_enabled)
filter_level = lvl_seg[mode_info_context->mbmi.segment_id];
else
filter_level = default_filt_lvl;
if (filter_level)
{
if (cm->filter_type == NORMAL_LOOPFILTER)
{
const int hev_index = lfi_n->hev_thr_lut[frame_type][filter_level];
lfi.mblim = lfi_n->mblim[filter_level];
lfi.blim = lfi_n->blim[filter_level];
lfi.lim = lfi_n->lim[filter_level];
lfi.hev_thr = lfi_n->hev_thr[hev_index];
if (mb_col > 0)
cm->lf_mbv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bv(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_v)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
cm->lf_mbh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
LF_INVOKE(&cm->rtcd.loopfilter, normal_mb_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
if (mbd->mode_info_context->mbmi.dc_diff > 0)
cm->lf_bh(y_ptr, 0, 0, post->y_stride, 0, &lfi[filter_level], 0);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, normal_b_h)
(y_ptr, 0, 0, post->y_stride, 0, &lfi);
}
else
{
if (mb_col > 0)
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_v)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_v)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
LF_INVOKE(&cm->rtcd.loopfilter, simple_mb_h)
(y_ptr, post->y_stride, lfi_n->mblim[filter_level]);
if (!skip_lf)
LF_INVOKE(&cm->rtcd.loopfilter, simple_b_h)
(y_ptr, post->y_stride, lfi_n->blim[filter_level]);
}
}
y_ptr += 16;
mbd->mode_info_context += 1; /* step to next MB */
mode_info_context += 1; /* step to next MB */
}
y_ptr += post->y_stride * 16 - post->y_width;
mbd->mode_info_context += 1; /* Skip border mb */
mode_info_context += 1; /* Skip border mb */
}
}

View File

@@ -13,6 +13,7 @@
#define loopfilter_h
#include "vpx_ports/mem.h"
#include "vpx_config.h"
#define MAX_LOOP_FILTER 63
@@ -22,32 +23,45 @@ typedef enum
SIMPLE_LOOPFILTER = 1
} LOOPFILTERTYPE;
/* FRK
* Need to align this structure so when it is declared and
#if ARCH_ARM
#define SIMD_WIDTH 1
#else
#define SIMD_WIDTH 16
#endif
/* Need to align this structure so when it is declared and
* passed it can be loaded into vector registers.
*/
typedef struct
{
DECLARE_ALIGNED(16, signed char, lim[16]);
DECLARE_ALIGNED(16, signed char, flim[16]);
DECLARE_ALIGNED(16, signed char, thr[16]);
DECLARE_ALIGNED(16, signed char, mbflim[16]);
DECLARE_ALIGNED(16, signed char, mbthr[16]);
DECLARE_ALIGNED(16, signed char, uvlim[16]);
DECLARE_ALIGNED(16, signed char, uvflim[16]);
DECLARE_ALIGNED(16, signed char, uvthr[16]);
DECLARE_ALIGNED(16, signed char, uvmbflim[16]);
DECLARE_ALIGNED(16, signed char, uvmbthr[16]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, mblim[MAX_LOOP_FILTER + 1][SIMD_WIDTH]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, blim[MAX_LOOP_FILTER + 1][SIMD_WIDTH]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, lim[MAX_LOOP_FILTER + 1][SIMD_WIDTH]);
DECLARE_ALIGNED(SIMD_WIDTH, unsigned char, hev_thr[4][SIMD_WIDTH]);
unsigned char lvl[4][4][4];
unsigned char hev_thr_lut[2][MAX_LOOP_FILTER + 1];
unsigned char mode_lf_lut[10];
} loop_filter_info_n;
typedef struct
{
const unsigned char * mblim;
const unsigned char * blim;
const unsigned char * lim;
const unsigned char * hev_thr;
} loop_filter_info;
#define prototype_loopfilter(sym) \
void sym(unsigned char *src, int pitch, const signed char *flimit,\
const signed char *limit, const signed char *thresh, int count)
void sym(unsigned char *src, int pitch, const unsigned char *blimit,\
const unsigned char *limit, const unsigned char *thresh, int count)
#define prototype_loopfilter_block(sym) \
void sym(unsigned char *y, unsigned char *u, unsigned char *v,\
int ystride, int uv_stride, loop_filter_info *lfi, int simpler)
void sym(unsigned char *y, unsigned char *u, unsigned char *v, \
int ystride, int uv_stride, loop_filter_info *lfi)
#define prototype_simple_loopfilter(sym) \
void sym(unsigned char *y, int ystride, const unsigned char *blimit)
#if ARCH_X86 || ARCH_X86_64
#include "x86/loopfilter_x86.h"
@@ -77,38 +91,39 @@ extern prototype_loopfilter_block(vp8_lf_normal_mb_h);
#endif
extern prototype_loopfilter_block(vp8_lf_normal_b_h);
#ifndef vp8_lf_simple_mb_v
#define vp8_lf_simple_mb_v vp8_loop_filter_mbvs_c
#define vp8_lf_simple_mb_v vp8_loop_filter_simple_vertical_edge_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_mb_v);
extern prototype_simple_loopfilter(vp8_lf_simple_mb_v);
#ifndef vp8_lf_simple_b_v
#define vp8_lf_simple_b_v vp8_loop_filter_bvs_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_b_v);
extern prototype_simple_loopfilter(vp8_lf_simple_b_v);
#ifndef vp8_lf_simple_mb_h
#define vp8_lf_simple_mb_h vp8_loop_filter_mbhs_c
#define vp8_lf_simple_mb_h vp8_loop_filter_simple_horizontal_edge_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_mb_h);
extern prototype_simple_loopfilter(vp8_lf_simple_mb_h);
#ifndef vp8_lf_simple_b_h
#define vp8_lf_simple_b_h vp8_loop_filter_bhs_c
#endif
extern prototype_loopfilter_block(vp8_lf_simple_b_h);
extern prototype_simple_loopfilter(vp8_lf_simple_b_h);
typedef prototype_loopfilter_block((*vp8_lf_block_fn_t));
typedef prototype_simple_loopfilter((*vp8_slf_block_fn_t));
typedef struct
{
vp8_lf_block_fn_t normal_mb_v;
vp8_lf_block_fn_t normal_b_v;
vp8_lf_block_fn_t normal_mb_h;
vp8_lf_block_fn_t normal_b_h;
vp8_lf_block_fn_t simple_mb_v;
vp8_lf_block_fn_t simple_b_v;
vp8_lf_block_fn_t simple_mb_h;
vp8_lf_block_fn_t simple_b_h;
vp8_slf_block_fn_t simple_mb_v;
vp8_slf_block_fn_t simple_b_v;
vp8_slf_block_fn_t simple_mb_h;
vp8_slf_block_fn_t simple_b_h;
} vp8_loopfilter_rtcd_vtable_t;
#if CONFIG_RUNTIME_CPU_DETECT
@@ -121,10 +136,33 @@ typedef void loop_filter_uvfunction
(
unsigned char *u, /* source pointer */
int p, /* pitch */
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
unsigned char *v
);
/* assorted loopfilter functions which get used elsewhere */
struct VP8Common;
struct MacroBlockD;
void vp8_loop_filter_init(struct VP8Common *cm);
void vp8_loop_filter_frame_init(struct VP8Common *cm,
struct MacroBlockD *mbd,
int default_filt_lvl);
void vp8_loop_filter_frame(struct VP8Common *cm, struct MacroBlockD *mbd);
void vp8_loop_filter_partial_frame(struct VP8Common *cm,
struct MacroBlockD *mbd,
int default_filt_lvl);
void vp8_loop_filter_frame_yonly(struct VP8Common *cm,
struct MacroBlockD *mbd,
int default_filt_lvl);
void vp8_loop_filter_update_sharpness(loop_filter_info_n *lfi,
int sharpness_lvl);
#endif

View File

@@ -24,8 +24,9 @@ static __inline signed char vp8_signed_char_clamp(int t)
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
static __inline signed char vp8_filter_mask(signed char limit, signed char flimit,
uc p3, uc p2, uc p1, uc p0, uc q0, uc q1, uc q2, uc q3)
static __inline signed char vp8_filter_mask(uc limit, uc blimit,
uc p3, uc p2, uc p1, uc p0,
uc q0, uc q1, uc q2, uc q3)
{
signed char mask = 0;
mask |= (abs(p3 - p2) > limit) * -1;
@@ -34,13 +35,13 @@ static __inline signed char vp8_filter_mask(signed char limit, signed char flimi
mask |= (abs(q1 - q0) > limit) * -1;
mask |= (abs(q2 - q1) > limit) * -1;
mask |= (abs(q3 - q2) > limit) * -1;
mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > flimit * 2 + limit) * -1;
mask |= (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 > blimit) * -1;
mask = ~mask;
return mask;
}
/* is there high variance internal edge ( 11111111 yes, 00000000 no) */
static __inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0, uc q1)
static __inline signed char vp8_hevmask(uc thresh, uc p1, uc p0, uc q0, uc q1)
{
signed char hev = 0;
hev |= (abs(p1 - p0) > thresh) * -1;
@@ -48,7 +49,8 @@ static __inline signed char vp8_hevmask(signed char thresh, uc p1, uc p0, uc q0,
return hev;
}
static __inline void vp8_filter(signed char mask, signed char hev, uc *op1, uc *op0, uc *oq0, uc *oq1)
static __inline void vp8_filter(signed char mask, uc hev, uc *op1,
uc *op0, uc *oq0, uc *oq1)
{
signed char ps0, qs0;
@@ -98,9 +100,9 @@ void vp8_loop_filter_horizontal_edge_c
(
unsigned char *s,
int p, /* pitch */
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -113,11 +115,11 @@ void vp8_loop_filter_horizontal_edge_c
*/
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4*p], s[-3*p], s[-2*p], s[-1*p],
s[0*p], s[1*p], s[2*p], s[3*p]);
hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
hev = vp8_hevmask(thresh[0], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_filter(mask, hev, s - 2 * p, s - 1 * p, s, s + 1 * p);
@@ -130,9 +132,9 @@ void vp8_loop_filter_vertical_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -145,10 +147,10 @@ void vp8_loop_filter_vertical_edge_c
*/
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
hev = vp8_hevmask(thresh[0], s[-2], s[-1], s[0], s[1]);
vp8_filter(mask, hev, s - 2, s - 1, s, s + 1);
@@ -157,7 +159,7 @@ void vp8_loop_filter_vertical_edge_c
while (++i < count * 8);
}
static __inline void vp8_mbfilter(signed char mask, signed char hev,
static __inline void vp8_mbfilter(signed char mask, uc hev,
uc *op2, uc *op1, uc *op0, uc *oq0, uc *oq1, uc *oq2)
{
signed char s, u;
@@ -216,9 +218,9 @@ void vp8_mbloop_filter_horizontal_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -232,11 +234,11 @@ void vp8_mbloop_filter_horizontal_edge_c
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4*p], s[-3*p], s[-2*p], s[-1*p],
s[0*p], s[1*p], s[2*p], s[3*p]);
hev = vp8_hevmask(thresh[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
hev = vp8_hevmask(thresh[0], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_mbfilter(mask, hev, s - 3 * p, s - 2 * p, s - 1 * p, s, s + 1 * p, s + 2 * p);
@@ -251,9 +253,9 @@ void vp8_mbloop_filter_vertical_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
const unsigned char *blimit,
const unsigned char *limit,
const unsigned char *thresh,
int count
)
{
@@ -264,10 +266,10 @@ void vp8_mbloop_filter_vertical_edge_c
do
{
mask = vp8_filter_mask(limit[i], flimit[i],
mask = vp8_filter_mask(limit[0], blimit[0],
s[-4], s[-3], s[-2], s[-1], s[0], s[1], s[2], s[3]);
hev = vp8_hevmask(thresh[i], s[-2], s[-1], s[0], s[1]);
hev = vp8_hevmask(thresh[0], s[-2], s[-1], s[0], s[1]);
vp8_mbfilter(mask, hev, s - 3, s - 2, s - 1, s, s + 1, s + 2);
@@ -278,13 +280,13 @@ void vp8_mbloop_filter_vertical_edge_c
}
/* should we apply any filter at all ( 11111111 yes, 00000000 no) */
static __inline signed char vp8_simple_filter_mask(signed char limit, signed char flimit, uc p1, uc p0, uc q0, uc q1)
static __inline signed char vp8_simple_filter_mask(uc blimit, uc p1, uc p0, uc q0, uc q1)
{
/* Why does this cause problems for win32?
* error C2143: syntax error : missing ';' before 'type'
* (void) limit;
*/
signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= flimit * 2 + limit) * -1;
signed char mask = (abs(p0 - q0) * 2 + abs(p1 - q1) / 2 <= blimit) * -1;
return mask;
}
@@ -317,47 +319,37 @@ void vp8_loop_filter_simple_horizontal_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count
const unsigned char *blimit
)
{
signed char mask = 0;
int i = 0;
(void) thresh;
do
{
/*mask = vp8_simple_filter_mask( limit[i], flimit[i],s[-1*p],s[0*p]);*/
mask = vp8_simple_filter_mask(limit[i], flimit[i], s[-2*p], s[-1*p], s[0*p], s[1*p]);
mask = vp8_simple_filter_mask(blimit[0], s[-2*p], s[-1*p], s[0*p], s[1*p]);
vp8_simple_filter(mask, s - 2 * p, s - 1 * p, s, s + 1 * p);
++s;
}
while (++i < count * 8);
while (++i < 16);
}
void vp8_loop_filter_simple_vertical_edge_c
(
unsigned char *s,
int p,
const signed char *flimit,
const signed char *limit,
const signed char *thresh,
int count
const unsigned char *blimit
)
{
signed char mask = 0;
int i = 0;
(void) thresh;
do
{
/*mask = vp8_simple_filter_mask( limit[i], flimit[i],s[-1],s[0]);*/
mask = vp8_simple_filter_mask(limit[i], flimit[i], s[-2], s[-1], s[0], s[1]);
mask = vp8_simple_filter_mask(blimit[0], s[-2], s[-1], s[0], s[1]);
vp8_simple_filter(mask, s - 2, s - 1, s, s + 1);
s += p;
}
while (++i < count * 8);
while (++i < 16);
}

View File

@@ -17,7 +17,7 @@ typedef enum
DEST = 1
} BLOCKSET;
void vp8_setup_block
static void setup_block
(
BLOCKD *b,
int mv_stride,
@@ -43,7 +43,8 @@ void vp8_setup_block
}
void vp8_setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
static void setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
{
int block;
@@ -64,16 +65,16 @@ void vp8_setup_macroblock(MACROBLOCKD *x, BLOCKSET bs)
for (block = 0; block < 16; block++) /* y blocks */
{
vp8_setup_block(&x->block[block], x->dst.y_stride, y, x->dst.y_stride,
setup_block(&x->block[block], x->dst.y_stride, y, x->dst.y_stride,
(block >> 2) * 4 * x->dst.y_stride + (block & 3) * 4, bs);
}
for (block = 16; block < 20; block++) /* U and V blocks */
{
vp8_setup_block(&x->block[block], x->dst.uv_stride, u, x->dst.uv_stride,
setup_block(&x->block[block], x->dst.uv_stride, u, x->dst.uv_stride,
((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
vp8_setup_block(&x->block[block+4], x->dst.uv_stride, v, x->dst.uv_stride,
setup_block(&x->block[block+4], x->dst.uv_stride, v, x->dst.uv_stride,
((block - 16) >> 1) * 4 * x->dst.uv_stride + (block & 1) * 4, bs);
}
}
@@ -124,6 +125,6 @@ void vp8_build_block_doffsets(MACROBLOCKD *x)
{
/* handle the destination pitch features */
vp8_setup_macroblock(x, DEST);
vp8_setup_macroblock(x, PRED);
setup_macroblock(x, DEST);
setup_macroblock(x, PRED);
}

View File

@@ -11,6 +11,7 @@
#ifndef __INC_MV_H
#define __INC_MV_H
#include "vpx/vpx_integer.h"
typedef struct
{
@@ -18,4 +19,10 @@ typedef struct
short col;
} MV;
typedef union
{
uint32_t as_int;
MV as_mv;
} int_mv; /* facilitates faster equality tests and copies */
#endif

Some files were not shown because too many files have changed in this diff Show More