This patch did a cleanup following the commit "Save NEON registers
in VP8 NEON functions". The pushing/poping of callee-saved NEON
registers was moved into individual NEON functions. Therefore,
we don't need to save those registers at the beginning of codec.
The related code was removed.
Change-Id: I5648166514fc9beffb780aa138495597731f49ea
The recent compiler can generate optimized code that uses NEON registers
for various operations besides floating-point operations. Therefore,
only saving callee-saved registers d8 - d15 at the beginning of the
encoder/decoder is not enough anymore. This patch added register saving
code in VP8 NEON functions that use those registers.
Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
This patch fixed errors reported in Issue 746: "dr memory VP8
encode errors" and Issue 745: "dr memory VP8 decode errors".
The "UNINITIALIZED READ" errors were fixed in x86 assembly
code. The list of files fixed is
vp8_intra_pred_uv_tm_sse2
vp8_intra_pred_uv_tm_ssse3
vp8_intra_pred_uv_ho_mmx2
vp8_intra_pred_uv_ho_ssse3
vp8_intra_pred_y_tm_sse2
vp8_intra_pred_y_tm_ssse3
vp8_intra_pred_y_ho_sse2
Change-Id: Ib6df7bf1d442077fe534edfd90e50ad16fadacdd
This patch fixed WebRTC Issue 3020: "Uninit error at
vp8_mbpost_proc_down_xmm". The first 8 values in d were not initialized,
but was accessed. This patch fixed c code as well as mmx and sse2 code.
Change-Id: Iaa5b41a4ed3bea971b15fb826ce34b7ab4e36fb1
significantly speeds up file generation.
the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h
real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h
real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h
real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h
real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h
real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h
real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
Filter out files ending in _neon.c and append .neon so the Android build
system knows to apply -mfpu=neon
Change-Id: Ib67277e5920bfcaeda7c4aa16cd1001b11d59305
The declaration of the bilinear filters specified an alignment clause
in the implementation file but not in the header. This turned out
to be harmless, but it did cause linker warnings to be emitted when
building on Windows.
The (extern) declaration in the header was changed, to match the
declaration in the implementation.
Change-Id: I44be89b1572fe9a50fa47a42e4db9128c4897b04
Jenkins warns on left shift of negative numbers and non-aligned read
of int. This commit fixed the two issues.
Change-Id: I389a7fb6a572c643902e40a4c10fefef94500d2c
Adds a new end-usage option for constant quality encoding in vpx. This
first version implemented for VP9, encodes all regular inter frames
using the quality specified in the --cq-level= option, while encoding
all key frames and golden/altref frames at a quality better than that.
The current performance on derfraw300 is +0.910% up from bitrate control,
but achieved without multiple recode loops per frame.
The decision for qp for each altref/golden/key frame will be improved
in subsequent patches based on better use of stats from the first pass.
Further, the qp for regular inter frames may also be varied around the
provided cq-level.
Change-Id: I6c4a2a68563679d60e0616ebcb11698578615fb3
these are only used in the encoder.
frames_since_golden / frames_till_alt_ref_frame -> VP[89]_COMP
Change-Id: Ie14a6f46987bced685ddb449b85dc261caba6dfe
this was never fleshed out in the context of VP8, for which it was
added. for VP9 it has no meaning.
Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff
Previously, the microsoft arm assembler errored out, saying that
bilinear_taps_coeff was an undefined symbol.
Change-Id: Ib938f0b454c41ccbd801e70a7c9acc0fa04e3c55
and denoising.c
Adding -Wshadow to CFLAGS generated a bunch of warnings. This patch
removes these warnings.
Change-Id: I434a9f4bfac9ad4ab7d2a67a35ef21e6636280da
Adding -Wshadow to CFLAGS generated a bunch of warnings. This commit
is based on work already done by jzern.
Change-Id: Iefc08a7ab601c4d1b507f039577433bfb1c6cc9d
Remove dependency of this function on asm_offsets. ssse3/sse4 next.
Change quant_shift calculation so it be done using SIMD. Pre-calculate
as much as possible to simplify EOB selection.
Take advantage of qcoeff being zero'd by tying the if statements
together.
Speed parity with previous implementation with gcc x86_64 linux
Change-Id: Ife97556a1eca3a74b09def1a3d04084974dff1fb
Started adding support for multiple internal decoder instances. Also added
code to limit the vp8 config options available when using frame-based
multithreading.
Change-Id: I0f1ee7abcfcff59204f50162e28254b8dd6972eb
When error concealment is enabled, it swaps the mi and prev_mi ptrs after
each frame is decoded. The postproc uses the mi ptr for the mode info context.
Now the postproc will use the correct mode info context.
Change-Id: I537ae5450f319c624999b44525bb52bb30047b7b
Use the proper seg/mode/ref filter offsets when selecting the
frame loop filter level for fast mode (pick_filter_level_fast).
Change-Id: I2473e2131c800ad19755cb6211ad735fecfe2ac0
xmm[6-11] should be saved and restored for Windows x64; prevents an
encoder mismatch and some datarate issues.
Change-Id: I03c38eb18ec20c6c441cae19416393058baad1ee
xmm6/xmm7 should be saved and restored for Windows x64; prevents an
encoder mismatch and some datarate issues.
Change-Id: Ifa1a82ab25fbdc5112d66f5332e14b16e69ac164
The vp8_post_proc_down_and_across_mb_row_sse2() needs space for an
even number of macroblocks, as they are read two at a time for the
chroma planes. Round up the width during the allocation of
pp_limits_buffer to support this.
Change-Id: Ibfc10c7be290d961ab23ac3dde12a7bb96c12af0
If the threshold(limits) <= 0, skipped filtering and copied the
frame directly. Also, fixed memory allocation checking.
Change-Id: If3d79d5b2bcb71b9777e6eb5cba1384585131e22
1. Algorithm modification:
Instead of having same filter threshold for a whole frame, now we
allow the thresholds to be adjusted for each macroblock. In current
implementation, to avoid excessive blur on background as reported
in issue480(http://code.google.com/p/webm/issues/detail?id=480), we
reduce the thresholds for skipped macroblocks.
2. SSE2 optimization:
As started in issue479(http://code.google.com/p/webm/issues/detail?id=479),
the filter calculation was adjusted for better performance. The c
code was also modified accordingly. This made the deblock filter
2x faster, and the decoder was 1.2x faster overall.
Next, the demacroblock filter will be modified similarly.
Change-Id: I05e54c3f580ccd427487d085096b3174f2ab7e86
Protect the call to {Initialize,Delete}CriticalSection() with an
Interlocked{Inc,Dec}rement() pair, rather than the previous static
initialization. This should play better with AppVerifier, and fix issue
http://code.google.com/p/webm/issues/detail?id=467
Change-Id: I06eadbfac1b3b4414adb8eac862ef9bd14bbe4ad
Fixes some build issues for people building for win32 who have a
pthreads emulation layer installed.
Change-Id: I0e0003fa01f65020f6ced35d961dcb1130db37a8
Initialize the top line at the beginning of each frame and
the left column at the beginning of each row.
Change-Id: I5412f7ea49ffc490215cf65a62715a6c5e3a5a29
Adjusts some of the qualification thresholds in mfqe to eliminate
artifacts due to wrong decisions. Besides, a new qualification
criteria is used to disable mfqe if the quality of the previous
frame is itself not too good.
Change-Id: I4097c20b7fd4fcc60cc3003c1e33e8faae2ff066
Interleaved loopfiltering with decode. For 1080p clips, up to 1%
performance gain. For 4k clips, up to 10% seen. This patch is required
for better "frame-based" multithreading.
Change-Id: Ic834cf32297cc04f27e8205652fb9f70cbe290db
predict_d has become canonical. Remove previous helper function.
Disable ARM assembly pending update.
Change-Id: Idd84ac8a28f9b0221ea97904a77de1e705d06a7d
SAD returns unsigned values. Make all the declarations the same.
Remove bestsad initialization and check. It is always set to the
result of a SAD call so it will never remain UINT_MAX
Use ja instead of jg to test unsigned comparison instead of signed.
Update test.
Change-Id: I46336ab45f4e60fc37caf20bd36bc5782079c7a5