Commit Graph

1335 Commits

Author SHA1 Message Date
Attila Nagy
51c4f9e6b1 Avoid multiple test for same lvl in auto filter lvl pick
Sometimes same level is tested 2-3 times; store and reuse the
calculated error value.

Change-Id: Ia1c04a2568232edf9a5a62c4e2d8e8a50d85e00e
2011-12-14 15:56:29 +02:00
Attila Nagy
55fbdd58ac Force realtime version 1 streams to only use simple loopfilter
...regardless of the speed settings.

Change-Id: I4b91ac7a7208efd690dfc69e175f8eb769b6ce03
2011-12-14 12:57:49 +02:00
James Berry
f8b431c334 fix: active_worst_quality could be set above 127
add check to set active_worst_quality to 127 if it
is set above 127

Change-Id: I7db353d5c1b1c8516a116542b6ed21c0110bb512
2011-12-13 14:58:59 -05:00
John Koleszar
d6020f9d52 tokenizer: use correct block type context in stuff1st_order_b
The fast-path for skipped MBs was not correctly respecting the
block type during update of the coefficient counts. Extracted
this from part of change I365cfb6ac636f19c545f682e3aeac185253abaef

Change-Id: I53d8cf0a00a98034b97b0ed3414b703bae74a739
2011-12-13 11:58:56 -08:00
Jim Bankoski
6b2792b0e0 Merge "vp8e - entropy stats per frame type" 2011-12-12 09:08:34 -08:00
Scott LaVarnway
c4aa1d508e Removed unused diff buffer
Change-Id: I9211358cca89b1c4f84b53a202a63ecf9e79ae4c
2011-12-12 11:06:55 -05:00
Scott LaVarnway
afa1b66108 Merge "Improved mmx/sse2 versions of iwalsh" 2011-12-12 06:40:28 -08:00
Jim Bankoski
6de67cd6e8 vp8e - entropy stats per frame type
Change-Id: I4168eb6ea22ae541471738a7a3453e7d52059275
2011-12-09 16:56:18 -08:00
Scott LaVarnway
9fa6132fc5 Improved mmx/sse2 versions of iwalsh
Removed unnecessary transposes.

Change-Id: I029fbaf8afafee34d54a4f3333c22023c15003c3
2011-12-08 14:37:59 -05:00
Johann
a69810b893 Merge "Reduce mem copies in encoder loopfilter level picking" 2011-12-07 10:41:00 -08:00
Attila Nagy
e570b0406d Reduce mem copies in encoder loopfilter level picking
Do the test filtering in the existing backup frame buffer instead of
the original. Copy the original data into extra buffer before doing
the  filtering. This way there is no need to restore the original
unfiltered  frame at the end of level picking process.

This came up in some discussions with Johann. Thanks!

Change-Id: I495f4301d983854673276c34ec0ddf9a9d622122
2011-12-07 09:59:50 +02:00
Yunqing Wang
aa7335e610 Multiple-resolution encoder
The example encoder down-samples the input video frames a number of
times with a down-sampling factor, and then encodes and outputs
bitstreams with different resolutions.

Support arbitrary down-sampling factor, and down-sampling factor
can be different for each encoding level.

For example, the encoder can be tested as follows.
1. Configure with multi-resolution encoding enabled:
../libvpx/configure --target=x86-linux-gcc --disable-codecs
--enable-vp8 --enable-runtime_cpu_detect --enable-debug
--disable-install-docs --enable-error-concealment
--enable-multi-res-encoding
2. Run make
3. Encode:
If input video is 1280x720, run:
./vp8_multi_resolution_encoder 1280 720 input.yuv 1.ivf 2.ivf 3.ivf 1
(output: 1.ivf(1280x720); 2.ivf(640x360); 3.ivf(320x180).
The last parameter is set to 1/0 to show/not show PSNR.)
4. Decode:
./simple_decoder 1.ivf 1.yuv
./simple_decoder 2.ivf 2.yuv
./simple_decoder 3.ivf 3.yuv
5. View video:
mplayer 1.yuv -demuxer rawvideo -rawvideo w=1280:h=720 -loop 0 -fps 30
mplayer 2.yuv -demuxer rawvideo -rawvideo w=640:h=360 -loop 0 -fps 30
mplayer 3.yuv -demuxer rawvideo -rawvideo w=320:h=180 -loop 0 -fps 30

The encoding parameters can be modified in vp8_multi_resolution_encoder.c,
for example, target bitrate, frame rate...

Modified API. John helped a lot with that. Thanks!

Change-Id: I03be9a51167eddf94399f92d269599fb3f3d54f5
2011-12-05 17:59:42 -05:00
John Koleszar
6127af60c1 Merge "Speed selection support for disabled reference frames" 2011-12-05 14:36:54 -08:00
Yunqing Wang
06fc0f83b6 Populate q_index in multi-thread encoding
This value needs to be copied to each thread's data structure.
This fixed artifact problem in multi-thread encoder.

Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
2011-11-28 15:58:28 -05:00
Scott LaVarnway
34d7c8b3d4 Added vp8_dequant_idct_add_y_block_sse2 setup
In Change I83202ffd, I deleted one too many lines.

Change-Id: If05d7c8988eb5c00898dc7c833ad7d99b5eb23e7
2011-11-28 13:06:13 -05:00
Scott LaVarnway
f46e17fd6f Merge "Modified the inverse walsh to output directly" 2011-11-28 07:26:07 -08:00
Scott LaVarnway
4a91541c94 Modified the inverse walsh to output directly
to the dqcoeff or qcoeff buffer.  The encoder would
populate the dc coeffs of the y blocks as a separate
stage (recon_dcblock) and the decoder would use a special
version of the idct.  This change eliminates the extra copy
and reduces the code footprint.

[Tero] Added needed changes to armv6 and NEON assembly.

Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba
2011-11-25 09:24:04 +02:00
Johann
e2bacd581a Merge "Move shared data to shared location" 2011-11-23 11:20:54 -08:00
Johann
15ea268d62 Merge "Fix encoder partitioned output on ARM" 2011-11-23 08:44:21 -08:00
Attila Nagy
97259b460c Fix encoder partitioned output on ARM
API was not returning correct partition sizes on arm targets.
The armv5 token packing functions were not storing the information to the
partition size table.
As a fix, have one boolcoder instance allocated for each partition so
that partition sizes are internally available after all partitions
were encoded. This will also allow more flexibility in producing
several partitions in parallel.

Use buffer validation (overflow check) in all ARM bitpacking
functions.

Change-Id: I31c8a11d8a7613676f0ff50928cb2a2ab14fd169
2011-11-23 12:29:43 +02:00
John Koleszar
b79879c2e3 Merge "Decoder fixes to better support reference picture selection." 2011-11-22 17:12:06 -08:00
Stefan Holmer
b5ee7b12d2 Decoder fixes to better support reference picture selection.
Change-Id: Id3388985d754706b9fd1f079c47121e79a63efdf
2011-11-21 10:25:21 +01:00
Johann
f2cd4ded22 Move shared data to shared location
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad

Moving towards allowing --disable-mmx

Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
2011-11-18 16:23:14 -08:00
John Koleszar
e55974bf86 Speed selection support for disabled reference frames
There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.

Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.

Change-Id: I1199149f8662a408537c653d2c021c7f1d29a700
2011-11-18 13:53:21 -08:00
Attila Nagy
c84d42f864 Validate encoder buffer writes for single token partition
Extend buffer write validation (overflow check) to single token
partition packing, both mb and row based functions.

Change-Id: I36e19b7d37fc43712d05c70e3ad223d3eb5b973d
2011-11-18 12:49:27 +02:00
Scott LaVarnway
3c755577b8 Merge "Added predictor stride argument(s) to subtract functions" 2011-11-17 10:17:53 -08:00
Scott LaVarnway
edd98b7310 Added predictor stride argument(s) to subtract functions
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix

[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
             Added also minor NEON optimizations to subtract
             functions.

Patch set 5: x86 stride bug fix

Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
2011-11-15 12:53:01 -05:00
John Koleszar
bdd35c13cc avoid resetting framerate during vpx_codec_enc_config_set()
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().

Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465
2011-11-11 14:45:58 -08:00
Scott LaVarnway
df49c7c58d SSE2 optimizations for vp8_build_intra_predictors_mby{,_s}()
Ronald recently sent me this patch that he did in April.
> From: Ronald S. Bultje <rbultje@google.com>
> Date: Thu, 28 Apr 2011 17:30:15 -0700
> Subject: [PATCH] SSE2 optimizations for
> vp8_build_intra_predictors_mby{,_s}().
HD decode tests have shown a performance boost up to 1.5%,
depending on material.
Patch set 3: Fixed encoder crash.

Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8
2011-11-09 15:30:35 -05:00
Scott LaVarnway
9532bda0fb Merge "Relocated idct/add calls for encoder" 2011-11-09 10:17:43 -08:00
Johann
ea2229bab6 Merge "ARMv6 optimized Intra4x4 prediction" 2011-11-09 09:36:33 -08:00
John Koleszar
2999ca3094 Merge "Reset FPU state after calc_plane_error()" 2011-11-09 09:35:08 -08:00
John Koleszar
3fcf0e3668 Merge "Compiler warning fix for const array." 2011-11-09 09:34:50 -08:00
John Koleszar
82e8884ad8 Merge "Remove unused file recon.c" 2011-11-09 09:31:23 -08:00
Scott LaVarnway
861ed6a5c1 Relocated idct/add calls for encoder
Call the idct/add after the tokenize.  This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.

Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756
2011-11-09 10:41:05 -05:00
Tero Rintaluoma
5a2fd63a2a ARMv6 optimized Intra4x4 prediction
Added ARM optimized intra 4x4 prediction
 - 2x faster on Profiler compared to C-code compiled with -O3
 - Function interface changed a little to improve BLOCKD structure
   access

Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
2011-11-09 09:13:51 +02:00
James Zern
9d60506130 threading: avoid defining _WIN32_WINNT
The referenced function (SignalObjectAndWait) isn't used. Reduces the
warnings with mingw32-w64 which defines this.

Change-Id: I4ce592879ec9372bf196dac640204c4d370bd210
2011-11-08 18:50:45 -08:00
John Koleszar
f89e109f56 Remove unused file recon.c
File not referenced from anywhere and no longer compiles.

Change-Id: I38b11bd60db615c2c2c9d7ad35caba3a1adf1750
2011-11-08 15:54:56 -08:00
Yunqing Wang
4c14efd234 Fix checks in MB quantizer initialization
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.

Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
2011-11-08 12:11:48 -05:00
Adrian Grange
b615a6d47f Third set of checks of buffer level against maximum buffer size
Additional check of buffer level to ensure it doesn't exceed the
maximum buffer size.

Change-Id: I1ba4f8b09bbec89646885040ff47470196af521e
2011-11-07 17:15:28 -08:00
Adrian Grange
fa25a31ed4 Additional clipping of buffer level to maximum buffer size
Added additional check of buffer level against maximum
buffer size.

Change-Id: Iaf1fbaf008601161e402b43ce82c3dbc129bf740
2011-11-07 16:54:40 -08:00
Adrian Grange
9dc95b0a12 Added check to make sure maximum buffer size not exceeded
Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.

This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.

As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).

Change-Id: Id8a3c6323d3246da50f7cb53ddbf78b5528032c6
2011-11-07 16:28:13 -08:00
James Zern
f89ea3432f fix file permissions
all of googletest import (0ab00a22) was marked executable

Change-Id: Id7b7ee03efc21ab998bb03349bd91644e8af25da
2011-11-04 18:50:35 -07:00
Fritz Koenig
f0c01413fb Compiler warning fix for const array.
Fix compiler warning for passing a non const array
to a function expecting a const array by using an
intermediary pointer and casting.

Change-Id: I9bdd358ebdc926223993fb8fb2098ffedd2f3fc7
2011-11-04 18:19:26 -07:00
Yunqing Wang
e1a55b504a Merge "Add checks in MB quantizer initialization" 2011-11-04 11:52:27 -07:00
Scott LaVarnway
44b5f76e34 Merge "Fix issue 374: eob read incorrectly" 2011-11-04 11:31:17 -07:00
John Koleszar
7ca6c91732 Merge "Changing decoder input partition API to input fragments." 2011-11-04 09:36:27 -07:00
Tero Rintaluoma
d497ec688d Fix issue 374: eob read incorrectly
Updated eob changes to check_reset_2nd_coeffs function.

Change-Id: Id1b21c91c7f0fd286640b487ffe47867009b717d
2011-11-04 09:36:49 +02:00
Scott LaVarnway
46639567a0 Merge "Change use of eob in the encoder" 2011-11-03 08:06:06 -07:00
Tero Rintaluoma
e4f2ec7a52 Change use of eob in the encoder
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.

Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
2011-11-03 16:08:09 +02:00
Yaowu Xu
8002c31804 Merge "added code to clear 2nd order block when appropriate" 2011-11-02 08:22:58 -07:00
John Koleszar
63bf108731 Merge "Fix: Increase default cx_data_size" 2011-11-01 15:25:47 -07:00
Stefan Holmer
1427205215 Changing decoder input partition API to input fragments.
Adding support for several partitions within one input fragment.
This is necessary to fully support all possible packetization
combinations in the VP8 RTP profile. Several partitions can
be transmitted in the same packet, and they can only be split
by reading the partition lengths from the bitstream.

Change-Id: If7d7ea331cc78cb7efd74c4a976b720c9a655463
2011-11-01 14:44:37 -07:00
Yunqing Wang
e44720af84 Add checks in MB quantizer initialization
In some situations (f.g. error-resilient is turned on), vp8cx_mb
_init_quantizer() was called once per macroblock. Added checks
to avoid calculations when there is no change.

Change-Id: Ie4f0a5ade2202041254990a4e9d5b03bd1ac5aea
2011-11-01 17:41:22 -04:00
John Koleszar
9bf3bc9a72 Correct SPLITMV clamping
Prior to this fix, the clamping state of the last subblock partition
dominated, whereas the correct behavior is to clamp if any partition
needs clamping. This bug was introduced by v0.9.6-232-g6b25501

See also:
  [1]: http://code.google.com/p/webm/issues/detail?id=371
  [2]: https://bugzilla.mozilla.org/show_bug.cgi?id=696390

Change-Id: I444db492b4c4f05f039c7da6f4216da8207dc138
2011-10-31 14:42:51 -07:00
Yaowu Xu
88e24f07ae added code to clear 2nd order block when appropriate
It is discovered that in rare situations the 2nd order block may
produce a few small magnitude coefficients that has no effect on
reconstruction. The situations are a combination of low quantizer
values (high quality) and low energy in residual signals (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.

Patch 1 to 4 used code to do all-zero-check on idct result buffer,
and tests on derf set showed a consistent gain of .12%-.14% on all
metrics.But due to a recent change Ie31d90b, the idct result buffer
is not longer populated. So patch 5&6 use an alternative method to
detect the situations. Tests on derf set now shows a consistent
quality gain of .16%-.20%.

As suggested by Jim, Patch 7&8 removed the condition of all first
order block not having any coefficient, instead we reset 2nd order
coefficients to all 0 if sum of absolute value of the coefficients
is small. So it does slightly more than just detecting the oddity
as discussed above, but tests on derf set now show a consistent
gain of .20%-.23% on all metrics.

It is worth noting here that this change does not have any effect
on mid/high quantizer range, it only affects the quantizer value
18 or blow. Within this range, the change helps compression by up
to 2.5% on clips in the derf set.

Change-Id: I718e19cf59a4fc2462cb7070832759beb9f7e7dd
2011-10-28 12:07:21 -07:00
Scott LaVarnway
e0309e1509 Merge "Improved decode_split_mv()" 2011-10-28 09:27:17 -07:00
Johann
cd1ef53d12 Merge "Fix ARM build problem introduced by CL I3fab6f2b" 2011-10-27 11:17:54 -07:00
Scott LaVarnway
6064384d59 Improved decode_split_mv()
Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.

Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
2011-10-27 11:26:30 -04:00
Scott LaVarnway
0db5599957 Merge "Improved mv_bias" 2011-10-27 06:14:00 -07:00
Attila Nagy
9452dce181 Fix ARM build problem introduced by CL I3fab6f2b
Update ARM asm implementation of vp8_start_encode to new definition.

Change-Id: Ic44791c969e351082331ba6146c3384c01a0dfad
2011-10-27 09:06:45 +03:00
Johann
294777b915 Merge "Reduce partial frame copy in encoder's pick_filter_level_fast" 2011-10-26 11:33:14 -07:00
Scott LaVarnway
21970d1dc2 Improved mv_bias
Small performance gains.

Change-Id: I709b9390a8a27a70f5f23574313b8db85ac7f23d
2011-10-26 11:46:10 -04:00
Scott LaVarnway
efa69d26a1 Merge "Improved read_mb_modes_mv()" 2011-10-26 08:26:30 -07:00
Scott LaVarnway
ff1d170e69 Improved read_mb_modes_mv()
Interleaved vp8_find_near_mvs and vp8_mv_ref_probs.
2.5% to 4% performance improvement for the HD clips used.

Change-Id: Id888b667cf5ae2f0e19da18743140f055ff7de8d
2011-10-26 10:46:36 -04:00
Attila Nagy
de82809444 Reduce partial frame copy in encoder's pick_filter_level_fast
The partial frame copy function used to copy an extra 8 lines above
and  below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.

Define the "magic" fraction number for partial filtering in
loopfilter.h .

Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
2011-10-26 15:25:07 +03:00
Johann
f9dba66877 Merge "remove uninitialized variable warning" 2011-10-25 14:42:21 -07:00
Scott LaVarnway
e03330bd80 Merge "Improved token decoder" 2011-10-25 10:04:11 -07:00
Johann
9409af2083 remove uninitialized variable warning
Restructure if statement to clarify the error condition. Trigger the
error before clobbering pc-> variables.

Change-Id: Id01cab798a341ce9899078fdcec265a0e942a0b7
2011-10-25 09:24:40 -07:00
Scott LaVarnway
3579baa115 Merge "Removed read_mv_ref" 2011-10-25 08:01:32 -07:00
Johann
a82cc0205d remove unused variable warning
Change-Id: I4fcd6e4656d9823aead941616cd63501aecbd6e2
2011-10-24 16:33:45 -07:00
Scott LaVarnway
49ea2bc3f4 Removed read_mv_ref
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure.  This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.

Change-Id: I1e798d6ec40fcaeeff06ccc82f81201978d12f74
2011-10-24 16:16:08 -04:00
Scott LaVarnway
f182376dd6 Moved the split motion vector decode
into a function.

Change-Id: Ia023a0587100a52cb084f5d9d5512efa6198dad3
2011-10-24 13:52:15 -04:00
Scott LaVarnway
231339932b Merge "Removed redundant mv clamps for nearmv and nearestmv" 2011-10-24 10:27:53 -07:00
James Berry
bac6c229e5 Fix: Increase default cx_data_size
Prior size could be too small in some instances
resulting in an error.

Change-Id: Ic601e49cbae92c98a0e7fb51ba8c186b352ffba6
2011-10-24 11:50:27 -04:00
Scott LaVarnway
a99c20c0f4 Removed redundant mv clamps for nearmv and nearestmv
Did some cleanup as well.

Patchset 2:  Fixed bug.  Will revisit the segmentation logic.

Change-Id: Idf9fbcff9aaf467bdace9fbd58ef2cea6c602049
2011-10-24 11:37:52 -04:00
Scott LaVarnway
e59d53e999 Merge "Remove unused DETOK structure" 2011-10-21 06:30:57 -07:00
Tero Rintaluoma
bdb4fb8991 Remove unused DETOK structure
DETOK structure is not used anymore.

Change-Id: Id22e1af78fb85d4bb151237a60290d9364faf217
2011-10-21 09:33:49 +03:00
John Koleszar
2c0b4a24b9 Merge "Fix: check cx_data buffer prior to write" 2011-10-20 17:36:40 -07:00
James Berry
bc7151131d Fix: check cx_data buffer prior to write
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.

Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
2011-10-20 15:55:00 -04:00
Johann
7cdc986cdf Don't copy borders for loop_filter_pick
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.

Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
2011-10-19 18:54:14 -07:00
Johann
f382173225 Merge "enc: save entropy probs only when needed for refresh" 2011-10-19 14:36:29 -07:00
Scott LaVarnway
5e54085703 Improved token decoder
Tests showed over 2% improvement on various HD clips.

Change-Id: I94a30d209c92cbd5fef285122f9fc570688635fe
2011-10-19 13:38:35 -04:00
Scott LaVarnway
63a77cbed9 Merge "Remove usage of predict buffer for decode" 2011-10-19 10:24:48 -07:00
Scott LaVarnway
ed9c66f584 Remove usage of predict buffer for decode
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer.  For blocks with eob=0,
unnecessary idcts can be eliminated.  This gave a performance
boost of ~1.8% for the HD clips used.

Tero: Added needed changes to ARM side and scheduled some
      assembly code to prevent interlocks.

Patch Set 6:  Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.

Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
2011-10-18 12:06:50 -04:00
Attila Nagy
a5cd42feb9 Fix: vp8cx_pack_tokens_into_partitions_armv5 crash
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.

Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
2011-10-14 10:53:04 +03:00
Adrian Grange
04182a121a Merge "Added rate-targeted temporal scalability" 2011-10-11 12:54:52 -07:00
Adrian Grange
217591fde5 Added rate-targeted temporal scalability
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.

The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)

Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
2011-10-11 12:49:12 -07:00
John Koleszar
07ba411914 Reset FPU state after calc_plane_error()
Fixes a MMX/SSE2 mismatch when building with --enable-internal-stats.

Change-Id: I0c50a1f246f6916b7a5fc6f36864ceb362f25520
2011-10-11 08:43:30 -07:00
James Berry
05bde9d4a4 bug fix - starting/optimal/max and buffer_level changed from int to int64_t
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.

Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
2011-10-10 12:16:55 -04:00
Attila Nagy
c0de35b413 enc: save entropy probs only when needed for refresh
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.

Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
2011-10-10 13:44:54 +03:00
Scott LaVarnway
af12c23e8e Merge "Improved tokenize" 2011-10-04 09:57:42 -07:00
John Koleszar
8f8b526b54 Merge "Fix uninitialized new_mv_count in first pass file" 2011-10-04 07:40:49 -07:00
Yunqing Wang
538865dfa5 Merge "Multithreaded encoder, late sync loopfilter" 2011-10-04 07:04:30 -07:00
John Koleszar
86712c50f2 Fix uninitialized new_mv_count in first pass file
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.

Also fix a number of compiler warnings.

Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
2011-10-04 09:50:52 -04:00
Johann
2aa408524c Merge "Reduce computational complexity of generic C loop filter." 2011-09-30 16:17:56 -07:00
Johann
48b1917112 Merge "combine loopfilter data access" 2011-09-30 15:47:56 -07:00
Scott LaVarnway
ab00d209bc Improved tokenize
For a realtime HD encodings, up to 1.6% gains seen.



Change-Id: If45028e23db95124da63f9d38ffe06e05596cc6e
2011-09-30 12:49:46 -04:00
Johann
3556deaca3 combine loopfilter data access
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).

This implementation is x86_64 only. We retain the previous
implementation for x86.

Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.

Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
2011-09-30 07:38:35 -07:00
Alpha Lam
7bce513afe Call vp8_find_near_mvs lazily
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.

Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.

Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
2011-09-30 14:48:18 +01:00
Paul Wilkins
a572ac8327 Merge "CQ and two pass rate control." 2011-09-30 02:57:54 -07:00
Paul Wilkins
b6e27d5f0b CQ and two pass rate control.
Changes to the selection of Q limits for two pass
and two pass CQ mode.

Allowance made for Mode and motion vector costs.
Some refactoring of common code.

For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).

Some increased tendency to undershoot even when
user CQ not reached.

Patch2: Removed some test code accidentally merged.

Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
2011-09-30 10:55:52 +01:00
Aaron Watry
69aa303d96 Reduce computational complexity of generic C loop filter.
Change-Id: I1e7f9ed3cd907844a495b9e0073bc140b87e5c06
2011-09-29 17:25:48 -05:00
Attila Nagy
380d64ecb1 Multithreaded encoder, late sync loopfilter
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.

Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066
2011-09-29 10:06:24 +03:00
John Koleszar
6f9457ec12 Merge "clamp_mvs() using the wrong motion vector information" 2011-09-22 11:54:15 -07:00
John Koleszar
3c85c532bb Merge changes Ie650e9b8,I2427e494
* changes:
  vpxenc: get version string programatically
  Install missing default_coef_probs.h
2011-09-22 11:18:00 -07:00
Johann
9f41a8b0aa Merge "Replace vpx_ports/config.h with vpx_config.h" 2011-09-22 09:30:18 -07:00
John Koleszar
4a6ac727fe Install missing default_coef_probs.h
Make sure that this header is listed as one of the sources, so that it
will be installed if necessary.

Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
2011-09-22 11:08:24 -04:00
Attila Nagy
1a7d25a484 Replace vpx_ports/config.h with vpx_config.h
Just a clean-up.

Change-Id: Iea5b6dc925dcfa7db548bc1ab1a13d26ed5a2c9a
2011-09-22 13:33:54 +03:00
Fritz Koenig
bd0c3409a8 Move neon only arm functions under arm/neon.
These files don't contain generic arm code, so should
only be compiled by neon.

Change-Id: Ie712823aa04d4235e7cfe7a3b725e73ee4c3e564
2011-09-20 10:51:06 -07:00
Johann
6829e62718 Merge "NEON FDCT updated to match current C code" 2011-09-20 09:51:05 -07:00
Johann
86e07525d5 Merge "NEON walsh transform updated to match C" 2011-09-20 09:50:42 -07:00
Johann
3a16276cf7 Merge "Updated ARMv6 forward transforms to match C" 2011-09-20 09:50:36 -07:00
Johann
fdd51829b1 Merge "Fixed armv5te multiplications" 2011-09-20 09:50:19 -07:00
Tero Rintaluoma
0c2529a812 NEON FDCT updated to match current C code
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9

Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
2011-09-20 10:20:55 +03:00
Tero Rintaluoma
3c19bc3fb3 Fixed armv5te multiplications
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.

Restriction applies only to armv5 targets and not for armv6 and above.

Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
2011-09-20 09:59:27 +03:00
Stefan Holmer
e529a825f7 Fix necessary for input partitions iface to match the RTP profile
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.

Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.

Change-Id: I6f93b15287d291169ed681898ed3fbcc5dc81837
2011-09-19 15:00:21 +02:00
Tero Rintaluoma
4c3ad66b7f Updated ARMv6 forward transforms to match C
- Updated walsh transform to match C
  (based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
  correspondingly

Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
2011-09-19 10:26:59 +03:00
Tero Rintaluoma
2a4b2a000c NEON walsh transform updated to match C
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.

Change-Id: I338f02b881098782f82af63d97f042b85e63e902
2011-09-19 10:15:33 +03:00
John Koleszar
35ce4eb01d Merge "Fixes the boundary checks for extrapolated and interpolated MVs." 2011-09-16 08:09:44 -07:00
Scott LaVarnway
c0ee870b0a clamp_mvs() using the wrong motion vector information
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated.  The clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values.  This
patch fixes this problem.

Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff
2011-09-16 11:03:53 -04:00
Stefan Holmer
b854bbd844 Fixes the boundary checks for extrapolated and interpolated MVs.
Change-Id: I5b47d39d1604f2650d2f2d1ca2a3f40843c8e1ea
2011-09-16 11:58:57 +02:00
Scott LaVarnway
5bc7b3a68e Fixed encoder crash
caused by the "Removed bmi copy to/from BLOCKD" commit.

Change-Id: I9fae71bdc34c8ecc07bb81cd3ccf498b91ce3ec7
2011-09-13 11:46:33 -04:00
Scott LaVarnway
c4b9089bb9 Merge "Skip computation of distortion in vp8_pick_inter_mode if active_map is used" 2011-08-31 07:18:52 -07:00
Scott LaVarnway
222c72e50f Merge "Removed bmi copy to/from BLOCKD" 2011-08-31 06:57:20 -07:00
Alpha Lam
0e05f2c6c9 Skip computation of distortion in vp8_pick_inter_mode if active_map is used
If a block is marked to be inactive then set distortion to 0.

Change-Id: Ib415f19642a2ff7b5cf5cfaedd60ebbd79732272
2011-08-31 14:06:55 +01:00
John Koleszar
800b70a3bf Merge "Recalculate zbin_extra only if regular quantizer is being used" 2011-08-30 12:49:24 -07:00
Alpha Lam
bc9293b815 Recalculate zbin_extra only if regular quantizer is being used
vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.

Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db
2011-08-30 19:23:34 +01:00
Yunqing Wang
1f20202e2c Minor modification on key frame decision
This change makes sure that no key frame recoding in real-time mode
even if CONFIG_REALTIME_ONLY is not configured.

Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2
2011-08-25 16:54:45 -04:00
Fritz Koenig
4797a97215 Quiet warning by removing unused variable.
fwd_boost_score was not being computed or
referenced, so remove declaration.

Change-Id: Iece36cde1ec113e3c6afaff1407d24cdf12bd0a8
2011-08-24 15:47:09 -07:00
Scott LaVarnway
b870947d42 Removed bmi copy to/from BLOCKD
for SPLITMV and B_PRED modes.  Modified code to use the bmi
found in mode_info_context instead of BLOCKD.  On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock.  This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)

Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
2011-08-24 14:42:26 -04:00
Fritz Koenig
112bd4e2b4 Fix naming of sse2 idct functions.
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.

Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
2011-08-24 10:25:32 -07:00
Scott LaVarnway
1de5da80c9 Merge "Faster vp8_default_coef_probs" 2011-08-24 07:52:10 -07:00
Johann
85358d04cd Fix data accesses for simple loopfilters
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.

For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).

This shows a small improvement on older processors which have a
significant penalty for unaligned reads.

postproc_mmx.c is unused

Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
2011-08-23 20:42:45 -04:00
Fritz Koenig
c5f890af2c Use local labels for jumps/loops in x86 assembly.
Prepend . to local labels in assembly code.  This
allows non unique labels within a file.  Also
makes profiling information more informative
by keeping the function name with the loop name.

Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
2011-08-23 09:05:29 -07:00
Fritz Koenig
694d4e7777 Reclassify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3.  Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.

Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
2011-08-22 12:36:28 -07:00
Fritz Koenig
b7a6f1d20e Merge "Revert "Reclasify optimized ssim calculations as SSE2."" 2011-08-22 12:32:12 -07:00
Fritz Koenig
734b1b2041 Revert "Reclasify optimized ssim calculations as SSE2."
This reverts commit 01376858cd
2011-08-22 11:31:12 -07:00
Fritz Koenig
f8e3d23b99 Merge "Reclasify optimized ssim calculations as SSE2." 2011-08-22 09:20:33 -07:00
Fritz Koenig
01376858cd Reclasify optimized ssim calculations as SSE2.
Calculations were incorrectly classified as either
SSE3 or SSSE3.  Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.

Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
2011-08-19 08:51:27 -07:00
John Koleszar
edec5eb5e7 Merge "Copy less when active map is in use" 2011-08-19 07:31:00 -07:00
Alpha Lam
4e8d35a461 Copy less when active map is in use
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.

This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.

Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
2011-08-19 10:29:41 -04:00
Paul Wilkins
744f482350 Small boost to every other frame.
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.

Also use min GF interval as min KF interval.

Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485
2011-08-17 14:14:23 +01:00
Scott LaVarnway
19987dcbfa Faster vp8_default_coef_probs
Copies from a generated table instead of building the
default coeff probabilities during runtime.

Change-Id: I4d9551ea3a2d7d4a4f7ce9eda006495221a8de50
2011-08-16 16:21:21 -04:00
John Koleszar
9cc1611588 Merge v0.9.7-p1 release int 'origin/master'
Change-Id: I93388d2f8846615ad1e26b975308c5e96b9b1918
2011-08-15 17:10:01 -04:00
Stefan Holmer
99d870a472 Don't set the bmi mode when doing error concealment
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.

Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
2011-08-15 11:46:04 -04:00
Stefan Holmer
ff35649758 Don't set the bmi mode when doing error concealment
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.

Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
2011-08-15 09:56:07 +02:00
John Koleszar
e96131705a Revert "Improved 1-pass CBR rate control"
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.

Conflicts:

	vp8/encoder/onyx_if.c
	vp8/encoder/ratectrl.c

Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
2011-08-12 14:51:36 -04:00
John Koleszar
a4c2211ea3 Propagate macroblock MV to subblocks for error concealment
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.

Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
2011-08-12 14:49:35 -04:00
Stefan Holmer
a609be5633 Disable error concealment until first key frame is decoded
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.

Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.

Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
2011-08-12 14:49:34 -04:00
John Koleszar
cdae03a4eb Fix potential OOB read with Error Concealment
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.

Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
2011-08-12 14:49:34 -04:00
John Koleszar
4645c89889 Merge "Disable error concealment until first key frame is decoded" 2011-08-12 11:45:26 -07:00
John Koleszar
91206793c2 Propagate macroblock MV to subblocks for error concealment
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.

Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
2011-08-12 11:34:40 -04:00
Stefan Holmer
3e10be93f2 Disable error concealment until first key frame is decoded
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.

Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.

Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
2011-08-12 16:10:04 +02:00
John Koleszar
810a06b12c Fix potential OOB read with Error Concealment
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.

Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
2011-08-11 18:07:03 -04:00
Yunqing Wang
b84e8f20c3 Merge "Adjust half-pixel only search" 2011-08-05 12:15:32 -07:00
John Koleszar
238dae8604 Fix source buffer selection
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.

Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
2011-08-03 16:13:15 -04:00
Yunqing Wang
b9f19f8917 Adjust half-pixel only search
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.

Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
2011-08-03 11:51:07 -04:00
Johann
30e5deae5d update extend frame borders
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676

update the code with new assumptions and guard them with a compile time
assert

Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
2011-08-02 19:26:46 -04:00
James Berry
27ee521753 include asm_com/dec_offsets for make dist
Change-Id: Ia1ad66066a24c01915cd9e3ff75c7e070cc984c8
2011-08-02 13:42:03 -04:00
John Koleszar
f475f0c1bb Merge "include the arm header files in make dist" into cayuga 2011-08-02 05:21:10 -07:00
John Koleszar
06c3d5bb9a Fix building with --disable-postproc
Change-Id: I7e6bc28e7974a376da747300744e0dd5dc1d21e9
2011-08-01 17:50:23 -04:00
Johann
3e8c6d3d35 include the arm header files in make dist
Change-Id: Ibcf5b4b14153f65ce1b53c3bfba87ad2feb17bbd
2011-08-01 17:20:21 -04:00
John Koleszar
6f080f9cec Merge "Convert rc_max_intra_bitrate_pct to control" 2011-07-29 11:57:48 -07:00
John Koleszar
1f71d2e2c8 Correctly track sharpness in vp8cx_pick_filter_level_fast
Make sure to update last_sharpness_level from the current
sharpness_level whenever it changes.

Change-Id: I0258d2f5b11a407abf6176a8d4c4994d925943f0
2011-07-29 12:27:03 -04:00
John Koleszar
1654ae9a2a Convert rc_max_intra_bitrate_pct to control
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.

More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.

Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
2011-07-28 09:17:35 -04:00
Yunqing Wang
2f2302f8d5 Preload reference area in sub-pixel motion search (real-time mode)
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):

1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain

2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.

Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
2011-07-27 14:19:10 -04:00
Yunqing Wang
f11613b620 Merge "Fix range checks in motion search" 2011-07-27 09:34:13 -07:00
Yunqing Wang
bde2afbe23 Fix range checks in motion search
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.

Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
2011-07-27 10:37:33 -04:00
John Koleszar
db8f0d2ca9 Merge "cosmetics: consistently use [u]int64_t" 2011-07-26 12:57:43 -07:00
James Zern
b45065d38b cosmetics: consistently use [u]int64_t
Removes mixed usage of (unsigned) long long and INT64.
Fixes Issue #208.

Change-Id: I220d3ed5ce4bb1280cd38bb3715f208ce23cf83a
2011-07-26 11:34:36 -07:00
Johann
ca7e346669 Merge ""Eliminated TOKENEXTRABITS" broke the windows build." 2011-07-26 06:34:31 -07:00
Scott LaVarnway
a11624497c "Eliminated TOKENEXTRABITS" broke the windows build.
Fixed.

Change-Id: I3348e8dbcaee6ace263af413701101d77636e5df
2011-07-26 09:33:16 -04:00
Scott LaVarnway
4894b45ced Merge "Eliminated TOKENEXTRABITS" 2011-07-25 14:35:58 -07:00
Scott LaVarnway
76eb402668 Eliminated TOKENEXTRABITS
Noticed small performance gains, depending on material.

Change-Id: I334369f6312bc19aa73481fc3f790ab181e11867
2011-07-25 17:11:24 -04:00
Yunqing Wang
5b0de48ddd Merge "Use CONFIG_FAST_UNALIGNED consistently in codec" 2011-07-25 12:40:50 -07:00
Yunqing Wang
fe270dd527 Specify size for argument pushed to stack
The change fixes building error on Win64.

Change-Id: I63d25b26220c4da8a98ca2e36530cbb802468e6b
2011-07-25 11:30:45 -04:00
Yunqing Wang
65dfcf4696 Use CONFIG_FAST_UNALIGNED consistently in codec
CONFIG_FAST_UNALIGNED is enabled by default. Disable it if it is
not supported by hardware.

Change-Id: I7d6905ed79fed918bca074bd62820b0c929d81ab
2011-07-25 10:11:24 -04:00
Johann
773bcc300d Merge "fix sharpness bug and clean up" 2011-07-22 09:34:55 -07:00
Johann
a04ed0e8f3 fix sharpness bug and clean up
sharpness was not recalculated in vp8cx_pick_filter_level_fast

remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.

always use cm->sharpness_level. the extra indirection was annoying.

don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init

move function declarations to their proper header

Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
2011-07-22 12:33:57 -04:00
Yunqing Wang
829179e888 Merge "Preload reference area to an intermediate buffer in sub-pixel motion search" 2011-07-22 06:56:15 -07:00
Yunqing Wang
20bd1446c0 Preload reference area to an intermediate buffer in sub-pixel motion search
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
  3.4%   at --rt --cpu-used =-4
  2.8%   at --rt --cpu-used =-3
  2.3%   at --rt --cpu-used =-2
  2.2%   at --rt --cpu-used =-1

Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.

Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.

Make this change exclusively for x86 platforms.

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
2011-07-22 09:28:06 -04:00
John Koleszar
2bdda84e37 Merge "Increase chrow row alignment to 16 bytes." 2011-07-21 07:32:39 -07:00
Yunqing Wang
c5fe641179 Merge "Add improvements made in good-quality mode to real-time mode" 2011-07-21 07:27:09 -07:00
Timothy B. Terriberry
7d1b37cdac Increase chrow row alignment to 16 bytes.
This is done by expanding luma row to 32-byte alignment, since
 there is currently a bunch of code that assumes that
 uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
 common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
 encoder/temporal_filter.c, and possibly others; I haven't done a
 full audit).
It also uses replaces the hardcoded border of 16 in a number of
 encoder buffers with VP8BORDERINPIXELS (currently 32), as the
 chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
 dumping the frame memory as a contiguous blob produces a valid,
 if padded, image.

Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
2011-07-20 10:20:31 -07:00
Attila Nagy
0afcc76971 encoder: don't set the fragment bit for the last partition
Change-Id: Icb4e4f0d7c3074a8507852178be87541a1cb5bac
2011-07-20 14:09:42 +03:00
Scott LaVarnway
b2d9700f53 Merge "Moved vp8_encode_bool into boolhuff.h" 2011-07-19 08:15:14 -07:00
Johann
6afafc313c remove old armv5 code
armv5 dequantizer is not referenced

Change-Id: Id1cc617dcee35ebd6a406816ec6aaa26e8bbc8ad
2011-07-19 09:20:38 -04:00
Scott LaVarnway
a25f6a9c88 Moved vp8_encode_bool into boolhuff.h
allowing the compiler to inline this function.  For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.

Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
2011-07-19 09:17:25 -04:00
John Koleszar
b5ea2fbc2c Improved 1-pass CBR rate control
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.

Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.

Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
2011-07-18 11:48:05 -04:00
Scott LaVarnway
e68894fa03 Merge "Tokenize MB optimized" 2011-07-15 07:54:14 -07:00
Tero Rintaluoma
4e82f01547 Tokenize MB optimized
Optimized C-code of the following functions:
 - vp8_tokenize_mb
 - tokenize1st_order_b
 - tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.

Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
2011-07-15 11:26:54 +03:00
James Berry
6b6f367c3d bug fix vpx_copy_and_extend_frame size issue
vpx_copy_and_extend_frame could incorrectly
resize uv frames which could result in a crash.

Change-Id: Ie96f7078b1e328b3907a06eebeee44ca39a2e898
2011-07-14 15:58:15 -04:00
John Koleszar
04dce631a2 Remove unused speed features
min_fs_radius, max_fs_radius, full_freq were set but never read.

Change-Id: I82657f4e7f2ba2acc3cbc3faa5ec0de5b9c6ec74
2011-07-14 14:20:25 -04:00
Yunqing Wang
f1f28535c3 Merge "Fix unnecessary casting of B_PREDICTION_MODE (issue 349)" 2011-07-13 13:32:57 -07:00
Yunqing Wang
139577f937 Fix unnecessary casting of B_PREDICTION_MODE (issue 349)
Minor fix.

Change-Id: Iaf93f6e47e882a33c479e57c7a0d0bf321e291c0
2011-07-13 15:52:07 -04:00
Yunqing Wang
0e9a6ed72a Add improvements made in good-quality mode to real-time mode
Several improvements we made in good-quality mode can be added
into real-time mode to speed up encoding in speed 1, 2, and 3
with small quality loss. Tests using tulip clip showed:

--rt --cpu-used=-1
(before change)
PSNR: 38.028
time: 1m33.195s
(after change)
PSNR: 38.014
time: 1m20.851s

--rt --cpu-used=-2
(before change)
PSNR: 37.773
time: 0m57.650s
(after change)
PSNR: 37.759
time: 0m54.594s

--rt --cpu-used=-3
(before change)
PSNR: 37.392
time: 0m42.865s
(after change)
PSNR: 37.375
time: 0m41.949s

Change-Id: I76ab2a38d72bc5efc91f6fe20d332c472f6510c9
2011-07-13 14:51:02 -04:00
Fritz Koenig
84c3cd79d1 Merge "Reduce motion vector search on alt-ref frame." 2011-07-13 10:07:30 -07:00
Johann
211694f67e Merge "update x86 asm for loopfilter" 2011-07-13 04:10:03 -07:00
Johann
8f910594bd Merge "Update armv6 loopfilter to new interface" 2011-07-13 04:09:55 -07:00
Johann
1a219c22b1 Merge "Update armv7 loopfilter to new interface" 2011-07-13 04:09:42 -07:00
Johann
d9b825cff2 Merge "New loop filter interface" 2011-07-13 04:09:26 -07:00
Attila Nagy
c231b0175d Update armv6 loopfilter to new interface
Change-Id: I5fe581d797571a7a9432fbd17fc557591d0c1afa
2011-07-12 12:14:51 +03:00
Attila Nagy
283b0e25ac Update armv7 loopfilter to new interface
Change-Id: I65105a9c63832669237e6a6a7fcb4ea3ea683346
2011-07-12 12:12:25 +03:00
Fritz Koenig
ede0b15c9d Reduce motion vector search on alt-ref frame.
Clamp mv search to accomodate subpixel filtering
of UV mv.

Change-Id: Iab3ed405993ef6bf779ad7cf60863153068fb7d1
2011-07-11 09:05:43 -07:00
Yunqing Wang
587ca06da9 Minor change in pick_inter_mode()
Scott suggested to move vp8_mv_pred() under "case NEWMV" to save
extra checks.

Change-Id: I09e69892f34a08dd425a4d81cfcc83674e344a20
2011-07-08 14:08:45 -04:00
Yunqing Wang
e83d36c053 Merge "Adjust full-pixel clamping and motion vector limit calculation" 2011-07-08 08:39:32 -07:00
Yunqing Wang
40991faeae Adjust full-pixel clamping and motion vector limit calculation
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b748045470

Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
2011-07-08 11:34:28 -04:00
Johann
01433c5043 update x86 asm for loopfilter
Change-Id: I1ed739522db7c00c189851c7095c1b64ef6412ce
2011-07-08 09:23:38 -04:00
Johann
6ae12c415e Merge "clean up warnings when building arm with rtcd" 2011-07-08 05:16:09 -07:00
Attila Nagy
622958449b New loop filter interface
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshhold is constant.
ARM targets use scalars and others vectors.

Change works only with --target=generic-gnu
All other targets have to be updated!

Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
2011-07-08 09:31:41 +03:00
John Koleszar
973a9c075d Merge "Set VPX_FRAME_IS_DROPPABLE" 2011-07-07 08:11:05 -07:00
John Koleszar
37de0b8bdf Set VPX_FRAME_IS_DROPPABLE
Allow the encoder to inform the application that the encoded frame will not
be used as a reference.

Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
2011-07-07 10:38:45 -04:00
John Koleszar
b4f70084cc Merge "Properly use GET_GOT/RESTORE_GOT when using GLOBAL()." 2011-07-01 07:14:34 -07:00
Ronald S. Bultje
c8a23ad3f4 Properly use GET_GOT/RESTORE_GOT when using GLOBAL().
This should fix binaries using PIC on x86-32. Also should
fix issue 343.

Change-Id: I591de3ad68c8a8bb16054bd8f987a75b4e2bad02
2011-06-30 14:04:27 -07:00
Yunqing Wang
ae8aa836d5 Merge "Copy macroblock data to a buffer before encoding it" 2011-06-30 11:14:24 -07:00
Yunqing Wang
80c3bbf657 Merge "Bug fix in motion vector limit calculation" 2011-06-30 09:52:03 -07:00
Yunqing Wang
b748045470 Bug fix in motion vector limit calculation
Motion vector limits are calculated using right shifts, which
could give wrong results for negative numbers. James Berry's
test on one clip showed encoder produced some artifacts. This
change fixed that.

Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
2011-06-30 11:20:13 -04:00
Johann
3e4a80cc35 Merge "remove incorrect initialization" 2011-06-30 07:59:08 -07:00
Paul Wilkins
eacaabc592 Merge "Change to arf boost calculation." 2011-06-29 10:03:57 -07:00
Paul Wilkins
11694aab66 Change to arf boost calculation.
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.

The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.

The new code can be deactivated using #define NEW_BOOST 0

In its current default state the code searches forwards and backwards
from the proposed  position of the next alt ref.

The the old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.

I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.

Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
2011-06-29 18:01:25 +01:00
Johann
fe53107fda remove incorrect initialization
Values were set, then reset. Only set them once.

Change-Id: Iaf43c8467129f2f261f04fa9188b603aa46216b5
2011-06-29 11:54:27 -04:00
Johann
6611f66978 clean up warnings when building arm with rtcd
Change-Id: I3683cb87e9cb7c36fc22c1d70f0799c7c46a21df
2011-06-29 10:51:41 -04:00
John Koleszar
f3a13cb236 Merge "Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently" 2011-06-29 07:29:59 -07:00
Johann
dc004e8c17 Merge "Avoid text relocations in ARM vp8 decoder" 2011-06-28 16:34:10 -07:00
Johann
02c30cdeef Merge "utilize preload in ARMv6 MC/LPF/Copy routines" 2011-06-28 16:33:45 -07:00
John Koleszar
b32da7c3da Use MAX_ENTROPY_TOKENS and ENTROPY_NODES more consistently
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.

Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
2011-06-28 17:03:55 -04:00
John Koleszar
9bcf07ae4a Merge "Simplify decode_macroblock." 2011-06-28 12:54:25 -07:00
Gaute Strokkenes
81c0546407 Simplify decode_macroblock.
Change-Id: Ieb2f3827ae7896ae594203b702b3e8fa8fb63d37
2011-06-28 17:01:14 +01:00
Stefan Holmer
7296b3f922 New ways of passing encoded data between encoder and decoder.
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.

At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last has
the VPX_FRAME_IS_FRAGMENT flag set.

At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.

Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.

The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.

Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
2011-06-28 11:10:17 -04:00
Stefan Holmer
4cb0ebe5b2 Adding support for independent partitions
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.

Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
2011-06-28 11:10:17 -04:00
Mike Hommey
e3f850ee05 Avoid text relocations in ARM vp8 decoder
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.

Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.

Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
2011-06-28 09:11:40 +02:00
Fritz Koenig
be99868bd1 Fix after removal of B_MODE_INFO
Change Ieb746989: Removed B_MODE_INFO missed this.

Change-Id: I32202555581cc2a5d45e729c6650ada4d2df55d3
2011-06-27 09:43:21 -07:00
Johann
8a9a11e8dc Merge "configuration, support disabling any subset of ARM arch" 2011-06-27 08:55:18 -07:00
Stefan Holmer
ba0822ba96 Adding support for error concealment in multi-threaded decoding
Also includes a couple of error concealment bug fixes:
- the segment_id wasn't properly initialized when missing
- when interpolating and no neighbors are found, set to zero
- clear the qcoef buffer when concealing an MB

Change-Id: Id79c876b41d78b559a2241e9cd0fd2cae6198f49
2011-06-27 09:03:06 -04:00
Adrian Grange
deca8cfc44 Fixed initialization of frame buffer ref counters
Only the first frame buffer ref counter was being initialized
because the index was fixed at 0 rather than using i.

Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4
2011-06-24 08:43:40 -07:00
Yunqing Wang
0d87098e08 Copy macroblock data to a buffer before encoding it
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.

Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
2011-06-23 13:54:02 -04:00
John Koleszar
db67dcba6a Revert "Reduce overshoot in 1 pass rate control"
This reverts commit 212f618373.

Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting to damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.

Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5
2011-06-23 11:52:12 -04:00
James Berry
2bd90c13a0 get/set reference buffer dimension check added
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously gaurenteed.
Using an improperly allocated buffer would
cause a crash before.

Change-Id: I904982313ce9352474f80de842013dcd89f48685
2011-06-22 13:36:24 -04:00
Yaowu Xu
76495617e0 Merge "adjusting the calculation of errorperbit" 2011-06-21 09:47:42 -07:00
Scott LaVarnway
55c3963c88 Merge "Improved vp8dx_decode_bool" 2011-06-21 07:45:51 -07:00
Yunqing Wang
109c20299c Merge "Remove unnecessary bounds checking in motion search" 2011-06-21 07:23:24 -07:00
Attila Nagy
6f23f24afe configuration, support disabling any subset of ARM arch
Useful for leaving out any version specific asm files.

Change-Id: I233514410eb9d7ca88d2d2c839673122c507fa99
2011-06-21 10:39:01 +03:00
Yaowu Xu
10ed60dc71 adjusting the calculation of errorperbit
RDMULT/RDDIV defines a bit worth of distortion in term of sum squared
difference. This has also been used as errorperbit in subpixel motion
search, where the distortions computed as variance of the difference.
The variance of differences is different from sum squared differences
by amount of DC squared. Typically, for inter predicted MBs, this
difference averages around 10% between the two distortion, so this patch
introduces a 110% constant in deriving errorperbit from RDMULT/RDDIV.

Test on CIF set shows small but positive gain on overall PSNR (.03%)
and SSIM (.07%), overall impact on average PSNR is 0.

Change-Id: I95425f922d037b4d96083064a10c7cdd4948ee62
2011-06-20 16:32:30 -07:00
Scott LaVarnway
67a1f98c2c Improved vp8dx_decode_bool
Relocated the vp8dx_bool_decoder_fill() call, allowing
the compiler to produce better assembly code.  Tests
showed a 1 - 2 % performance boost (x86 using gcc)
for the 720p clip used.

Change-Id: Ic5a4eefed8777e6eefa007d4f12dfc7e64482732
2011-06-20 14:44:16 -04:00
Taekhyun Kim
458fb8f491 utilize preload in ARMv6 MC/LPF/Copy routines
About 9~10% decoding perf improvement on non-Neon ARM cpus

Change-Id: I7dc2a026764e84e9c2faf282b4ae113090326837
2011-06-17 14:04:53 -07:00
Yunqing Wang
2cd1c2855e Remove unnecessary bounds checking in motion search
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because different treatment
is taken while starting point equals the bounds.

Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
2011-06-17 14:19:51 -04:00
John Koleszar
a60fc419f5 Merge "Use SSE as BPRED distortion metric consistently" 2011-06-17 09:48:32 -07:00
Ronald S. Bultje
87fd66bb0e Assign boost to GF bit allocation if past frame had no ARF.
Modify the second-pass code to provide a full golden-frame (GF) bit
allocation boost if the past GF group (GFG) had no alt-ref frame (ARF),
even if the current GFG does contain and ARF.

This mostly has no effect on clips, since switching ARFs on/off between
GFGs is not very common. Has a positive effect on e.g. cheer (+0.45 SSIM
at 600kbps) and football (+0.25 SSIM at 600kbps), particularly at high
bitrates. Has a negative effect (-0.04 SSIM at 300kbps) at pamphlet,
which appears only marginally related to this patch, and crew (-0.1 SSIM
at 700kbps).

Change-Id: I2e32899638b59f857e26efeac18a82e0c0b77089
2011-06-16 13:01:27 -04:00
John Koleszar
eb645abeac Merge "Disable specialcase for last frames if the sequence contains ARFs." 2011-06-16 09:56:05 -07:00