As a result of the GLES specifications being vague about best practice for how buffers should be used dynamically, different GPUs / platforms appear to have different preferences.
Mac in particular seems to have a number of problems in this area, and none of the rendering team uses Macs. So far we have relied on guesswork to choose the best usage, but in an attempt to pin this down, this PR begins to introduce manual selection of options for users to test their configurations.
It can be enabled in the Project Settings
(`rendering/quality/filters/use_debanding`). It's disabled
by default as it has a small performance impact and can make
PNG screenshots much larger (due to how dithering works).
As a result, it should be enabled only when banding is noticeable enough.
Since debanding requires a HDR viewport to work, it's only supported
in the GLES3 backend.
Another bug in the octree has been discovered which can cause flickering in rare circumstances : #42895
For safety until this is fixed properly this PR reverts the default state of the octree to match the old behaviour, which doesn't appear exhibit the bug (or at least not as readily).
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.
Batching is extended to cover more primitives.
Don't apply lighting to objects when they have a lightmap texture and
the light is set to BAKE_ALL. This prevents applying the same direct
light twice on the same object and makes setting up scenes with mixed
lighting much easier.
Option in MeshInstance to enable software skinning, in order to test
against the current USE_SKELETON_SOFTWARE path which causes problems
with bad performance.
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
Measure the distance from the line against the rotated object, not the
rotated line, when obtaining the object's supports against a line.
(cherry picked from commit 7e44682c03)
Keeps track of the order in which items are collected by
_collect_ysort_children, and uses that order to break
ties between items with similar Y positions.
(cherry picked from commit 8d3afa985b)
The thread model option for physics (2D) and rendering (single-unsafe,
single-safe, multithread), was causing crashes/locks when set as
multithreaded and exported for a platform that does not support threads
(namely HTML5).
This commit ensures that when threads support is not available, that
option is ignored, and the equivalent of "single-unsafe" is always used
instead.
(cherry picked from commit f3c6ac1d71)
Prevents adding new octants until a limiting number of elements have been added to the current octant. This enables balancing the benefits of brute force against the benefits of spatial partitioning. The limit can be set per octree.
Project settings are added for rendering octree to set the best balance per project depending on number of tests per frame / tick, and the amount of editing of the octree.
Fixes octants being leaked when removing elements.
Optimize octree with cached linear lists
Storing elements in octants using linked lists is efficient for housekeeping but very slow for testing. This optimization stores additional local_vectors with Element pointers and AABBs which are cached and only updated when a dirty flag is set on the octant.
This is selectable with 2 versions of Octree : Octree and Octree_CL, Octree being the old behaviour. At present the cached list version is only used for the visual server octree (rendering) as it has only been demonstrated to be faster there so far.
This uses slightly more memory (probably a few kb in most cases) but can be significantly faster during testing (culling etc).
Co-authored-by: Sergey Minakov <naithar@icloud.com>
Configured for a max line length of 120 characters.
psf/black is very opinionated and purposely doesn't leave much room for
configuration. The output is mostly OK so that should be fine for us,
but some things worth noting:
- Manually wrapped strings will be reflowed, so by using a line length
of 120 for the sake of preserving readability for our long command
calls, it also means that some manually wrapped strings are back on
the same line and should be manually merged again.
- Code generators using string concatenation extensively look awful,
since black puts each operand on a single line. We need to refactor
these generators to use more pythonic string formatting, for which
many options are available (`%`, `format` or f-strings).
- CI checks and a pre-commit hook will be added to ensure that future
buildsystem changes are well-formatted.
(cherry picked from commit cd4e46ee65)
As the masked light list takes no account of layer_min and layer_max, the canvas_layed_id is passed to the _light_mask_canvas_items function where it can be used to reject lights outside the layer range.
Scaling tilemaps can cause border artifacts around the edges of tiles. This has been traced to precision issues in the GPU. This PR adds an adjustment to allow a minor contraction of the UVs of rects in order to compensate for the incorrect classification of texels across the UV border.
When using the default setting (layer 1 set only) nothing is stored in the tscn file for a Light2D, hence it relies on the value in the constructor.
The problem is the constructed value is 1 in Light2D, and -1 in RasterizerCanvas::Light. -1 results in all bits being set so all occluders are shown, rather than just those in layer 1.
This PR changes Rasterizer::Canvas constructor to set to 1. An alternative is to have -1 as the value for layer 1 throughout.
As it now seems like we will soon have GLES3 batching working using the same intermediate layer as GLES2, it makes more sense to reuse the same batching settings for both renderers rather than duplicate project settings for GLES2 and GLES3.
Each driver used to define the (same) project settings value, but the
setting names are not driver specific. Ovverriding is still possible via
platform tags.
(cherry picked from commit 90c7102b51)
It seems that particles (and some other features) do not work correctly on iOS in GLES2 because either many of the devices do not support half float compression, or the GL constant used to reference it from Godot is incorrect.
This PR adds a project setting in rendering/gles2/ to disable half-float compression on iOS.
Although 2D draws in painters order with strict ordering, in certain circumstances items can be reordered to increase batching / decrease state changes, without affecting the end result. This can be determined by an overlap test.
In situation with item:
A-B-A
providing the third item does not overlap the second, they can be reordered:
A-A-B
Items already contain an AABB which can be used for this overlap test.
1)
To utilise this, I have implemented item reordering (only for single rects for now), with the lookahead adjustable in project settings. This can increase performance in situations where items may not be grouped in the scene tree by texture. It can also be switched off (by setting lookahead to 0).
2)
This same trick can be used to help join items that are lit. Lit items previously would prevent joining completely, thus missing out on performance gains other than multi-command items such as tilemaps.
In this PR, lights are assigned as bits in a bitfield (up to 64, the optimization is disabled above this), and on each try_item (for joining), the bitfield for lights and shadows is constructed and compared with the previous items. If these match the 2 items can potentially be joined. However, this can only be done without changing the rendered result if an overlap test is successful.
This overlap test can be adjusted to join items up to a specific number of item references, selectable in project settings, or turned off.
3)
The legacy uniform single rect drawing routine seems to have been identified as the source of flicker, particularly on nvidia. However, it can also be up to 2x as fast. Because of the speed the batching contains a fallback where it can use the legacy single rect method, but I have now added a project setting to make this switchable. In most cases with batching it should not be necessary (as single rects are drawn less frequently) and thus the flickering can be totally avoided.
4)
This PR also fixes a color modulate bug when drawing light passes, in certain situations (particularly custom _draw routines with multiple rects).
5)
This PR also fixes#38291, a bug in the legacy renderer where light passes could draw rects in wrong position.
- Resurrect it for GL ES 2
- Apply roll over with `fmod()` instead of resetting it to 0
- Expose the setting from the `VisualServer`, since it does not belong in any specific rasterizer
Dual paraboloid shadowmaps were ending up with infinitely large volumes of area behind the hemisphere un-culled.
This change just adds a back plane to the convex shape used for the culling volume.
Added project setting to enable / disable print frame diagnostics every 10 seconds. This prints out a list of batches and info, which is useful to optimize games and identify performance problems.
This adds 2 new values (items and draw calls) to the performance monitor in a '2d' section, rather than reusing the 3d values in the 'raster' section.
This makes it far easier to optimize games to minimize drawcalls.
Extra functions canvas_render_items_begin and canvas_render_items_end are added to RasterizerCanvas, with noop stubs for non-GLES2 renderers. This enables batching to be spready over multiple z_indices, and multiple calls to canvas_render_items.
It does this by only performing item joining within canvas_render_items, and deferring rendering until canvas_render_items_end().
2d rendering is currently bottlenecked by drawing primitives one at a time, limiting OpenGL efficiency. This PR batches primitives and renders in fewer drawcalls, resulting in significant performance improvements. This also speeds up text rendering.
This PR batches across canvas items as well as within items.
The code dynamically chooses between a vertex format with and without color, depending on the input data for a frame, in order to optimize throughput and maximize batch size. It also adds an option to use glScissor to reduce fillrate in light passes.
af9bb0ea15 fixed AudioServer's
`get_output_delay()` (which used to always return 0) while renaming it
to `get_output_latency()`. It now returns the latency from the
AudioDriver, which can be non-0.
While this was a clear bugfix, it broke playback for WebM files without
audio track. It seems like the playback code, even though it queried
the output delay to calculate a time compensation, was designed to work
even though the delay value was actually bogus. Now that it's correct,
it's not working.
As a workaround we comment out uses of the output latency, restoring
the behavior of Godot 3.1.
This code should still be reviewed by someone more versed in video
playback and fixed to properly account for the non-0 driver latency.
Fixes#35760.
(cherry picked from commit da411d1625)
Fixes#26637.
Fixes#19900.
The viewport_size returned by get_viewport_size was previously incorrect, being half the correct value. The function is renamed to get_viewport_half_extents, and now returns a Vector2.
Code which called this function has also been modified accordingly.
This PR also fixes shadow culling when using ortho cameras, because the correct input for CameraMatrix::set_orthogonal should be the full HEIGHT from get_viewport_half_extents, and not half the width.
It also fixes state.ubo_data.viewport_size in rasterizer_scene_gles3.cpp to be the width and the height of the viewport in pixels as stated in the documentation, rather than the current value which is half the viewport extents in worldspace, presumed to be a bug.
Reverts the following commits:
- c81ec6f26d40b70283958a4ef3e216fb32cbaf14:
"Exposes capture methods to AudioServer, variable renames for
consistency, added documentation."
- 47c558b98abf842910c780294314326662410cdf:
"Expose audio callbacks as signals."
- dabaa11b3c451e9b8f2cca7e563bd9ec51edb169:
"Fix to make sure the capture buffers are deallocated at shutdown.
Silences warnings."
Some documentation improvements were kept for pre-existing methods.
See rationale for reverting these changes in #30468.
Some cases were not handled properly for Polygon2D after making changes in common code to fix Line2D antialiasing. Added an option for drawing polygons to differentiate the two use cases.
Fixes#34568
Happy new year to the wonderful Godot community!
We're starting a new decade with a well-established, non-profit, free
and open source game engine, and tons of further improvements in the
pipeline from hundreds of contributors.
Godot will keep getting better, and we're looking forward to all the
games that the community will keep developing and releasing with it.
Prior to this fix, AudioEffectRecordInstance::init() was called before recording_active is set to true in AudioEffectRecord::set_recording_active(). This was setting is_recording to false in AudioEffectRecordInstance, because is_recording updates to the value of recording_active in AudioEffectRecordInstance::_io_thread_process(). To fix this issue, AudioEffectRecordInstance::init() is now called after recording_active is set to true.
Polygon2D:
The property wasn't used anymore after switching from canvas_item_add_polygon() to canvas_item_add_triangle_array() for drawing.
Line2D:
Added the same property as for Polygon2D & fixed smooth line drawing to use indices correctly.
Fixes#26823
Make sure particles are processed during the same frame when visibility is set to on, in case they are still active from before and need to be restarted.
Fixed#33476
The list of allowed shader types is now displayed if any
`shader_type`-related error is emitted.
This makes it easier to remember which shader types are allowed
when creating a new shader.
Implemented uniform API in Viewport class to override 2D and/or
3D camera.
Added buttons in 2D and 3D editor viewport toolbars that override
the running game camera transform with the editor viewport camera
transform. Implemented via remote debugger protocol and camera
override API.
Removed LiveEditFuncs function pointers from ScriptDebugger class.
Since the debugger got access to the SceneTree instance (if one
exists), there is no need to store the function pointers. The live
edit functions in SceneTree are used directly instead. Also removed
the static version of live edit functions in SceneTree for the same
reason. This reduced the SceneTree -> Debugger coupling too since
the function pointers don't need to be set from SceneTree anymore.
Moved script_debugger_remote.h/cpp from 'core/' to 'scene/debugger/'.
This is because the remote debugger is now using SceneTree directly
and 'core/' classes should not depend on 'scene/' classes.
Showing and hiding geometry nodes was not showing or hiding their respective shadows. This fixes this by marking any affected lights as dirty.
Fixes#29856
In 2.1 and 3.0, light_vec could be modified for altering shadow_computations.
But it broke shadows when rotating light. shadow_vec would do the same, but without breaking
shadows in rotated lights if not used.
Add inverse light transformation to shadow vec, so it's not affected when rotating lights;
Added usage define for shadow vec.
For shadow vec working properly when rotating a light, it's needed to multiply it by light_matrix normalized. Added usage define in order to don't do that if shadow_vec not used.
For clarity, assign-to-release idiom for PoolVector::Read/Write
replaced with a function call.
Existing uses replaced (or removed if already handled by scope)
When closing the game, we flush the command queue but after we are pushing the freeing calls of the id pool to the
command queue and they are never being run. Now we free them directly.
This is a new singleton where camera sources such as webcams or cameras on a mobile phone can register themselves with the Server.
Other parts of Godot can interact with this to obtain images from the camera as textures.
This work includes additions to the Visual Server to use this functionality to present the camera image in the background. This is specifically targetted at AR applications.
It's not necessary, but the vast majority of calls of error macros
do have an ending semicolon, so it's best to be consistent.
Most WARN_DEPRECATED calls did *not* have a semicolon, but there's
no reason for them to be treated differently.
GLSL allows the construction of larger data types by swizzling smaller
ones, but Godot shading language treated this as an error:
vec2 test2 = vec2(0.0, 1.0);
vec3 test3 = test2.xxx; // error: Invalid member for vec2 expression
This commit updates the expression parser for the 2 and 3-component data
types accordingly.
Fixes#10496
Reasoning: ID is not an acronym, it is simply short for identification, so it logically should not be capitalized. But even if it was an acronym, other acronyms in Godot are not capitalized, like p_rid, p_ip, and p_json.
Adds the ability to directly add disabled shapes to a collision object. Before this commit a shape has always been assumed to be enabled and had to be disabled in an extra step.
The bake mode property of lights previously didn't affect GI probes.
This change makes the GI probe ignore lights that have their bake mode
set to disabled.
Made AudioFrame and Vector2 equivalent for casting.
Added ability to obtain the playback object from stream players.
Added ability to obtain effect instance from audio server.
It seems to stay compatible with formatting done by clang-format 6.0 and 7.0,
so contributors can keep using those versions for now (they will not undo those
changes).
With this change finally one can use compound collisions (like those created
by Gridmaps) without serious performance issues. The previous KinematicBody
code for Bullet was practically doing a whole bunch of unnecessary
calculations. Gridmaps with fairly large octant sizes (in my case 32) can get
up to 10000x speedup with this change (literally!). I expect the FPS demo to
get a fair speedup as well.
List of fixes and improvements:
- Fixed a general bug in move_and_slide that affects both GodotPhysics and
Bullet, where ray shapes would be ignored unless the stop_on_slope parameter
is disabled. Not sure where that came from, but looking at the 2D physics
code it was obvious there's a difference.
- Enabled the dynamic AABB tree that Bullet uses to allow broadphase collision
tests against individual shapes of compound shapes. This is crucial to get
good performance with Gridmaps and in general improves the performance
whenever a KinematicBody collides with compound collision shapes.
- Added code to the broadphase collision detection code used by the Bullet
module for KinematicBodies to also do broadphase on the sub-shapes of
compound collision shapes. This is possible thanks to the dynamic AABB
tree that was previously disabled and it's the change that provides the
biggest performance boost.
- Now broadphase test is only done once per KinematicBody in Bullet instead of
once per each of its shapes which was completely unnecessary.
- Fixed the way how the ray separation results are populated in Bullet which
was completely broken previously, overwriting previous results and similar
non-sense.
- Fixed ray shapes for good now. Previously the margin set in the editor was
not respected at all, and the KinematicBody code for ray separation was
complete bogus, thus all previous attempts to fix it were mislead.
- Fixed an obvious bug also in GodotPhysics where an out-of-bounds index was
used in the ray result array.
There are a whole set of other problems with the KinematicBody code of Bullet
which cost performance and may cause unexpected behavior, but those are not
addressed in this change (need to keep it "simple").
Not sure whether this fixes any outstanding Github issues but I wouldn't be
surprised.
There was a bug that could result in most bone aabb boxes ending up with
tiny size upon import and mess up with culling of skeletal meshes. This
fixes it.
- Texture arrays and 3D textures weren't working previously due to an
incorrect number of calls to glTexImage3D with incorrect level parameters.
This change fixes that.
- Fixed the incorrect calculation of the byte size of layered textures.
- Added the layer count to the debugger info when viewing video memory usage.
CLAMP limits the value between the two bounds, so for unsigned ints
it should be replaced by MIN(val, max), not MAX.
The issue in voxel_light_baker.cpp was fixed in 4f697f7.
Fixes#26170.
This allows most demos to run without any ubsan or asan errors. There
are still some things in thirdpart/ and some things in AudioServer that
needs a look but this fixes a lot of issues. This should help debug less
obvious issues, hopefully.
This fixes#25217 and fixes#25218
This reverts commit 5de5a4140b.
@reduz still intends to rework it in the future, and it's convenient to
test if issues are specific to Bullet or not, so we keep it around for
the time being.