When users create an invalid shader, the shader->valid flag is set to false. Batching previously assumes that shaders are valid, and this can result in primitives with invalid shader being joined, causing visual errors.
This PR prevents joining items that have invalid shaders.
Allows users to override default API usage, in order to get best performance on different platforms.
Also changes the default legacy flags to use STREAM rather than DYNAMIC.
When using modulate_fvf, final_modulate was still being applied on CPU in some circumstances, AS WELL as in the shader. This double application resulted in the wrong color.
This PR prevents CPU multiplication of final_modulate when modulate_fvf is in use.
It also applies an OR to the joined item flags with each item joined. This fixes a bug where a smaller FVF is used than required, resulting in incorrect colors.
In rare cases default batches could occur which were containing commands that were not owned by the first item referenced by the joined item. This had assumed to be the case, and would read the wrong command, or crash.
Instead for safety in this PR we now store a pointer to the parent item in default batches, and use this to determine the correct command list instead of assuming.
An earlier PR #46898 had flipped the rotation basis polarity. This turns out to also need a corresponding flip for the light angles for the lighting to make sense.
The editor under certain circumstances is passing invalid polys to the renderer. This should be fixed upstream but just in case this PR adds fault tolerance for invalid indices.
Trying to use the old `hardware_transform` flag to combine the new large_fvf has lead to several bugs. So here the logic is broken out into 2 separate components, single item and large_fvf.
The old `hardware_transform` name also no longer makes sense, as there are now 3 transform paths:
Software (CPU)
Hardware (uniform)
Hardware (attribute)
Large FVF which encodes the transform in a vertex attribute is triggered by reading from VERTEX in a custom shader. This means that the local vertex position must be available in the shader, so the only way to batch is to also pass the transform as an attribute.
The large FVF path already disabled CPU transform in the case of rects, but not in other primitives, which this PR fixes.
Note that large FVF is incompatible with 2d software skinning. So reading from VERTEX in a custom shader when using skinning will not work.
- Fix objects with no material being considered as fully transparent by the lightmapper.
- Added "environment_min_light" property: gives artistic control over the shadow color.
- Fixed "Custom Color" environment mode, it was ignored before.
- Added "interior" property to BakedLightmapData: controls whether dynamic capture objects receive environment light or not.
- Automatically update dynamic capture objects when the capture data changes (also works for "energy" which used to require object movement to trigger the update).
- Added "use_in_baked_light" property to GridMap: controls whether the GridMap will be included in BakedLightmap bakes.
- Set "flush zero" and "denormal zero" mode for SSE2 instructions in the Embree raycaster. According to Embree docs it should give a performance improvement.
As part of the improvements to batch more cases, batching can store final_modulate as an attribute in the vertex format rather than sending as a uniform. This allows draw calls with different final_modulate to be batched together.
However custom shader code was reading from only the final_modulate uniform, and not the attribute when it was in use. This was leading to visual errors.
This is tricky to solve, because we cannot use the same name for the attribute in the vertex and fragment shaders, because one is an attribute and one a varying, whereas a uniform is accessible anywhere. To get around this, a macro is used which can translate to the most appropriate variable depending on whether uniform or attribute or varying is required.
This is something that I missed from the initial implementation of large FVF. In large FVF the transform is sent per vertex in an attribute, and the vertex position is the original vertex position. This is so that the original vertex position can be read and modified in a custom shader.
This whole system is therefore incompatible with the legacy hardware transform method, whereby the transform is sent in a uniform. The shader already correctly ignores the uniform transform, but there are some parts of the CPU side logic that can be confused treating large FVF batches as if they were hardware transform.
This PR completes the logic by making the CPU treat large FVF as though it was software transform.
Slight technical hitch, the basis was reversed that was sent to the shader, so rotations were opposite. This PR reverses polarity of the basis to be correct.
There have been a couple of reports of pixel lines when using light scissoring. These seem to be an off by one error caused by either rounding or pixel snapping.
This PR adds a single pixel boost to light scissor rects to protect against this. This should make little difference to performance.
Although batching supported both ninepatch modes (fixed and scaling) when using ninepatch stretch mode, the ninepatch tiling modes (in GLES3) could only run through the shader.
The shader only supported one of the ninepatch modes. This PR uses the hack method of #if defined in the shader to prevent the use of a conditional. The define is set at startup according to the project setting.
GLES3 changes:
This commit makes it possible to disable 3D directional lights by using
the light's cull mask. It also automatically disables directionals when
the object has baked lighting and the light is set to "bake all".
GLES2 changes:
Added a check for the light cull mask, since it was previously ignored.
One of the new fvf types (FVF_MODULATED) allows batching custom shaders that use modulate. The only slight oversight is that there is a special define when MODULATE is used in a custom shader, called MODULATE_USED, that is checked, and if set it does NOT apply final_modulate as part of canvas.glsl.
This MODULATE_USED define wasn't checked when the new FVF was used and modulate was passed in an attribute.
This PR moves the application of the final_modulate into the #ifndef MODULATE_USED section.
The rendering/quality/2d section of project settings is becoming considerably expanded in 3.2.4, and arguably was not the correct place for settings that were not really to do with quality.
3.2.4 is the last sensible opportunity we will have to move these settings, as the only existing one likely to break compatibility in a small way is `pixel_snap`, and given that the whole snapping area is being overhauled we can draw attention to the fact it has changed in the release notes.
Class reference is also updated and slightly improved.
`pixel_snap` is renamed to `gpu_pixel_snap` in the project settings and code to help differentiate from CPU side transform snapping.
Antialiasing is not supported for batched polys. Currently due to the fallback mechanism, skinned antialiased polys will be rendered without applying animation.
This PR simply treats such polys as if antialiasing had not been selected. The class reference is updated to reflect this.
Due to multi pass approach to lighting in GLES2, in some situations the rendered result can look different if lights are presented in a different order.
The order (aside from directional lights) seems to be simply copied from the culling routine (octree or bvh) which is essentially arbitrary. While octree is usually consistent with order, bvh uses a trickle optimize which may result in lights occurring in different order from frame to frame.
This PR adds an extra layer of sorting on GLES2 lights in order to get some kind of order consistency.
These functions don't yet exist on ubuntu 14.04 so this leads to build
problems there. Omitting these symbols in the generated wrappers fixes
this. If we want to start using these symbols at a later date we should
just regenerate the wrapper.
(cherry picked from commit f42a7f9849)
This suppresses the blocky shadow appearance, bringing the shadow rendering
much closer to GLES3. This soft filter is more demanding as it requires
more lookups, but it makes PCF13 shadows more usable.
The soft PCF filter was adapted from three.js.
This #define's older inttypes to their newer versions and #includes
<stdint.h> in the generated files. This will help with older
glibc/compiler versions using headers generated on newer systems.
cherry picked from 5233d78f49
These are benign but worth fixing as it clears the log to find more important errors.
A common problem with the sanitizer is that enums are often used to represent bits (e.g. 1, 2, 4, 8 etc) but without specifying the enum type, the compiler is free to use unsigned or signed int. In this case it uses int, and when it performs bitwise operations on the int type, the sanitizer complains.
This is probably because a bitshift with negative signed value can give undefined behaviour - the sanitizer can't know ahead of time that you are using the enum for sensible bitflags.
- Based on C++11's `thread` and `thread_local`
- No more need to allocate-deallocate or check for null
- No pointer anymore, just a member variable
- Platform-specific implementations no longer needed (except for the few cases of non-portable functions)
- Simpler for `NO_THREADS`
- Thread ids are now the same across platforms (main is 1; others follow)
- Based on C++11's `mutex` and `condition_variable`
- No more need to allocate-deallocate or check for null
- No pointer anymore, just a member variable
- Platform-specific implementations no longer needed
- Simpler for `NO_THREADS`
- Based on C++11's `mutex`
- No more need to allocate-deallocate or check for null
- No pointer anymore, just a member variable
- Platform-specific implementations no longer needed
- Simpler for `NO_THREADS`
- `BinaryMutex` added for special cases as the non-recursive version
- `MutexLock` now takes a reference. At this point the cases of null `Mutex`es are rare. If you ever need that, just don't use `MutexLock`.
- `ScopedMutexLock` is dropped and replaced by `MutexLock`, because they were pretty much the same.
- Based on C++14's `shared_time_mutex`
- No more need to allocate-deallocate or check for null
- No pointer anymore, just a member variable
- Platform-specific implementations no longer needed
- Simpler for `NO_THREADS`
It appears that we can get a fun circle dependency on a shared object on
some system configurations causing issues with our 'fake' function
pointer names. This can lead to a crash.
The new wrapper generator renames all the symbols so this can't happen
anymore. See https://github.com/hpvb/dynload-wrapper/commit/704135e
Cherry-pick from 8d36b17343
By generating stubs using https://github.com/hpvb/dynload-wrapper we
can dynamically load libpulse and libasound on systems where it is available.
Both are still a build-time requirement but no longer a run-time dependency.
For maintenance purposes the wrappers should not need to be re-generated
unless we want to bump pulse or asound to an incompatible version. It is
unlikely we will want to do this any time soon.
cherry-pick from 09f82fa6ea
Valgrind reported two instances of reading uninitialized memory in the batching. They are both pretty benign (as evidenced by no bug reports) but wise to close these.
The first is that when changing batch from a default batch it reads the batch color which is not set (as it is not relevant for default batches). The segment of code is not necessary when it has already deemed a batch change necessary (which will occur from a default batch). In addition this means that the count of color changes will be more accurate, rather than having a possible random value in.
The second is that on initialization _set_texture_rect_mode is called before the state has been properly initialized (it is initialized at the beginning of each canvas_begin, but this occurs outside of that).
Reordering an item from after a copybackbuffer to before would result in the wrong thing being rendered into the backbuffer.
This PR puts in a check to prevent reordering over such a boundary.
Move definition of rendering/quality/filters/anisotropic_filter_level to
servers/visual_server.cpp, since both GLES2 and GLES3 now use it
rasterizer_storage_gles3.cpp: Remove a spurious variable write (the
value gets overwritten soon after)
- Fix Embree runtime when using MinGW (patch by @RandomShaper).
- Fix baking of lightmaps on GridMaps.
- Fix some GLSL errors.
- Fix overflow in the number of shader variants (GLES2).
Completely re-write the lightmap generation code:
- Follow the general lightmapper code structure from 4.0.
- Use proper path tracing to compute the global illumination.
- Use atlassing to merge all lightmaps into a single texture (done by @RandomShaper)
- Use OpenImageDenoiser to improve the generated lightmaps.
- Take into account alpha transparency in material textures.
- Allow baking environment lighting.
- Add bicubic lightmap filtering.
There is some minor compatibility breakage in some properties and methods
in BakedLightmap, but lightmaps generated in previous engine versions
should work fine out of the box.
The scene importer has been changed to generate `.unwrap_cache` files
next to the imported scene files. These files *SHOULD* be added to any
version control system as they guarantee there won't be differences when
re-importing the scene from other OSes or engine versions.
This work started as a Google Summer of Code project; Was later funded by IMVU for a good amount of progress;
Was then finished and polished by me on my free time.
Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>
nanosleep returns 0 or -1 not the error code.
The error code "EINTR" (if encountered) is placed in errno, in which case nanosleep can be safely recalled with the remaining time.
This is required, so that nanosleep continues if the calling thread is interrupted by a signal.
See manpage nanosleep(2) for additional details.
(cherry picked from commit 1107c7f327)
Happy new year to the wonderful Godot community!
2020 has been a tough year for most of us personally, but a good year for
Godot development nonetheless with a huge amount of work done towards Godot
4.0 and great improvements backported to the long-lived 3.2 branch.
We've had close to 400 contributors to engine code this year, authoring near
7,000 commit! (And that's only for the `master` branch and for the engine code,
there's a lot more when counting docs, demos and other first-party repos.)
Here's to a great year 2021 for all Godot users 🎆
(cherry picked from commit b5334d14f7)
Ninepatch code has a check to prevent use of zero sized textures. This didn't deal properly with animated textures, which use a proxy (link to another texture).
This PR uses a generalised method of getting textures, with built in support for proxy textures and protection against infinite loops.
These were only put in for the betas, in order to test hypotheses for stalling on Macs. It seems that most of the problems in the Mac editor have been solved by fixing the excessive redraw_requests.
As a result no one has reported any results from these options, but in future we will be able to refer users to try the beta versions, so there is no need to include them in the stable release. Indeed they are only likely to cause confusion.
The root cause of the issue is that OpenGL ES 2 does not support the `textureCubeLod` function.
There are (optional) extensions to support this, but they don't appear to be exposed with the ES2 renderer (even though the hardware needed to support LOD features are certainly available.)
The existing shim in `drivers/gles2/shaders/cubemap_filter.glsl` just creates a macro:
```
#define textureCubeLod(img, coord, lod) textureCube(img, coord)
```
But the third parameter of `textureCube` is actually a mip bias, not an absolute mip level.
(And it doesn't seem to work regardless.)
In this specific case, the `cubemap_filter` should only sample from the first level of the "source" panorama cubemap.
In lieu of a method to force a lod level of zero, I've chosen to comment out the switchover from a 2D equirectangular panorama to the cubemap version of the same image, therefore always sampling roughness values from the 2D equirectangular panorama.
This may cause additional artifacts or issues across the seam, but at least it prevents the glaringly obvious black areas.
---
This same issue (no fragment texture LOD support) has rather large repercussions elsewhere too; it means materials with larger cubemap density (i.e. planar or distant objects) will be far rougher than expected.
Since GLES 3 appears to properly support fragment `texture*Lod` functions, switching to the GLES 3 backend would solve this problem.
---
Root cause discovered with help from @KaadmY.
Image::resize_to_po2() now takes an optional p_interpolation parameter
that it passes directly to resize() with default value INTERPOLATE_BILINEAR.
GLES2: call resize_to_po2() with interpolate argument
Call resize_to_po2() in GLES2 rasterizer storage with either
INTERPOLATE_BILINEAR or INTERPOLATE_NEAREST depending on TEXTURE_FLAG_FILTER.
This avoids filtering issues with non power of two pixel art textures.
See #44379
Although the minimum size of ninepatches is set to the sum of the margins in normal use (through gdscript etc) it turns out that it is possible to programmatically create ninepatches that are small than this minimum - in particular zero size is used in sliders to not draw items.
This PR deals with zero sized ninepatches by not drawing anything, and has some basic protection for ninepatches smaller than the margins. Whether these occur in the wild is not clear but is put in for completeness.
When using the ALSA driver, corruption would occur if `snd_pcm_writei`
was unable to consume the entire sound buffer. This would occur
frequently on the Raspberry Pi 3 which uses the `snd_bcm2835` audio
driver.
This bug resulted from incorrect pointer math on line 187, resulting in
the sample source pointer being advanced by `total * ad->channels` bytes
instead of `total * ad->channels` samples. In my opinion, the best fix
is to change `*src` to type `int16_t`, since that is the sample type in
use.
Fixes#43927.
(cherry picked from commit 25b2f82ccf)
See #43689.
Also 'fixed' some spelling for behavior in publicly visible strings.
(Sorry en_GB, en_CA, en_AU, and more... Silicon Valley won the tech spelling
war.)
(cherry picked from commit a655de89e3)
Lights with bake mode set to "All" were behaving erratically because of a
faulty check in the renderer. This should be the correct way to check if
a geometry instance is using baked light.
For fixing a previous issue state.canvas_texscreen_used was reset to false at the start of each render_joined_item. This was causing a later shader that used SCREEN_TEXTURE to force recapturing the back buffer immediately prior to use, which we don't want.
This PR preserves the state across joined items, and also prevents joining of items that copy the back buffer as this may be problematic.
It turns out that the original issue that needed the line is now fixed, and the later issue is also fixed by removing it.
While adding more debug checks to legacy renderer, I closed 2 types of vulnerabilities:
* TYPE_PRIMITIVE would previously read from uninitialized data if only specifying a single color
* Other legacy draw operations would fail in debug AFTER accessing out of bounds memory rather than before
Many calls to glBufferSubData are wrapped in a safe version which checks for out of bounds and exits the draw function if this is detected.
Large FVF allows batching of many custom shaders, but should not join items which have shaders that utilize BUILTINs which would change for each item, because these will not be sent individually, and all joined items would wrongly use the values from the first joined item.
Polys that have no texture assigned contain no UVs in the poly command. These were previously not blanked, leading to random values if read from a custom shader.
This PR just blanks them.
In some situations where polygons were scaled, existing software skinning was producing incorrect results.
The transform inverse needed to use an affine inverse rather than a cheaper inverse to account for this scaling.
This adds support for custom shaders for polys, and properly handles modulate in the case of large FVF and modulate FVF.
It also fixes poly vertex colors not being sent to OpenGL.
Antialiased polys work by drawing a smoothed line around the poly after the main drawing. Batching draws polys as a series of triangles with no concept of 'edge', and when 2 polys are joined it becomes impractical to back calculate the edges from the triangles.
For this reason batching is disabled for antialiased polys in this PR.
As a result of the GLES specifications being vague about best practice for how buffers should be used dynamically, different GPUs / platforms appear to have different preferences.
Mac in particular seems to have a number of problems in this area, and none of the rendering team uses Macs. So far we have relied on guesswork to choose the best usage, but in an attempt to pin this down, this PR begins to introduce manual selection of options for users to test their configurations.
Lines are batched using the simplest fvf 'BatchVertex', however when used in an item with a custom shader material, it may attempt to translate to large_fvf without the required extra channels. To prevent this a special case in flushing is made to deal with lines.
In small batches using hardware transform, vertices would be drawn in incorrect positions due to the item transform being applied twice - once in the transform uniform, and once from the transform passed as a vertex attribute.
This PR alters the shader to ignore uniform transforms when using large FVF.
Due to my less than eagle-like view over these functions I had assumed they were passing in a single buffer input for the changes to make buffer uploading more efficient. They aren't, which is less than ideal.
So these particular changes should be reverted. When I have some more time I'll see whether the API for these calls can be changed, because as is the multiple glSubBufferData calls could be causing stalls on some hardware.
It can be enabled in the Project Settings
(`rendering/quality/filters/use_debanding`). It's disabled
by default as it has a small performance impact and can make
PNG screenshots much larger (due to how dithering works).
As a result, it should be enabled only when banding is noticeable enough.
Since debanding requires a HDR viewport to work, it's only supported
in the GLES3 backend.
This is part of effort to make more efficient use of the API for devices with poor drivers. This eliminates multiple calls to glBufferSubData per update.
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.
Batching is extended to cover more primitives.
Don't apply lighting to objects when they have a lightmap texture and
the light is set to BAKE_ALL. This prevents applying the same direct
light twice on the same object and makes setting up scenes with mixed
lighting much easier.
Option in MeshInstance to enable software skinning, in order to test
against the current USE_SKELETON_SOFTWARE path which causes problems
with bad performance.
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
Fixes: #28683, #28621, #28596 and maybe others
For iOS we enable pvrtc feature by default for both GLES2/GLES3
Etc1 for iOS doesn't have any sense, so it disabled.
Fixed checks in export editor.
Fixed pvrtc ability detection in GLES2 driver.
Fixed pvrtc encoding procedure.
Add __NetBSD__ to `platform_config.h` so that it can find `alloca`
and use the proper `pthread_setname_np` format.
Rename RANDOM_MAX to avoid conflict with NetBSD stdlib.
Fixes#42145.
(cherry picked from commit 5f4d64f4f3)
- Fade reflection towards inner margin and clip it at screen edges instead of external margin.
- Round edges of the fade margin if both are being cut off to prevent sharp corners.
Previously VS::INSTANCE_NONE was returned for Lightmap data, this caused `visual_server_scene.cpp` to assert in `instance_set_use_lightmap()`
Now `dummy_rasterizer.h` returns `VS::INSTANCE_LIGHTMAP_CAPTURE` for lightmap capture data thus satisfying `visual_server_scene`
When not using TEXTURE_RECT path, flips have to sent via another method to the shader, to ensure that normal maps are correctly adjusted for direction. This PR adds an extra vertex attribute, LIGHT_ANGLE.
For nvidia workarounds, where the shader still has access to the final transform and extra matrix, the LIGHT_ANGLE can be 0 (no adjustment), 180 degrees for a horizontal flip, and negative indicates a vertical flip.
For batching path, the LIGHT_ANGLE can be used to directly specify the light angle for normal mapping, even when the final transform and extra matrix have been baked into vertex positions, so the same shader can be used for both.
Normal mapping previously took no account of rotation or flips in any path except the TEXTURE_RECT (uniform draw) method. This passed flips to the shader in uniforms.
In order to pass flips and rotations to the shader in batching and nvidia workaround, a per vertex attribute is required rather than a uniform. This introduces LIGHT_ANGLE which encodes both the rotation of a quad (vertex) and the horizontal and vertical flip.
In order to optionally store light angles in batching, we switch to using a 'unit' sized array which can be reused for different FVF types, as there is no need for a separate array for each FVF, as it is a waste of memory.
In rare circumstances an item would issue multiple transform commands before a (non rect) draw command. The command syncronization would incorrectly start from first transform, instead of the current transform in these circumstances, which could have the result of missing drawing some commands from the end of the batch.
This had been shown in the wild occuring in debug collision polys. It was a benign error (sometimes visual elements would be lost), but did not cause any serious problems.
This PR fixes this synchronization error.
Compiler is usually in the best position to decide whether to inline functions. Great care must be taken using FORCE_INLINE because it can have unforeseen consequences with recursion, loops and bloat to the executable.
Here some FORCE_INLINES are removed in order to allow the compiler to make best choice and remove a compilation warning where unable to inline during a recursive function.
Fixes#41226
On platforms that don't report support for GL_REPEAT for non power of two textures, the FORCE_REPEAT conditional is used instead. However for rect batches, the conditional was being set AFTER binding the shader, which meant it wasn't being activated.
This PR simply shifts setting the conditional to before the shader bind.
1. Removed errors in mesh_surface_get_array as it's supported now
2. More accurate errors in mesh_surface_get_blend_shapes
(cherry picked from commit e19a3df98f)
For textures that were imported as wrapping, the legacy renderer relied on GL repeat state being set as a once off during load, and didn't alter the GL wrapping state at runtime.
Batching was setting wrapping according to the CANVAS_RECT_TILE flag on rects, however this reset GL wrapping to clamp after use, which was conflicting with later drawcalls that relied on the default wrapping being preserved.
In this PR we only set the wrapping in GL if the texture has not been imported with wrapping. This duplicates the logic in the legacy renderer and solves the state bug.
Using the operator += in a shader is classified as an 'assign', and so is classified as a write rather than a read. This means that we need to prevent vertex baking on either a write or read (i.e. on usage), rather than just on reads.
The old logic was incorrect, the first item with lights would prevent joining the next item in case it didn't have lights. Now the check is deferred so that items without lights check to see if the previous item had lights, and if so they prevent a join.
Configured for a max line length of 120 characters.
psf/black is very opinionated and purposely doesn't leave much room for
configuration. The output is mostly OK so that should be fine for us,
but some things worth noting:
- Manually wrapped strings will be reflowed, so by using a line length
of 120 for the sake of preserving readability for our long command
calls, it also means that some manually wrapped strings are back on
the same line and should be manually merged again.
- Code generators using string concatenation extensively look awful,
since black puts each operand on a single line. We need to refactor
these generators to use more pythonic string formatting, for which
many options are available (`%`, `format` or f-strings).
- CI checks and a pre-commit hook will be added to ensure that future
buildsystem changes are well-formatted.
(cherry picked from commit cd4e46ee65)
Scaling tilemaps can cause border artifacts around the edges of tiles. This has been traced to precision issues in the GPU. This PR adds an adjustment to allow a minor contraction of the UVs of rects in order to compensate for the incorrect classification of texels across the UV border.
As it now seems like we will soon have GLES3 batching working using the same intermediate layer as GLES2, it makes more sense to reuse the same batching settings for both renderers rather than duplicate project settings for GLES2 and GLES3.
Builtins that should prevent baking colors and vertex positions were incorrectly only active in shaders that were not unshaded. This was a terminology misunderstanding - unshaded materials can still use shaders so should have the same test to prevent baking.
Each driver used to define the (same) project settings value, but the
setting names are not driver specific. Ovverriding is still possible via
platform tags.
(cherry picked from commit 90c7102b51)
The behaviour of TYPE_POLYLINE appeared incorrect in GLES2, and inconsistent with GLES3 and the docs, which state that draw_polyline 'Draws interconnected line segments'. Also when drawing with triangles GLES2 draws interconnected segments.
This PR simply changes the primitive from GL_LINES to GL_LINE_STRIP as in GLES3.
Writing to COLOR in a custom shader can result in incorrect results if colors are baked (vertex color and modulate). This PR prevents baking with COLOR output, except under the special circumstances that final modulate is (1, 1, 1, 1), in which case the result will be correct. This should still allow color baking in many scenarios with custom shaders.
In addition to prevent item joins when VERTEX reads are present in a custom shader, it is also necessary to prevent baking extra matrices (extra transforms) WITHIN items, because these can also report incorrect results.
In situations where custom canvas shaders read VERTEX values, they could read incorrect global positions rather than local positions where batching had baked item transforms. This PR prevents joining items that read VERTEX built in in shaders, and thus prevents this situation arising (as unjoined items will not bake transforms).
Affects per-pixel transparency
The current method renders to the screen by copying the GLES output to a
DIB for transparency using the CPU instead of rendering directly to the
window via the GPU. This is slower and also forces the window to be borderless
as WS_EX_LAYERED affects the non-client region as well.
This change uses DWMEnableBlurBehindWindow which allows using the standard
glClearColor() background alpha and is also performed through the GPU,
eliminating CPU bottlenecks
There was a bug in the initial logic for item reordering, whereby it would check for overlaps between the mover (item being moved back) and sandwiched items, but there was no check for overlaps between the movee (item moved forward) and the sandwich items. This extra check is now done.
Also a minor addition to the diagnose frame info (godot texture ID).
It seems that particles (and some other features) do not work correctly on iOS in GLES2 because either many of the devices do not support half float compression, or the GL constant used to reference it from Godot is incorrect.
This PR adds a project setting in rendering/gles2/ to disable half-float compression on iOS.
Adding the ability to access MODULATE in the shader breaks when final_modulate is baked into vertex colors (this is a technique used to batch together different colored items). This PR prevents baking vertex colors when MODULATE is detected in the shader.
It also prevents baking when COLOR is read in canvas shaders, which could currently produce the wrong result in the shader if colors were baked. It does not prevent baking if COLOR is only written, which happens in most shaders, and will operate correctly without baking.
Although 2D draws in painters order with strict ordering, in certain circumstances items can be reordered to increase batching / decrease state changes, without affecting the end result. This can be determined by an overlap test.
In situation with item:
A-B-A
providing the third item does not overlap the second, they can be reordered:
A-A-B
Items already contain an AABB which can be used for this overlap test.
1)
To utilise this, I have implemented item reordering (only for single rects for now), with the lookahead adjustable in project settings. This can increase performance in situations where items may not be grouped in the scene tree by texture. It can also be switched off (by setting lookahead to 0).
2)
This same trick can be used to help join items that are lit. Lit items previously would prevent joining completely, thus missing out on performance gains other than multi-command items such as tilemaps.
In this PR, lights are assigned as bits in a bitfield (up to 64, the optimization is disabled above this), and on each try_item (for joining), the bitfield for lights and shadows is constructed and compared with the previous items. If these match the 2 items can potentially be joined. However, this can only be done without changing the rendered result if an overlap test is successful.
This overlap test can be adjusted to join items up to a specific number of item references, selectable in project settings, or turned off.
3)
The legacy uniform single rect drawing routine seems to have been identified as the source of flicker, particularly on nvidia. However, it can also be up to 2x as fast. Because of the speed the batching contains a fallback where it can use the legacy single rect method, but I have now added a project setting to make this switchable. In most cases with batching it should not be necessary (as single rects are drawn less frequently) and thus the flickering can be totally avoided.
4)
This PR also fixes a color modulate bug when drawing light passes, in certain situations (particularly custom _draw routines with multiple rects).
5)
This PR also fixes#38291, a bug in the legacy renderer where light passes could draw rects in wrong position.
- Resurrect it for GL ES 2
- Apply roll over with `fmod()` instead of resetting it to 0
- Expose the setting from the `VisualServer`, since it does not belong in any specific rasterizer
This should make headless exporting work in projects using textures
in any format.
Error messages should no longer appear when running a project
that used image formats that were previously not present in the list.
(cherry picked from commit 3007c7e2a3)
When reading SCREEN_TEXTURE in a shader, this previously only worked succesfully for the first read of the screen, because state.canvas_texscreen_used was never getting reset. This PR resets state.canvas_texscreen_used at the beginning of each joined item, so that further screen reads can happen.
Joining items across z_indices can interfere with light culling for lights which only affect certain z ranges. This PR disables joining across z_indices when lights are present, except specifically for lights with both z_min set to the global minimum (-4096) and z_max set to the global maximum (4096).
In addition, the z_index is now stored on the joined_item for accurate light culling. The z_index is also displayed in frame diagnostics.
In rare circumstances default batches were being joined incorrectly, causing visual regressions. This logic has been fixed.
In addition slightly more output information has been added to frame diagnosis mode.
Batching across z_index layers was not preserving the batch_break flag, which determines whether to not join the previous item. This is fixed by storing the flag in RenderItemState and preserving it across canvas_render_items calls.
Added project setting to enable / disable print frame diagnostics every 10 seconds. This prints out a list of batches and info, which is useful to optimize games and identify performance problems.
This adds 2 new values (items and draw calls) to the performance monitor in a '2d' section, rather than reusing the 3d values in the 'raster' section.
This makes it far easier to optimize games to minimize drawcalls.
Defers sending 'transform' commands within a RasterizerCanvas::Item until they are needed for default batches. Instead locally caches the extra matrix and applies it using software transform, preventing unnecessary batch breaks.
The logic is relatively complex, and the whole 'extra matrix' of the legacy renderer in addition to the final_transform is not ideal. However this is required to accelerate some user drawing techniques, and later the lines in the IDE.
Extra functions canvas_render_items_begin and canvas_render_items_end are added to RasterizerCanvas, with noop stubs for non-GLES2 renderers. This enables batching to be spready over multiple z_indices, and multiple calls to canvas_render_items.
It does this by only performing item joining within canvas_render_items, and deferring rendering until canvas_render_items_end().
Determined that a large reason for the decrease in performance in unbatchable scenes was due to the new routine being analogous to the 'nvidia workaround' code, that is about half the speed. So this simply uses the old routine in the case of single unbatchable rects. Hopefully we will be able to remove the old path at a later stage.
Where the final_modulate color varies between render_items this can prevent batching. This PR solves this by baking final_modulate into the vertex colors, and setting the uniform 'final_modulate' to white, and allowing the joining of items that have different final_modulate values. The previous batching system can then cope with vertex color changes as normal.
2d rendering is currently bottlenecked by drawing primitives one at a time, limiting OpenGL efficiency. This PR batches primitives and renders in fewer drawcalls, resulting in significant performance improvements. This also speeds up text rendering.
This PR batches across canvas items as well as within items.
The code dynamically chooses between a vertex format with and without color, depending on the input data for a frame, in order to optimize throughput and maximize batch size. It also adds an option to use glScissor to reduce fillrate in light passes.
Namely, move the drive dropdown to just the left of the path text box and don't include the former
in the latter.
This improves the UX on Windows.
In the UNIX case, since its concept of drives is (ab)used to provide shortcuts to useful paths, its
dropdown is kept at the original location.