Implemented splitting of vertex positions and attributes in the vertex
buffer
Positions are sequential at the start of the buffer, followed by the
additional attributes which are interleaved
Made a project setting which enables/disabled the buffer formatting
throughout the project
Implemented in both GLES2 and GLES3
This improves performance particularly on tile-based GPUs as well as
cache performance for something like shadow mapping which only needs
position data
Updated Docs and Project Setting
This is an older, easier to implement variant of CAS as a pure
fragment shader. It doesn't support upscaling, but we won't make
use of it (at least for now).
The sharpening intensity can be adjusted on a per-Viewport basis.
For the root viewport, it can be adjusted in the Project Settings.
Since `textureLodOffset()` isn't available in GLES2, there is no
way to support contrast-adaptive sharpening in GLES2.
All my earlier test cases for software skinning had the polys parent transform to be identity. This works fine until you had cases where the user had moved the transform of the parent nodes of skinned polys.
This PR fixes this situation by taking into account the final (concatenated) transform of the polys RELATIVE to the skeleton base transform. It does this by applying the inverse skeleton base transform to the poly final transform.
We've been using standard C library functions `memcpy`/`memset` for these since
2016 with 67f65f6639.
There was still the possibility for third-party platform ports to override the
definitions with a custom header, but this doesn't seem useful anymore.
Backport of #48239.
When users create an invalid shader, the shader->valid flag is set to false. Batching previously assumes that shaders are valid, and this can result in primitives with invalid shader being joined, causing visual errors.
This PR prevents joining items that have invalid shaders.
Allows users to override default API usage, in order to get best performance on different platforms.
Also changes the default legacy flags to use STREAM rather than DYNAMIC.
In rare cases default batches could occur which were containing commands that were not owned by the first item referenced by the joined item. This had assumed to be the case, and would read the wrong command, or crash.
Instead for safety in this PR we now store a pointer to the parent item in default batches, and use this to determine the correct command list instead of assuming.
Trying to use the old `hardware_transform` flag to combine the new large_fvf has lead to several bugs. So here the logic is broken out into 2 separate components, single item and large_fvf.
The old `hardware_transform` name also no longer makes sense, as there are now 3 transform paths:
Software (CPU)
Hardware (uniform)
Hardware (attribute)
- Fix objects with no material being considered as fully transparent by the lightmapper.
- Added "environment_min_light" property: gives artistic control over the shadow color.
- Fixed "Custom Color" environment mode, it was ignored before.
- Added "interior" property to BakedLightmapData: controls whether dynamic capture objects receive environment light or not.
- Automatically update dynamic capture objects when the capture data changes (also works for "energy" which used to require object movement to trigger the update).
- Added "use_in_baked_light" property to GridMap: controls whether the GridMap will be included in BakedLightmap bakes.
- Set "flush zero" and "denormal zero" mode for SSE2 instructions in the Embree raycaster. According to Embree docs it should give a performance improvement.
As part of the improvements to batch more cases, batching can store final_modulate as an attribute in the vertex format rather than sending as a uniform. This allows draw calls with different final_modulate to be batched together.
However custom shader code was reading from only the final_modulate uniform, and not the attribute when it was in use. This was leading to visual errors.
This is tricky to solve, because we cannot use the same name for the attribute in the vertex and fragment shaders, because one is an attribute and one a varying, whereas a uniform is accessible anywhere. To get around this, a macro is used which can translate to the most appropriate variable depending on whether uniform or attribute or varying is required.
GLES3 changes:
This commit makes it possible to disable 3D directional lights by using
the light's cull mask. It also automatically disables directionals when
the object has baked lighting and the light is set to "bake all".
GLES2 changes:
Added a check for the light cull mask, since it was previously ignored.
One of the new fvf types (FVF_MODULATED) allows batching custom shaders that use modulate. The only slight oversight is that there is a special define when MODULATE is used in a custom shader, called MODULATE_USED, that is checked, and if set it does NOT apply final_modulate as part of canvas.glsl.
This MODULATE_USED define wasn't checked when the new FVF was used and modulate was passed in an attribute.
This PR moves the application of the final_modulate into the #ifndef MODULATE_USED section.
The rendering/quality/2d section of project settings is becoming considerably expanded in 3.2.4, and arguably was not the correct place for settings that were not really to do with quality.
3.2.4 is the last sensible opportunity we will have to move these settings, as the only existing one likely to break compatibility in a small way is `pixel_snap`, and given that the whole snapping area is being overhauled we can draw attention to the fact it has changed in the release notes.
Class reference is also updated and slightly improved.
`pixel_snap` is renamed to `gpu_pixel_snap` in the project settings and code to help differentiate from CPU side transform snapping.
Due to multi pass approach to lighting in GLES2, in some situations the rendered result can look different if lights are presented in a different order.
The order (aside from directional lights) seems to be simply copied from the culling routine (octree or bvh) which is essentially arbitrary. While octree is usually consistent with order, bvh uses a trickle optimize which may result in lights occurring in different order from frame to frame.
This PR adds an extra layer of sorting on GLES2 lights in order to get some kind of order consistency.
This suppresses the blocky shadow appearance, bringing the shadow rendering
much closer to GLES3. This soft filter is more demanding as it requires
more lookups, but it makes PCF13 shadows more usable.
The soft PCF filter was adapted from three.js.
Valgrind reported two instances of reading uninitialized memory in the batching. They are both pretty benign (as evidenced by no bug reports) but wise to close these.
The first is that when changing batch from a default batch it reads the batch color which is not set (as it is not relevant for default batches). The segment of code is not necessary when it has already deemed a batch change necessary (which will occur from a default batch). In addition this means that the count of color changes will be more accurate, rather than having a possible random value in.
The second is that on initialization _set_texture_rect_mode is called before the state has been properly initialized (it is initialized at the beginning of each canvas_begin, but this occurs outside of that).
Move definition of rendering/quality/filters/anisotropic_filter_level to
servers/visual_server.cpp, since both GLES2 and GLES3 now use it
rasterizer_storage_gles3.cpp: Remove a spurious variable write (the
value gets overwritten soon after)
- Fix Embree runtime when using MinGW (patch by @RandomShaper).
- Fix baking of lightmaps on GridMaps.
- Fix some GLSL errors.
- Fix overflow in the number of shader variants (GLES2).
Completely re-write the lightmap generation code:
- Follow the general lightmapper code structure from 4.0.
- Use proper path tracing to compute the global illumination.
- Use atlassing to merge all lightmaps into a single texture (done by @RandomShaper)
- Use OpenImageDenoiser to improve the generated lightmaps.
- Take into account alpha transparency in material textures.
- Allow baking environment lighting.
- Add bicubic lightmap filtering.
There is some minor compatibility breakage in some properties and methods
in BakedLightmap, but lightmaps generated in previous engine versions
should work fine out of the box.
The scene importer has been changed to generate `.unwrap_cache` files
next to the imported scene files. These files *SHOULD* be added to any
version control system as they guarantee there won't be differences when
re-importing the scene from other OSes or engine versions.
This work started as a Google Summer of Code project; Was later funded by IMVU for a good amount of progress;
Was then finished and polished by me on my free time.
Co-authored-by: Pedro J. Estébanez <pedrojrulez@gmail.com>
Happy new year to the wonderful Godot community!
2020 has been a tough year for most of us personally, but a good year for
Godot development nonetheless with a huge amount of work done towards Godot
4.0 and great improvements backported to the long-lived 3.2 branch.
We've had close to 400 contributors to engine code this year, authoring near
7,000 commit! (And that's only for the `master` branch and for the engine code,
there's a lot more when counting docs, demos and other first-party repos.)
Here's to a great year 2021 for all Godot users 🎆
(cherry picked from commit b5334d14f7)
These were only put in for the betas, in order to test hypotheses for stalling on Macs. It seems that most of the problems in the Mac editor have been solved by fixing the excessive redraw_requests.
As a result no one has reported any results from these options, but in future we will be able to refer users to try the beta versions, so there is no need to include them in the stable release. Indeed they are only likely to cause confusion.
The root cause of the issue is that OpenGL ES 2 does not support the `textureCubeLod` function.
There are (optional) extensions to support this, but they don't appear to be exposed with the ES2 renderer (even though the hardware needed to support LOD features are certainly available.)
The existing shim in `drivers/gles2/shaders/cubemap_filter.glsl` just creates a macro:
```
#define textureCubeLod(img, coord, lod) textureCube(img, coord)
```
But the third parameter of `textureCube` is actually a mip bias, not an absolute mip level.
(And it doesn't seem to work regardless.)
In this specific case, the `cubemap_filter` should only sample from the first level of the "source" panorama cubemap.
In lieu of a method to force a lod level of zero, I've chosen to comment out the switchover from a 2D equirectangular panorama to the cubemap version of the same image, therefore always sampling roughness values from the 2D equirectangular panorama.
This may cause additional artifacts or issues across the seam, but at least it prevents the glaringly obvious black areas.
---
This same issue (no fragment texture LOD support) has rather large repercussions elsewhere too; it means materials with larger cubemap density (i.e. planar or distant objects) will be far rougher than expected.
Since GLES 3 appears to properly support fragment `texture*Lod` functions, switching to the GLES 3 backend would solve this problem.
---
Root cause discovered with help from @KaadmY.
Image::resize_to_po2() now takes an optional p_interpolation parameter
that it passes directly to resize() with default value INTERPOLATE_BILINEAR.
GLES2: call resize_to_po2() with interpolate argument
Call resize_to_po2() in GLES2 rasterizer storage with either
INTERPOLATE_BILINEAR or INTERPOLATE_NEAREST depending on TEXTURE_FLAG_FILTER.
This avoids filtering issues with non power of two pixel art textures.
See #44379
Lights with bake mode set to "All" were behaving erratically because of a
faulty check in the renderer. This should be the correct way to check if
a geometry instance is using baked light.
For fixing a previous issue state.canvas_texscreen_used was reset to false at the start of each render_joined_item. This was causing a later shader that used SCREEN_TEXTURE to force recapturing the back buffer immediately prior to use, which we don't want.
This PR preserves the state across joined items, and also prevents joining of items that copy the back buffer as this may be problematic.
It turns out that the original issue that needed the line is now fixed, and the later issue is also fixed by removing it.
While adding more debug checks to legacy renderer, I closed 2 types of vulnerabilities:
* TYPE_PRIMITIVE would previously read from uninitialized data if only specifying a single color
* Other legacy draw operations would fail in debug AFTER accessing out of bounds memory rather than before
Many calls to glBufferSubData are wrapped in a safe version which checks for out of bounds and exits the draw function if this is detected.
Large FVF allows batching of many custom shaders, but should not join items which have shaders that utilize BUILTINs which would change for each item, because these will not be sent individually, and all joined items would wrongly use the values from the first joined item.
This adds support for custom shaders for polys, and properly handles modulate in the case of large FVF and modulate FVF.
It also fixes poly vertex colors not being sent to OpenGL.
As a result of the GLES specifications being vague about best practice for how buffers should be used dynamically, different GPUs / platforms appear to have different preferences.
Mac in particular seems to have a number of problems in this area, and none of the rendering team uses Macs. So far we have relied on guesswork to choose the best usage, but in an attempt to pin this down, this PR begins to introduce manual selection of options for users to test their configurations.
In small batches using hardware transform, vertices would be drawn in incorrect positions due to the item transform being applied twice - once in the transform uniform, and once from the transform passed as a vertex attribute.
This PR alters the shader to ignore uniform transforms when using large FVF.
Due to my less than eagle-like view over these functions I had assumed they were passing in a single buffer input for the changes to make buffer uploading more efficient. They aren't, which is less than ideal.
So these particular changes should be reverted. When I have some more time I'll see whether the API for these calls can be changed, because as is the multiple glSubBufferData calls could be causing stalls on some hardware.
It can be enabled in the Project Settings
(`rendering/quality/filters/use_debanding`). It's disabled
by default as it has a small performance impact and can make
PNG screenshots much larger (due to how dithering works).
As a result, it should be enabled only when banding is noticeable enough.
Since debanding requires a HDR viewport to work, it's only supported
in the GLES3 backend.
This is part of effort to make more efficient use of the API for devices with poor drivers. This eliminates multiple calls to glBufferSubData per update.
Batching is mostly separated into a common template which can be used with multiple backends (GLES2 and GLES3 here). Only necessary specifics are in the backend files.
Batching is extended to cover more primitives.
Don't apply lighting to objects when they have a lightmap texture and
the light is set to BAKE_ALL. This prevents applying the same direct
light twice on the same object and makes setting up scenes with mixed
lighting much easier.
Option in MeshInstance to enable software skinning, in order to test
against the current USE_SKELETON_SOFTWARE path which causes problems
with bad performance.
Co-authored-by: lawnjelly <lawnjelly@gmail.com>
Fixes: #28683, #28621, #28596 and maybe others
For iOS we enable pvrtc feature by default for both GLES2/GLES3
Etc1 for iOS doesn't have any sense, so it disabled.
Fixed checks in export editor.
Fixed pvrtc ability detection in GLES2 driver.
Fixed pvrtc encoding procedure.
Normal mapping previously took no account of rotation or flips in any path except the TEXTURE_RECT (uniform draw) method. This passed flips to the shader in uniforms.
In order to pass flips and rotations to the shader in batching and nvidia workaround, a per vertex attribute is required rather than a uniform. This introduces LIGHT_ANGLE which encodes both the rotation of a quad (vertex) and the horizontal and vertical flip.
In order to optionally store light angles in batching, we switch to using a 'unit' sized array which can be reused for different FVF types, as there is no need for a separate array for each FVF, as it is a waste of memory.
In rare circumstances an item would issue multiple transform commands before a (non rect) draw command. The command syncronization would incorrectly start from first transform, instead of the current transform in these circumstances, which could have the result of missing drawing some commands from the end of the batch.
This had been shown in the wild occuring in debug collision polys. It was a benign error (sometimes visual elements would be lost), but did not cause any serious problems.
This PR fixes this synchronization error.
Compiler is usually in the best position to decide whether to inline functions. Great care must be taken using FORCE_INLINE because it can have unforeseen consequences with recursion, loops and bloat to the executable.
Here some FORCE_INLINES are removed in order to allow the compiler to make best choice and remove a compilation warning where unable to inline during a recursive function.
Fixes#41226