PR #90993 added several debugging utilities.
Among them, advanced memory tracking through the use of custom
allocators and VK_EXT_device_memory_report.
However as issue #95967 reveals, it is dangerous to leave it on by
default because drivers (or even the Vulkan loader) can too easily
accidentally break custom allocators by allocating memory through std
malloc but then request us to deallocate it (or viceversa).
This PR fixes the following problems:
- Adds --extra-gpu-memory-tracking cmd line argument
- Adds missing enum entries to
RenderingContextDriverVulkan::VkTrackedObjectType
- Adds RenderingDevice::get_driver_and_device_memory_report
- GDScript users can easily check via print(
RenderingServer.get_rendering_device().get_driver_and_device_memory_report()
)
- Uses get_driver_and_device_memory_report on device lost for appending
further info.
Fixes#95967
Features:
- Debug-only tracking of objects by type. See
get_driver_allocs_by_object_type et al.
- Debug-only Breadcrumb info for debugging GPU crashes and device lost
- Performance report per frame from get_perf_report
- Some VMA calls had to be modified in order to insert the necessary
memory callbacks
Functionality marked as "debug-only" is only available in debug or dev
builds.
Misc fixes:
- Early break optimization in RenderingDevice::uniform_set_create
============================
The work was performed by collaboration of TheForge and Google. I am
merely splitting it up into smaller PRs and cleaning it up.
Before this change, a skeleton that was not updated every frame would
result in a difference of 2+ between last_change and frame index every
frame, which would disable the buffer rotation and set motion vectors to
zero. This results in significant visual artifacts for FSR2 that are
especially prominent on the characters that move together with the view
such as the main character in third person mode.
This is a significant problem for high refresh rate displays: at 120 Hz,
we are effectively guaranteed to skip skeleton updates every other frame
with skeleton update happening during physics processing, and the lack
of physics interpolation for skeletons. This happens by default in TPS
demo when FSR2 is enabled.
In other places where motion vectors are disabled, such as multi-mesh
and mesh rendering (where previous transform is updated), the logic
effectively allows for a single-frame gap in updates, because it
compares the frame where the update happened (which is the current frame
if updates are consistent) with the current frame, so the latency of 0
means "update just happened", but both multi-mesh and mesh transform
updates permit a latency of 1 as well.
Here, however, last_change is updated *after* the frame processing has
concluded, so a zero-latency update has a distance of 1. Allowing a
distance of 2 (latency 1) reduces the severity of the problem and aligns
the logic with transform updates.
Note that the problem will still happen when refresh rate is noticeably
higher than physics rate times 2. For example, it still happens at 240
Hz. However, a longer latency allowance is inconsistent with other
transforms and could lead to issues, so ideally long term physical
interpolation of skeleton transforms would completely solve this.
Before this change, using FSR2 resulted in the following error when the
effect was destroyed:
ERROR: Attempted to free invalid ID: 662734928609453
at: _free_internal (servers/rendering/rendering_device.cpp:4957)
This happened because ACCUMULATE and ACCUMULATE_SHARPEN passes shared
the same shader_version object but had different pipeline IDs. When
version_free was called for ACCUMULATE pass, it destroyed pipelines
created from that version, including the pipeline for the
ACCUMULATE_SHARPEN pass.
Using a unique version could work around this problem, but it's easier
to rely on version_free destroying the created pipelines through the
dependency mechanism.