I'm trying to create a Newtonian particle simulation using compute shaders. To visualize it, I calculate every particle's position and project it onto a RenderTexture. There are several examples online that manage to simulate millions of particles at the same time, but I can only get to a few hundred thousand before performance drops sharply. To test the limits, I wrote a separate, simple (empty) compute shader program. As expected, that one performed better, but it still gets nowhere near millions of invocations per frame: my FPS dips to around 100 when 'simulating' 2 million elements, and the shader doesn't even do anything yet. My GPU (RTX 2060 Super) is also better than the ones used in those online examples, which handle far more particles.
This is my test compute shader:
#pragma kernel Spam
RWStructuredBuffer<int> test;
[numthreads(64, 1, 1)]
void Spam (uint3 id : SV_DispatchThreadID)
{
//test[id.x] = id.x;
}
And the program that runs it:
using UnityEngine;

public class ShaderSpamTest : MonoBehaviour
{
    public ComputeShader spamShader;
    public int spamAmount;

    int[] spamArray;
    ComputeBuffer spamBuffer;
    uint threadGroupSize;
    int threadGroups;

    private void Start()
    {
        spamArray = new int[spamAmount];
        spamBuffer = new ComputeBuffer(spamAmount, sizeof(int));

        // Read the x size from [numthreads] and round up so every element is covered.
        spamShader.GetKernelThreadGroupSizes(0, out threadGroupSize, out _, out _);
        threadGroups = (int)((spamAmount + (threadGroupSize - 1)) / threadGroupSize);
    }

    void FixedUpdate()
    {
        // Upload the array, dispatch the kernel, and read the result back every physics step.
        spamBuffer.SetData(spamArray);
        spamShader.SetBuffer(0, "test", spamBuffer);
        spamShader.Dispatch(0, threadGroups, 1, 1);
        spamBuffer.GetData(spamArray);
    }

    private void OnDestroy()
    {
        spamBuffer.Dispose();
    }
}

I have a strong feeling it has something to do with my kernel and thread group dimensions. My original Newton shader only needs to loop through a 1-dimensional array, so I assumed the y and z values could be set to 1. I have tried many combinations, but none of them changed the performance even slightly. I feel like I am missing something. This is the first time I am writing compute shaders, so my knowledge is still very limited.
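For reference, here is the dispatch arithmetic from Start() worked through for the 2 million case, as a rough standalone sketch. It assumes spamAmount = 2,000,000 and the group size of 64 from [numthreads(64, 1, 1)]. DispatchMath and ThreadGroups are names made up for this illustration, and it runs as a plain console program without Unity.

```csharp
using System;

public static class DispatchMath
{
    // Same ceiling division as in Start(): number of groups needed to cover `count` elements.
    public static int ThreadGroups(int count, uint groupSize)
    {
        return (int)((count + (groupSize - 1)) / groupSize);
    }

    public static void Main()
    {
        int spamAmount = 2000000;    // assumed: the "2 million" case described above
        uint threadGroupSize = 64;   // from [numthreads(64, 1, 1)]

        int groups = ThreadGroups(spamAmount, threadGroupSize);
        long threads = (long)groups * threadGroupSize;      // total threads launched
        long bufferBytes = (long)spamAmount * sizeof(int);  // size of spamBuffer

        Console.WriteLine($"thread groups: {groups}");      // 31250
        Console.WriteLine($"threads:       {threads}");     // 2000000
        Console.WriteLine($"buffer bytes:  {bufferBytes}"); // 8000000
    }
}
```

So with these values the dispatch is 31,250 groups of 64 threads, which covers the 2 million elements exactly, and the buffer behind it is 8,000,000 bytes.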