Lies, damned lies and statistics...
If you run 20 tests at the 5% significance level (95% confidence), you would expect about one of them to reject the null hypothesis even when the null hypothesis is true in every case.
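The arithmetic behind the 20-test example can be sketched in a few lines of Python. This is only a rough illustration: it assumes the tests are independent and that the null hypothesis really is true every time, and the function names are made up for the example.

```python
# Assumptions (not from the original comment): tests are independent,
# and the null hypothesis is true in every one of them.

def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of tests that wrongly reject a true null."""
    return alpha * n_tests

def prob_at_least_one_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of the tests wrongly rejects a true null."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05   # 5% significance level, i.e. 95% confidence
n_tests = 20

print(f"expected false positives: {expected_false_positives(alpha, n_tests):.2f}")
print(f"chance of at least one:   {prob_at_least_one_false_positive(alpha, n_tests):.3f}")
```

Under those assumptions the expected count is 1, and the chance of seeing at least one spurious "significant" result is roughly 64%.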
That is why a single test on its own tells you little, and can be manipulated. When different experimenters repeat the test and find the same result, then you can have confidence.
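To put a rough number on why replication helps: if each experiment is independent and uses the same 5% level (both assumptions made for this sketch, with a hypothetical function name), the chance that every one of them is a false positive shrinks very quickly.

```python
# Assumptions (not from the original comment): each replication is
# independent, uses the same significance level, and the null is true.

def prob_all_false_positives(alpha: float, n_replications: int) -> float:
    """Chance that every replication wrongly rejects a true null."""
    return alpha ** n_replications

alpha = 0.05
for k in (1, 2, 3):
    print(f"{k} experiment(s) all wrong: {prob_all_false_positives(alpha, k):.6f}")
```

One result has a 1 in 20 chance of being a fluke; two agreeing results, 1 in 400; three, 1 in 8000. Real experiments are rarely fully independent, so treat these as a best case.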
Nothing is settled, ever. Even if measurements of Pi started coming out differently (due to a change in the nature of the cosmos), the new value would be studied and the change reported.
Science doesn’t tell you what is true. It is a good way of finding out what is not true.