“I think they mostly use the internet. The browser shows your OS to websites via User Agent.”
That would let the cat out of the bag. But how would these websites accurately count individual unique devices, as opposed to just the general OS type? OS, yes, but the number of unique devices for sure? They are not supposed to have that intrusive a capability, right?
But even with web server stats it could not be very accurate. Not everyone goes to the same sites, and not all sites contribute that data or participate in the survey. If they are getting numbers from Google, they are not going to be accurate for Linux either. Most Linux users are intelligent enough to avoid Google and use script blockers that block Google services on sites.
The OP link: https://gs.statcounter.com/os-market-share/
https://statcounter.com/how-it-works/
https://gs.statcounter.com/factsheet
This SaaS (https://statcounter.com/) relies strictly on websites/browsers for all of its stats. The service works by website owners placing a bit of tracking code on their pages; StatCounter then collects the data from all of its customers' sites and combines it to put out overall stats like the OS usage figures in the OP.
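To make that concrete, here is a rough Python sketch of how OS share could be tallied from the User-Agent strings that browsers send. This is not StatCounter's actual code: the substring rules, the function names guess_os and os_share, and the idea of counting one entry per hit are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical substring rules, for illustration only. Real trackers parse
# User-Agent strings in far more detail. Order matters: Android UAs also
# contain "Linux", and iPhone UAs contain "like Mac OS X".
OS_RULES = [
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("iPad", "iOS"),
    ("Windows NT", "Windows"),
    ("Mac OS X", "macOS"),
    ("CrOS", "Chrome OS"),
    ("Linux", "Linux"),
]


def guess_os(user_agent: str) -> str:
    """Return a coarse OS name for a User-Agent string, or "Unknown"."""
    for needle, name in OS_RULES:
        if needle in user_agent:
            return name
    return "Unknown"


def os_share(user_agents):
    """Return {os_name: percent of hits} for a list of User-Agent strings."""
    counts = Counter(guess_os(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Example hits, as if gathered from many customer sites.
hits = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
]
print(os_share(hits))
```

Note that anything that blocks the tracking script, or a browser that sends a spoofed User-Agent, never shows up correctly in a tally like this, which is the accuracy concern raised above.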
As far as IDing individuals goes, they use cookies. If you clear your cookies and visit again, you'll get counted twice. Think of it like polling. They're using stats from a small percentage of web users. Is it 100% accurate? No, just like polling isn't.
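The double counting is easy to see in a small Python sketch of cookie-based visitor counting. The class name VisitorCounter and its methods are hypothetical, and this is only a simplified model of the general technique, not how any particular service implements it.

```python
import uuid


class VisitorCounter:
    """Toy model: one random ID per browser, stored in a cookie."""

    def __init__(self):
        self.seen_ids = set()

    def handle_visit(self, cookie_id=None):
        """Record a visit and return the ID the browser should keep as a cookie.

        A browser with no cookie (first visit, or cookies cleared) is
        indistinguishable from a new visitor, so it gets a fresh ID.
        """
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
        self.seen_ids.add(cookie_id)
        return cookie_id

    @property
    def unique_visitors(self):
        return len(self.seen_ids)


counter = VisitorCounter()
cookie = counter.handle_visit()      # first visit: a new ID is issued
counter.handle_visit(cookie)         # return visit with the cookie intact
after_return = counter.unique_visitors
counter.handle_visit()               # same person, cookies cleared
after_clear = counter.unique_visitors
print(after_return, after_clear)
```

The same person shows up as one visitor while the cookie survives and as two once it is cleared, so the "unique" count is really a count of cookies, not of devices or people.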