You’re right, I lapsed. The BOM helps to designate byte order — duh...
Still, how does one know whether the stream is ANSI or UTF-8 before things get weird? At least with UTF-16+, the ANSI stream dies after the first character.
The server informs the browser how the page is encoded via the Content-Type header, e.g.,
Cache-Control: private
Connection: close
Content-Type: text/html; charset=utf-8
Date: Tue, 19 Jan 2016 16:45:16 GMT
Server: nginx/1.2.4
Transfer-Encoding: chunked
There are other possible values for the charset, but UTF-8 is taking over, because it just works.
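For what it's worth, here is a rough sketch of how a client might pull the charset out of a Content-Type value like the one above. It leans on Python's standard email.message.Message to do the parameter parsing; the function name charset_from_content_type and the None fallback are my own choices for illustration, not anything a browser or server is known to use.

```python
from email.message import Message


def charset_from_content_type(value, default=None):
    """Return the charset parameter of a Content-Type value, lowercased.

    Hypothetical helper for illustration: it reuses the stdlib MIME header
    parser instead of splitting the string by hand.
    """
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns `default` when no charset parameter exists.
    return msg.get_content_charset(default)


if __name__ == "__main__":
    print(charset_from_content_type("text/html; charset=utf-8"))
    print(charset_from_content_type("text/html"))
```

When the header carries no charset, this sketch simply returns the default; a real browser has more fallbacks than that (meta tags, BOM sniffing, and so on).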
It’s not hard: if the high bit of a byte is set (0x80 or above), that byte is part of a multibyte character.
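To make the high-bit rule concrete, here is a small sketch that classifies a byte by its leading bits, plus a strict-decode check for guessing whether a stream is plausibly UTF-8. The names classify_byte and looks_like_utf8 are made up for this example, and the classification goes by bit pattern only, so it does not catch things like overlong encodings; the strict decode does.

```python
def classify_byte(byte):
    """Classify one byte (0-255) by its UTF-8 leading-bit pattern."""
    if byte & 0x80 == 0x00:
        return "ascii"          # 0xxxxxxx: single-byte character
    if byte & 0xC0 == 0x80:
        return "continuation"   # 10xxxxxx: second or later byte of a sequence
    if byte & 0xE0 == 0xC0:
        return "lead-2"         # 110xxxxx: starts a 2-byte sequence
    if byte & 0xF0 == 0xE0:
        return "lead-3"         # 1110xxxx: starts a 3-byte sequence
    if byte & 0xF8 == 0xF0:
        return "lead-4"         # 11110xxx: starts a 4-byte sequence
    return "invalid"            # 11111xxx never appears in UTF-8


def looks_like_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print([classify_byte(b) for b in "é".encode("utf-8")])
    print(looks_like_utf8("café".encode("cp1252")))
```

Text saved in a single-byte ANSI code page with accented characters usually fails the strict decode, because a lone high-bit byte is not followed by the continuation bytes UTF-8 requires. Pure ASCII passes either way, since the two encodings agree on it.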