A useful way to think about FSM's, as opposed to more powerful parsing devices, is that an FSM knows instantly when it has succeeded or failed, without the aid of a memory store.
Here's my favorite chart on the subject:
Language Class -- Chomsky level -- recognition automata recursively enumerable -- 0 -- Turing Machine context-sensitive -- 1 -- linearly bound TM context-free -- 2 -- push down automata regular -- 3 -- Finite State MachineTokenizers are generally FSM's/regular grammars, commercial compilers are generally PDA's/CFG's, and semantic processing is, well, beyond my ken, formally speaking.
BTW, have you ever written a compiler or interpreter?