Some compilers attempt to automatically generate parallel code from common languages like "C", for example by turning simple loops into vector instructions.
On Macs, we can write SIMD vector processing code for the AltiVec units in G4 and G5 processors using "C"-type languages. We can also spread work across multiple CPUs, so coarse-grained and fine-grained parallelism are available together.
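As a rough illustration of using both levels together, here is a minimal sketch in portable C with POSIX threads rather than AltiVec intrinsics. The names `parallel_saxpy`, `saxpy_slice`, `slice_t` and `MAX_THREADS` are made up for this example. Threads provide the coarse-grained split, and the inner loop is kept simple enough that a vectorizing compiler may turn it into SIMD instructions, although whether it actually does depends on the compiler and its flags.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16  /* arbitrary limit chosen for this sketch */

/* Work assigned to one thread: a contiguous slice of the arrays. */
typedef struct {
    float a;
    const float *x;
    float *y;
    size_t begin, end;  /* half-open range [begin, end) */
} slice_t;

/* Fine-grained part: no dependency between iterations and no aliasing
 * (restrict), so a vectorizing compiler is free to use SIMD here. */
static void saxpy_slice(float a, const float *restrict x,
                        float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

static void *worker(void *arg)
{
    slice_t *s = arg;
    saxpy_slice(s->a, s->x + s->begin, s->y + s->begin, s->end - s->begin);
    return NULL;
}

/* Coarse-grained part: split the arrays into one slice per thread.
 * Returns 0 on success, -1 if a thread could not be created
 * (in which case the remaining slices are left unprocessed). */
int parallel_saxpy(float a, const float *x, float *y, size_t n, int n_threads)
{
    assert(n_threads > 0 && n_threads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    slice_t slices[MAX_THREADS];
    size_t chunk = (n + (size_t)n_threads - 1) / (size_t)n_threads;
    int started = 0;
    int status = 0;

    for (int t = 0; t < n_threads; t++) {
        size_t begin = (size_t)t * chunk;
        if (begin > n)
            begin = n;
        size_t end = begin + chunk;
        if (end > n)
            end = n;
        slices[t] = (slice_t){ a, x, y, begin, end };
        if (pthread_create(&tid[t], NULL, worker, &slices[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Wait for every thread that was actually started. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    return status;
}
```

Calling `parallel_saxpy(2.0f, x, y, n, 4)` would update `y` in place using four threads. The slices do not overlap, so the threads need no locking.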
A few specialized programming languages exist to make better use of parallelism. Some time ago, I learned a language called "Occam" for a processor called the "Transputer". The language itself was simple, but designing code that avoids deadlocks, livelocks, race conditions, etc. took a lot of effort. A computer could contain multiple Transputer chips, and Occam processes could be distributed across them, with the language's channels carrying the communication between chips. If I recall correctly, each Transputer had four built-in serial communication links, so grid topologies were easy to design.
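To give a feel for why deadlock avoidance takes care with this style of communication, here is a minimal sketch in C with POSIX threads. It is not Occam and not Transputer code. The names `chan_t`, `chan_send`, `chan_recv`, `proc_t` and `exchange` are invented for this example, and the channel assumes exactly one sender and one receiver. A send blocks until the other side has received, so if two processes both sent first, each would wait forever for the other. Here one side sends first and the other receives first, which lets the exchange complete.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* A synchronous (rendezvous) channel for one sender and one receiver. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int value;
    int full;  /* 1 while a value is waiting to be taken */
} chan_t;

static void chan_init(chan_t *c)
{
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->value = 0;
    c->full = 0;
}

static void chan_destroy(chan_t *c)
{
    pthread_cond_destroy(&c->cv);
    pthread_mutex_destroy(&c->mu);
}

/* Blocks until the receiver has actually taken the value. */
static void chan_send(chan_t *c, int v)
{
    pthread_mutex_lock(&c->mu);
    c->value = v;
    c->full = 1;
    pthread_cond_broadcast(&c->cv);
    while (c->full)
        pthread_cond_wait(&c->cv, &c->mu);
    pthread_mutex_unlock(&c->mu);
}

/* Blocks until the sender has supplied a value. */
static int chan_recv(chan_t *c)
{
    pthread_mutex_lock(&c->mu);
    while (!c->full)
        pthread_cond_wait(&c->cv, &c->mu);
    int v = c->value;
    c->full = 0;
    pthread_cond_broadcast(&c->cv);
    pthread_mutex_unlock(&c->mu);
    return v;
}

/* One communicating process: it either sends then receives,
 * or receives then sends. */
typedef struct {
    chan_t *out, *in;
    int send_first;
    int to_send;
    int received;
} proc_t;

static void *proc_main(void *arg)
{
    proc_t *p = arg;
    if (p->send_first) {
        chan_send(p->out, p->to_send);
        p->received = chan_recv(p->in);
    } else {
        p->received = chan_recv(p->in);
        chan_send(p->out, p->to_send);
    }
    return NULL;
}

/* Two processes swap values over a pair of channels.
 * A sends first and B receives first; if both sent first,
 * both would block in chan_send and never finish.
 * Returns 0 on success, -1 if the thread could not be created. */
int exchange(int a_value, int b_value, int *a_got, int *b_got)
{
    assert(a_got != NULL && b_got != NULL);

    chan_t a_to_b, b_to_a;
    chan_init(&a_to_b);
    chan_init(&b_to_a);

    proc_t a = { &a_to_b, &b_to_a, 1, a_value, 0 };
    proc_t b = { &b_to_a, &a_to_b, 0, b_value, 0 };

    pthread_t ta;
    int status = -1;
    if (pthread_create(&ta, NULL, proc_main, &a) == 0) {
        proc_main(&b);  /* process B runs on the calling thread */
        pthread_join(ta, NULL);
        *a_got = a.received;
        *b_got = b.received;
        status = 0;
    }

    chan_destroy(&a_to_b);
    chan_destroy(&b_to_a);
    return status;
}
```

With only two processes the safe ordering is easy to see. The effort grows once many processes are wired into a grid and every pair of neighbours has to agree on who talks first.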