Absolutely. The real bottleneck in simulating such systems on silicon is that cache-coherence traffic and inter-core latency massively degrade actual performance; that's simply the nature of this algorithm space. So rather than building massively parallel silicon for it, we single-thread the simulation and run it at very high clock speeds to maximize real-world throughput, because that's the easiest way to make it scale on silicon. The brain may run at very slow speeds, but its parallelism is so massive that current processors still can't simulate that number of slow processes in real time.
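To make that concrete, here's a minimal sketch of the single-threaded approach (the `Network`/`step` names and the decay update are purely illustrative, not any particular simulator's API): keep the state of millions of slow units in flat, contiguous arrays and let one fast core sweep them in order, so there's no coherence traffic at all and the hardware prefetcher does the heavy lifting.

```cpp
#include <cstdio>
#include <cstddef>
#include <vector>

// Hypothetical state for N very slow "units" (neurons, agents, whatever),
// stored structure-of-arrays so one core can stream through memory
// sequentially -- exactly the access pattern the prefetcher is good at.
struct Network {
    std::vector<float> potential;  // per-unit state
    std::vector<float> input;      // input accumulated for the current tick
};

// One simulated tick: a single thread sweeps every unit in order.
// No locks, no cache-coherence traffic, no cross-core synchronization --
// throughput is bounded only by how fast one core can stream the arrays.
void step(Network& net, float decay) {
    const std::size_t n = net.potential.size();
    for (std::size_t i = 0; i < n; ++i) {
        net.potential[i] = net.potential[i] * decay + net.input[i];
        net.input[i] = 0.0f;  // consumed; refilled by whatever drives the sim
    }
}

int main() {
    // A million slow units, each given one unit of input before the first tick.
    Network net{std::vector<float>(1'000'000, 0.0f),
                std::vector<float>(1'000'000, 1.0f)};
    for (int t = 0; t < 100; ++t) step(net, 0.95f);
    std::printf("unit 0 after 100 ticks: %f\n", net.potential[0]);
    return 0;
}
```

The point of the flat layout is that the whole tick is one predictable linear pass: the core never stalls waiting on another core, only (at worst) on main memory, which leads straight into the next problem.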
This particular problem space underscores just how slow main memory is on our computers and how narrow the bandwidth is between it and the actual processors. I've been involved in some research on elegantly scaling this domain to multiprocessor systems so that you actually get a net gain from the extra processors (most naive attempts will actually make these workloads run SLOWER on multiple processors, due to the memory problems mentioned above).
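One well-known mechanism behind that slowdown (just an illustration of "the memory problems", not a description of the research itself) is false sharing: per-thread state packed onto the same cache line forces the coherence protocol to bounce that line between cores on every write. The toy benchmark below shows it directly; `Packed`, `Padded`, and the thread/iteration counts are all made up for the example. Built with something like `g++ -O2 -pthread`, the packed layout typically runs several times slower than the padded one despite doing identical work.

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int  kThreads = 4;           // illustrative counts only
constexpr long kIters   = 10'000'000;

// Naive layout: per-thread counters packed together, so several of them
// share one 64-byte cache line. Every write by one core invalidates that
// line in every other core's cache (false sharing).
struct Packed {
    std::atomic<long> c[kThreads];
};

// Fixed layout: each counter aligned to its own cache line, so the cores
// never fight over line ownership.
struct alignas(64) Padded {
    std::atomic<long> c{0};
};

// Run one increment function on kThreads threads and time the whole batch.
template <typename Bump>
double run_ms(Bump bump) {
    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> ts;
    for (int i = 0; i < kThreads; ++i)
        ts.emplace_back([=] { for (long n = 0; n < kIters; ++n) bump(i); });
    for (auto& t : ts) t.join();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    Packed packed{};
    std::vector<Padded> padded(kThreads);

    double shared = run_ms([&](int i) {
        packed.c[i].fetch_add(1, std::memory_order_relaxed);
    });
    double split = run_ms([&](int i) {
        padded[i].c.fetch_add(1, std::memory_order_relaxed);
    });
    std::printf("packed (one line): %.1f ms   padded (own lines): %.1f ms\n",
                shared, split);
    return 0;
}
```

Note that each thread touches only its own counter; no data is logically shared at all. The slowdown comes entirely from the coherence hardware, which is why naive parallelization of memory-bound simulations can go backwards even with zero lock contention.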