Just saying, down the road when we look back, today's LLMs will pale in comparison.
I understand what you are saying, but I wonder how long it will take for that comparison to become relevant. I do have experience using models from different generations, and the output is often not that different.
Functionally, how much better do today's cars perform than cars from 100 years ago? I am currently working on an 84-year-old 1942 Cadillac. It has a 150 hp V-8 engine with an automatic transmission. It drives at 70 mph with power to spare and gets fuel economy only slightly worse than our 2013 Ford Explorer's.
Important incremental advances will be made, but until there are revolutionary developments that bring the models close to human intelligence, the differences in output quality will not be that noticeable.
I use the LiteLLM API to run nearly all of the available commercial models in Open WebUI. It is a pay-as-you-go solution that is typically cheaper than a paid monthly subscription to the top models, and it lets me pick from older models with different capabilities. I often pick an older or less capable model at a tenth or a twentieth of the price because the output is typically not much different. I also run locally many of the same open-source models that are released after a year or two, because they provide a very similar experience for free.
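As a rough sketch of what that setup looks like on the LiteLLM side (the model names and environment-variable key below are illustrative placeholders, not a recommendation), the proxy reads a `config.yaml` that lists an expensive and a cheap model side by side:

```yaml
# Minimal LiteLLM proxy config sketch (assumed file name: config.yaml).
# Model names and the API-key variable are placeholders.
model_list:
  - model_name: gpt-4o            # a top-tier commercial model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o-mini       # a far cheaper alternative
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
```

Started with `litellm --config config.yaml`, the proxy exposes an OpenAI-compatible endpoint that Open WebUI can be pointed at, so both models appear in the same model picker and you only pay per request.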
Coding, whether writing scripts or even a sequence of commands to set up a VM or local hardware, is usually the only application where a much more expensive model is worth it.
You can make this comparison yourself by opening more than one model at a time and comparing the output. Here is a video that shows how to do this with very little effort. You can use a computer or VM running Linux, or even a Windows computer using Windows Subsystem for Linux (WSL).
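If you prefer scripting the comparison instead of using the Open WebUI interface, the same idea can be sketched in a few lines of Python against a LiteLLM proxy's OpenAI-compatible endpoint. The proxy URL, key, and model names here are placeholder assumptions; substitute your own.

```python
# Sketch: send one prompt to several models behind a LiteLLM proxy
# and print the answers side by side for comparison.
import json
import urllib.request

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # assumed proxy address
API_KEY = "sk-placeholder"                               # assumed proxy key

def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for one model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(model: str, prompt: str) -> str:
    """POST the payload to the proxy and return the reply text."""
    req = urllib.request.Request(
        PROXY_URL,
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Example usage (requires a running LiteLLM proxy):
#   for model in ["gpt-4o", "gpt-4o-mini"]:   # assumed model names
#       print(f"--- {model} ---")
#       print(ask(model, "Explain DNS caching in two sentences."))
```

The same prompt goes to each model, so the only variable in the comparison is the model itself, which makes the price-versus-quality trade-off easy to judge.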