Posted on 11/28/2025 6:45:55 PM PST by CharlesOConnell
There was an older era of "analog" audio recordings, produced in waveform audio editors such as Audacity. These performances were supported by a metronome, a "click track," or a rhythm track. With a lot of concentration and some luck, a performance could be produced that would satisfy pianists, critics, and teachers. The metronome is needed to set the basic tempo, but if it is sped up to correspond with the smallest note value in a piece (the eighth note in a 6/8 piece that also contains the longer quarter notes and dotted quarter notes, a typical piece with three levels of rhythmic duration), the fast click becomes distracting and defeats the purpose: it supports accuracy at the fastest/smallest note values, but it disorients you at the level of the slowest/largest ones, a case of not being able to see the forest for the trees.
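To make the tempo arithmetic concrete, here is a minimal sketch of the click intervals at each rhythmic level in the 6/8 example. The tempo value (dotted quarter = 60 BPM) is an assumption chosen purely for illustration:

```python
# Click intervals for a 6/8 piece, assuming a tempo of
# dotted quarter = 60 BPM (two main beats per measure).
DOTTED_QUARTER_BPM = 60  # assumed tempo, for illustration only

dotted_quarter_ms = 60_000 / DOTTED_QUARTER_BPM  # one main beat
quarter_ms = dotted_quarter_ms * 2 / 3           # quarter = 2 eighths
eighth_ms = dotted_quarter_ms / 3                # smallest value here

print(f"dotted quarter: {dotted_quarter_ms:.0f} ms between clicks")
print(f"quarter:        {quarter_ms:.0f} ms")
print(f"eighth:         {eighth_ms:.0f} ms (3 clicks per main beat)")
```

The numbers show the trade-off described above: a click on every eighth note arrives three times as often as the main beat, which helps with the small values but buries the feel of the larger ones.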
A usable solution to this problem is to perform the piece as "data entry" into a digital audio workstation (DAW) in MIDI mode. The clunky, artificial "computer" sound of yesteryear's MIDI has been left behind, because audio recordings of all the sounds and articulations of a musical instrument, or of a whole section, even up to a complete orchestra, have been packaged into "VST patches" capable of making the artificial MIDI output sound realistic and warm. (Virtually all movie music is now produced this way.) The DAW, the program that handles the MIDI, has easy-to-use controls to snap notes to a grid (in Cubase: Quantize). All five or six parameters of musical sound, such as loudness, pitch (and flubs), duration, and how the blank spaces between notes line up with staccato or with smooth, voice-like note-to-note legato, can be edited practically and efficiently to correct a recording, whereas analog-audio "waveform" recordings offer severely limited functionality in this regard.
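The grid-snapping operation mentioned above (Cubase's Quantize) can be sketched in a few lines. This is a simplified illustration, not Cubase's actual algorithm; the 480-ticks-per-quarter resolution is a common MIDI convention assumed here, and the "performance" data is invented:

```python
# A minimal sketch of grid quantization, the operation a DAW applies
# when you snap notes to a grid (Cubase calls it Quantize).
# Note times are in "ticks"; 480 ticks per quarter note is a common
# MIDI resolution (an assumption here, not a universal rule).

TICKS_PER_QUARTER = 480

def quantize(onset_ticks, grid_divisor=4):
    """Snap an onset to the nearest grid line.
    grid_divisor=4 means a sixteenth-note grid (quarter / 4)."""
    grid = TICKS_PER_QUARTER // grid_divisor
    return round(onset_ticks / grid) * grid

# A slightly sloppy performance: the ideal onsets would be 0, 480, 960, 1440.
performance = [3, 495, 948, 1455]
print([quantize(t) for t in performance])  # -> [0, 480, 960, 1440]
```

Real DAWs add refinements such as partial-strength quantization (moving a note only part of the way to the grid), which is what preserves a human feel, but the core operation is this rounding.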
The issue described here is: how can older, non-MIDI, "analog" audio performances be converted to MIDI and corrected?
The link at the top of the article leads to an example audio file, a performance chosen to illustrate this problem: "Drink to Me Only With Thine Eyes," with words by Ben Jonson, poet of the court of James I of England. (I don't know who composed the music; the sheet music describes it generically as an "Old English Air," and the tune is also known as "I Heard the Voice of Jesus Say." You can hear a production from about 1960 by Johnny Cash, in a kind of country-calypso beat, released as "Drink to Me.") In the sheet music, the question concerns an "articulation" error in the second line, second measure, at the fifth of six notes, the second note on the word "cup." The error is that the key is struck too soon, marring the performance.

The core of the issue is: what can music software be expected to do?
Question to ChatGPT: The 27th note in the piece has premature execution, "too soon." It is the kind of issue that could easily be corrected in a MIDI file in a DAW. My question is: what near-term developments in application software will allow sound files, such as those in MP3 format, to be rendered easily as MIDI files?
Reply: This touches a major frontier in music-technology development. Here is a realistic overview of where things stand today, and what to expect soon, regarding converting MP3 (audio) into MIDI with enough precision to fix issues like "the 27th note came too soon."
You’re asking about the near-term future of tools that will let you take a performance in MP3 format and convert it into an editable MIDI file, so that issues like a single premature note (e.g., “the 27th note is too early”) can be corrected easily in a DAW.
For a case like your 27th note being too early, these tools can help, but you still typically need to fix the timing manually in a DAW after converting to MIDI.
Summary of the present: Audio-to-MIDI exists and is evolving, but is not yet at the point of “effortlessly fix one bad note in an MP3” without some manual work.
Based on the current trajectory of music AI and DAW development, the next few years are likely to deliver tools that do almost exactly what you’re hoping for.
We can expect tools that will:
The goal of these systems is exactly what you described: “Correct the performance to the written rhythm.”
Future versions of major DAWs (such as Cubase, Logic, and Studio One) are very likely to include:
In practice, this may mean you will be able to:
If current deep-learning trends continue, within about five years we can reasonably expect:
In other words, the workflow of Audio → Score → Editable MIDI, with enough accuracy to edit individual notes, is very likely to become a standard and reliable feature.
Given currently available tools, here’s the most realistic path:
This isn’t yet a “one-click fix”, but it is currently the closest practical approach to your desired workflow.
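Once the audio has been converted, the manual correction step described above is a small edit to one note's onset. The sketch below works on a plain list of note tuples; the note data is invented for illustration, and a real MIDI file would be read and written with a library such as mido rather than handled this way:

```python
# After audio-to-MIDI conversion you have a sequence of note events.
# This sketch moves one premature note (the 27th, index 26) back onto
# the beat. Each note is (onset_ticks, pitch, duration_ticks); the
# values are hypothetical, chosen only to demonstrate the edit.

def fix_note_timing(notes, index, correct_onset):
    """Return a copy of the note list with one onset corrected."""
    fixed = list(notes)
    onset, pitch, duration = fixed[index]
    fixed[index] = (correct_onset, pitch, duration)
    return fixed

# Hypothetical performance: notes every 240 ticks, but the 27th
# note arrives 60 ticks early.
notes = [(i * 240, 60 + (i % 12), 200) for i in range(30)]
notes[26] = (26 * 240 - 60, notes[26][1], 200)  # premature onset

fixed = fix_note_timing(notes, 26, 26 * 240)
print(notes[26][0], "->", fixed[26][0])  # 6180 -> 6240
```

This is exactly the edit a DAW's piano-roll editor performs when you drag a single note: everything else in the performance is left untouched.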
Dvorak, Mendelssohn, Tchaikovsky, Haydn, Schubert, Barber, Wagner, Bach, Beethoven, Mozart. https://wp.me/p256FR-1Za
“At the link-click in the top of the article.”
-
The link goes straight to the audio clip.
Manually adjust any specific note (for example the 27th note) in the MIDI editor.
So for us lay people, it sounds like the goal here is not to provide additional clarity/fidelity to the original performance, but rather to correct the "mistakes" made by the human performers in the original recording.
Is that correct?
Because if so, that sounds sort of like a more sophisticated version of auto-tune, except for instrumentation.
The revival of real music, not contrived AI crap: exacting perfection would demand "correction" of, say, Jascha Heifetz!! The start of the destruction of AI begins with such nonsense as the supposed need for "remediation" (of course this topic is about recording engineers, so there is "that"). Reduce the labor on this one... AI.
Here, AI, perfect THIS! Tchaikovsky Violin Concerto, first movement:
https://www.youtube.com/watch?v=kFaq9kTlcaY
AI has many great applications, but I have ZERO interest in listening to "corrected" music performances. The little errors or deviations are what very often add to a performance's personality.
The Flight of the Bumblebee and Zorba the Greek are a couple of very technical pieces that would be interesting to hear executed using MIDI.
Does MIDI even have a clue as to what rubato is?
I’m not interested in hearing the human artistry stripped out of musical pieces by some stupid algorithms.
Digital audio recordings can't match the pure simplicity of analog music. Digitally enhanced means manipulation of the actual sounds.
Most studios would digitize the piano track and then fix the bad note using ProTools or some other recording/editing software. The performance would keep its natural feel and the edit would be seamless. I doubt they’d do anything with MIDI.
I had a friend who was the chair of the music department at a college. Somehow, he had managed to go his entire life without hearing Vince Guaraldi’s rendition of Linus and Lucy. He bought the sheet music but wasn’t playing it correctly. I, who can’t even read music, had to show him. Some things just don’t manifest onto sheet music, and that probably goes double for jazz. He ended up playing the piece in his class, and he said the students just went nuts. It increased his “cool” factor with his students.
I just like listening to music - sometimes a slight "flaw" (instrumental or vocal) makes it better than robotic perfection... of course, I used to drink Boone's Farm and Bali Hai too.
Speaking as someone experienced in electronics: it is common knowledge that when you increase the rate of the timing (sampling) signals, you get a truer representation of the original.
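The sampling-rate point can be illustrated with the Nyquist criterion: a tone above half the sampling rate folds back as a false lower frequency (aliasing), while a sufficiently high rate captures it faithfully. A minimal sketch, with frequencies chosen arbitrarily for illustration:

```python
# Aliasing illustration: a tone above half the sampling rate (the
# Nyquist frequency) appears after sampling as a lower "alias"
# frequency. The tone and sample rates below are arbitrary examples.

def apparent_frequency(tone_hz, sample_rate_hz):
    """Frequency a sampled pure tone appears to have."""
    return abs(tone_hz - sample_rate_hz * round(tone_hz / sample_rate_hz))

print(apparent_frequency(3000, 48000))  # 3000: faithfully captured
print(apparent_frequency(3000, 4000))   # 1000: aliased, rate too low
```

At 48 kHz the 3 kHz tone is well under the Nyquist limit of 24 kHz and survives intact; at 4 kHz it exceeds the 2 kHz limit and folds down to a spurious 1 kHz, which is the sense in which a faster sampling clock gives a truer picture of the signal.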