You are correct. It's just a shadow cast by the light from the last person's cell phone hitting the corner of the fence, projected upward onto the tree high up on the opposite side of the fence.
You can see in the video, as the kid moves toward the opening in the fence holding his phone up, that the light projects a shadow upward toward the tree. As he moves past the fence line, that is when the shadow (the thing people are calling a head) appears to move away to the right.
That also explains why there appear to be shadows moving between the fence slats.
Now, why is it all foggy looking, instead of shadow lines on the tree branches?
If you have ever watched much heavily compressed digital video, such as the kind you get from cell phones, you will see lots of blurry areas, or blocky artifacts depending on the compression type, wherever movement occurs. This is because of the nature of the way video compression works. It keeps track of the colors and patterns of objects that were visible in previous frames, and it re-uses those patterns so that they don't have to be sent again in the new frames. Only the parts of the picture that change have to be encoded fresh, and there is a limited amount of data available to do that. Because of that, objects that are still or moving relatively slowly will have the best resolution and clarity, and objects that are moving fast will have the worst.
The tree branches, where the light and color stayed relatively steady, were quite discernible to the eye, but when the kid moved past the fence with the bright light, the fast-moving shadows projected onto the tree got momentarily blurred by the compression.
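
For anyone who wants to see the idea in action, here is a rough toy sketch in Python. It is not how any real phone codec is implemented; the block size, the threshold, and all the function names are made up for illustration. It only shows the general principle described above: blocks that match the previous frame are re-used with full detail, while blocks that changed a lot get a coarse, smeared approximation.

```python
# Toy illustration only. Real codecs are far more sophisticated; the
# names, block size, and threshold here are hypothetical.

BLOCK = 4  # assumed block size in pixels


def make_detailed_frame(width=8, height=4):
    """A fine checker pattern standing in for detailed tree branches."""
    return [[100 + 20 * ((x + y) % 2) for x in range(width)]
            for y in range(height)]


def add_shadow(frame, start_x=6):
    """Darken every pixel from column start_x onward (a shadow edge)."""
    return [[0 if x >= start_x else v for x, v in enumerate(row)]
            for row in frame]


def decode_with_reuse(prev_decoded, current, block=BLOCK, threshold=8):
    """Return what the viewer would see for `current`.

    Blocks close to the previous frame are copied from it unchanged.
    Blocks that changed a lot are replaced by their average brightness,
    a crude stand-in for the coarse encoding a tight data budget forces.
    """
    height, width = len(current), len(current[0])
    out = [row[:] for row in prev_decoded]
    for by in range(0, height, block):
        for bx in range(0, width, block):
            coords = [(y, x)
                      for y in range(by, min(by + block, height))
                      for x in range(bx, min(bx + block, width))]
            diff = sum(abs(current[y][x] - prev_decoded[y][x])
                       for y, x in coords) / len(coords)
            if diff <= threshold:
                continue  # block barely changed: re-use the old pixels
            avg = sum(current[y][x] for y, x in coords) // len(coords)
            for y, x in coords:
                out[y][x] = avg  # block changed: detail is lost
    return out


previous = make_detailed_frame()
current = add_shadow(previous)
decoded = decode_with_reuse(previous, current)

for row in decoded:
    print(row)
```

In the printed result, the left half (nothing moved) keeps its fine pattern, while the right half (where the shadow swept in) collapses to a single flat value, so the sharp shadow edge is gone. That is the same kind of smearing as the foggy look on the tree.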