Audio analysis of the viral Araria video with alleged Pro-Pak slogans raises suspicions

ARARIA: After the BJP’s loss in the bypoll in Araria, Bihar, a video allegedly shot in Araria has gone viral, with claims that it shows supporters of the winning candidate, Sarfaraz Alam, raising pro-Pakistan slogans. The video can be seen below.

Even a casual look at the video makes it apparent that there are serious lip-sync issues. Alt News decided to dig deeper and do an audio analysis of the above video. We examined copies of this video from three independent sources.

Suspect Audio Waveform

We analysed the three videos using Audacity, an audio editing program. What is observed consistently across all three videos is that at two spots the volume, or audio level, is ZERO (circled).

The above two spots with zero audio level are roughly between

1) 0.000 and 0.140 seconds
2) 18.500 and 19.400 seconds
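
Intervals like the two listed above can also be located programmatically rather than by eye. The following is a minimal Python sketch, not the procedure used in the Audacity analysis. It assumes the audio track has already been exported as a 16-bit PCM WAV file, and the names `read_first_channel`, `find_silent_runs`, `threshold` and `min_duration` are hypothetical helpers introduced only for this illustration.

```python
import array
import sys
import wave


def read_first_channel(path):
    """Read a 16-bit PCM WAV file; return (samples of first channel, sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        data = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV data is little-endian
    return list(data[::channels]), rate


def find_silent_runs(samples, sample_rate, threshold=0, min_duration=0.05):
    """Return (start_sec, end_sec) pairs where |sample| <= threshold
    for at least min_duration seconds."""
    runs = []
    start = None  # index where the current silent run began, if any
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) / sample_rate >= min_duration:
                runs.append((start / sample_rate, i / sample_rate))
            start = None
    # Close a run that extends to the end of the audio.
    if start is not None and (len(samples) - start) / sample_rate >= min_duration:
        runs.append((start / sample_rate, len(samples) / sample_rate))
    return runs


# Synthetic example at 1000 samples per second: silence, noise, silence, noise.
demo = [0] * 140 + [500] * 18360 + [0] * 900 + [500] * 600
print(find_silent_runs(demo, 1000))
```

A threshold of 0 only flags samples that are exactly zero. Genuinely quiet but real recordings normally still contain low-level noise, which is why exact digital silence in the middle of a noisy street scene stands out.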

An expanded version of the waveform where audio levels are ZERO can be seen below.

The same lack of volume can be seen when the audio is analysed using Spek, an acoustic spectrum analyzer. The image generated by Spek can be seen below. The two distinct spots with zero audio level are even more apparent in the spectrogram generated by Spek.

If you listen to the video, you’ll hear a vehicle in the background, one that sounds like a tuk-tuk. Other kinds of commotion are also audible in the background. However, all that background noise drops to ZERO at the above-mentioned time intervals, which raises the initial suspicions about this video and its authenticity.

There is no audio-video sync

As we mentioned earlier, the very first observation about the video was that it didn’t seem to have lip sync. To ascertain whether the audio and video are in sync, we examined individual frames from the video against the audio waveform.

For those who are interested in knowing how to break down a video into individual frames, we used ffmpeg, a cross-platform command-line tool for processing audio and video. The command we used breaks down the video into individual frames at a rate of 30 frames per second and adds a timestamp to the top left of each frame. The timestamp reflects the position of an individual frame with respect to the start of the video.
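
As an illustration of this step, here is a minimal Python sketch that assembles one possible ffmpeg invocation. It is an assumption about how such a command could look, not necessarily the command used in this analysis. The input file name, the output pattern and the helper `build_frame_command` are hypothetical; the `fps` and `drawtext` filters and the `%{pts\:hms}` text expansion are standard ffmpeg features.

```python
import subprocess


def build_frame_command(video_path, out_pattern="frames/frame_%05d.png", fps=30):
    """Build an ffmpeg argument list that writes one image per frame
    with the frame's timestamp drawn in the top-left corner."""
    # drawtext needs an ffmpeg build with libfreetype; some builds also
    # require an explicit fontfile=... option (not set in this sketch).
    drawtext = (
        "drawtext=text='%{pts\\:hms}'"
        ":x=10:y=10:fontsize=24:fontcolor=white:box=1:boxcolor=black"
    )
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps},{drawtext}", out_pattern]


def extract_frames(video_path):
    """Run the command; the output directory must already exist."""
    subprocess.run(build_frame_command(video_path), check=True)


# Show the command without running it ("viral_video.mp4" is a placeholder).
print(build_frame_command("viral_video.mp4"))
```

Passing the arguments as a list avoids shell quoting problems with the filter string, which contains quotes, colons and a backslash.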

A zip file with all the frames obtained using the above command can be downloaded here.

From the above images, we created various collages to see the lip positions of the subjects in the video. The following collage has frames from 3.100 seconds to 3.767 seconds of the video.
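
The relation between a timestamp and a frame number at 30 frames per second is simple arithmetic, sketched below in Python. The helper names are hypothetical, and frame indices are assumed to start at 0 for the frame at 0.000 seconds (ffmpeg's default output file numbering starts at 1, so file numbers would be one higher).

```python
def frame_index(t_seconds, fps=30):
    """Index of the frame closest to a given time, counting from 0."""
    return round(t_seconds * fps)


def frame_time(index, fps=30):
    """Timestamp in seconds of a zero-based frame index."""
    return index / fps


first = frame_index(3.100)
last = frame_index(3.767)
print(first, last, last - first + 1)
print(f"{frame_time(first):.3f} to {frame_time(last):.3f} seconds")
```

Under these assumptions, the collage range of 3.100 to 3.767 seconds corresponds to frames 93 through 113, which is 21 consecutive frames.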

As can be seen in the above collage, only the person on the right has his mouth visibly open, while the other two people have their mouths shut. The person on the right has his mouth visibly open from 3.200 seconds to 3.700 seconds. If the audio and video are in sync, the audio waveform should show increased audio levels during this period. Does it?

The audio waveform shows almost nil audio levels between 3.250 seconds and 3.500 seconds. While the subject on the right has his mouth open and is visibly exulting, the audio for the corresponding period shows close to nil audio levels. This indicates that the audio and video are not in sync.
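
The comparison described above, loudness inside a short window against loudness over the whole clip, can be sketched in Python as follows. This is an illustrative sketch rather than the method used in the analysis: it assumes the audio is already available as a list of integer samples with a known sample rate, and `rms` and `window_rms` are hypothetical helper names.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def window_rms(samples, sample_rate, start_sec, end_sec):
    """RMS level of the samples between start_sec and end_sec."""
    lo = round(start_sec * sample_rate)
    hi = round(end_sec * sample_rate)
    return rms(samples[lo:hi])


# Synthetic example at 1000 samples per second: loud audio with a
# near-silent stretch between 3.250 and 3.500 seconds.
rate = 1000
audio = [1000] * 3250 + [10] * 250 + [1000] * 500
print(window_rms(audio, rate, 3.250, 3.500), rms(audio))
```

A window level far below the clip-wide level at a moment when a subject is visibly shouting is the kind of mismatch described in this section.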

The above observations necessitate that the video be examined by a certified forensic laboratory for its authenticity. With so many easy-to-use software tools available that allow one to dub audio over an existing video, news channels and journalists shouldn’t be propagating such videos without doing an independent analysis of their own.

Videos like the one above can cause friction between communities in an area and can be used as part of a political agenda. It is therefore of utmost importance that, when such videos go viral, extra care is taken before mainstream media popularises them further.

Two persons have reportedly been arrested in connection with the above video, and we hope that the police investigation will throw further light on its authenticity. The audio analysis of the videos circulating on social media raises several red flags, and hence a detailed forensic analysis of the video should be done.

