ffmpeg -i big_buck.mp4 -i big_buck.mp4 -i big_buck.mp4 -i big_buck.mp4 -i big_buck.mp4 \
-filter_complex \
"[v1][v2]overlay[v12]; [v12][v3]overlay[v123]; [v123][v4]overlay[v1234]; [v1234][v5]overlay,format=yuv420p[v]" \
-map "[v]" result.mp4
The PTS has to be modified so that each new clip starts 1 second before the current combination of clips ends, i.e. the 3rd clip should start fading in at 8 seconds, since the combination of the first two clips is 9 seconds long (4 seconds of the first clip + 1 second of transition + 4 seconds of the 2nd clip).
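The offset arithmetic above can be sketched in a few lines of shell. This is only an illustration, assuming five inputs that are each trimmed to 5 seconds and a 1-second transition. The variable names and the exact per-clip chain (trim, format=yuva420p, an alpha fade-in, then a setpts shift) are assumptions chosen to produce the [v1] to [v5] labels that the overlay chain consumes; they are not necessarily the chain used in the original command.

```shell
# Assumed values, taken from the example in the text: every clip is cut to
# 5 seconds and each transition lasts 1 second.
clip_len=5
fade_len=1
clips=5

graph=""
i=0
while [ "$i" -lt "$clips" ]; do
  # Each clip starts fade_len seconds before the running total ends,
  # i.e. at i * (clip_len - fade_len) seconds.
  offset=$(( i * (clip_len - fade_len) ))
  n=$(( i + 1 ))
  if [ "$i" -eq 0 ]; then
    # The first clip needs neither a fade-in nor a shift.
    chain="[${i}:v]trim=duration=${clip_len},setpts=PTS-STARTPTS[v${n}]"
  else
    # Later clips fade in on the alpha channel, then get shifted by offset.
    chain="[${i}:v]trim=duration=${clip_len},format=yuva420p,fade=t=in:st=0:d=${fade_len}:alpha=1,setpts=PTS-STARTPTS+${offset}/TB[v${n}]"
  fi
  graph="${graph}${chain}; "
  i=$(( i + 1 ))
done

# Total length: all clips back to back, minus one overlap per transition.
total=$(( clips * clip_len - (clips - 1) * fade_len ))

printf '%s\n' "$graph"
echo "expected output length: ${total} seconds"
```

The printed string could be placed in front of the overlay chain inside -filter_complex. For five clips the offsets come out as 0, 4, 8, 12 and 16 seconds, and by the same arithmetic the result should be 21 seconds long.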
You don't need the fade-out, as the next clip fades in on top of the current one. The concat filter is only required if you want a cut.
With audio crossfades:
ffmpeg -i big_buck.mp4 -i big_buck.mp4 -i big_buck.mp4 -i big_buck.mp4 -i big_buck.mp4 \
-filter_complex \
"[v1][v2]overlay[12]; [12][v3]overlay[123]; [123][v4]overlay[1234]; [1234][v5]overlay,format=yuv420p[v];
[1][2]acrossfade=d=1[a12]; [a12][3]acrossfade=d=1[a123]; [a123][4]acrossfade=d=1[a]" \
-map "[v]" -map "[a]" result.mp4