Well, I gave it another try using a weighted mix of different metrics and got pretty decent results: the sudden flips between objects are gone. Since I track similar objects, their histograms tend to be close under any distance metric. Dropping the histogram entirely felt wrong, so I kept it, but with a lower weight coefficient.
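Here is a minimal sketch of what such a weighted matching cost could look like. It assumes each object is described by its centroid, area, and a normalized color histogram. The `Blob` type, the three metrics, the weights (0.5 / 0.3 / 0.2), the `max_dist` normalization radius, and the greedy matching step are all hypothetical choices for illustration, not necessarily the exact setup used here. Every term is scaled to [0, 1] so the weights are comparable, and the histogram term gets the smallest weight.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Blob:
    """Hypothetical per-object features: centroid, area, normalized histogram."""
    cx: float
    cy: float
    area: float
    hist: List[float]  # assumed normalized so the bins sum to 1


def bhattacharyya_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Hellinger-style Bhattacharyya distance in [0, 1] for normalized histograms."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - coeff))


def combined_cost(track: Blob, det: Blob,
                  w_pos: float = 0.5, w_size: float = 0.3, w_hist: float = 0.2,
                  max_dist: float = 100.0) -> float:
    """Weighted mix of normalized distances; lower means a better match."""
    # Centroid distance, scaled by an assumed gating radius and clipped to [0, 1].
    pos = min(math.hypot(track.cx - det.cx, track.cy - det.cy) / max_dist, 1.0)
    # Relative area change in [0, 1].
    larger = max(track.area, det.area)
    size = abs(track.area - det.area) / larger if larger > 0 else 0.0
    # Appearance term: kept, but with the smallest weight because similar
    # objects tend to have similar histograms anyway.
    hist = bhattacharyya_distance(track.hist, det.hist)
    return w_pos * pos + w_size * size + w_hist * hist


def match(tracks: List[Blob], dets: List[Blob]) -> Dict[int, int]:
    """Greedy assignment: take the cheapest remaining (track, detection) pair first."""
    pairs = sorted(
        (combined_cost(t, d), i, j)
        for i, t in enumerate(tracks)
        for j, d in enumerate(dets)
    )
    assigned: Dict[int, int] = {}
    used_dets = set()
    for _, i, j in pairs:
        if i not in assigned and j not in used_dets:
            assigned[i] = j
            used_dets.add(j)
    return assigned
```

With two objects that have identical histograms, position and size still decide the match, so the histogram cannot cause a flip on its own. Replacing the greedy step with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on the cost matrix) would give a globally optimal assignment instead.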