These last weeks I have been working on having a stable tracker which allows the application a better synchronization between the different branches (Camera, Net, GUI) and the tracking itself. For that purpose, the internal logic of the tracker was modified, allowing the tracking to work in a more flexible way with the buffer which takes as input. This gives as result a tracker that has 3 modes: slow, normal and fast (depending on the FPS average rate of tracking of the previous frames). For example, if the tracking is running slow, a number of frames in the buffer are skipped to avoid the buffer to grow more than expected. And, if the tracking is running fast, the tracker slows down to prevent that the tracking finishes before the neural network gives a result.
Related to the multiobject tracking problem (the tracking of multiple objects affects the FPS rate, slowing it) I looked for new trackers following these posts (LearnOpenCV: https://www.learnopencv.com/multitracker-multiple-object-tracking-using-opencv-c-python/ pyimagesearch: https://www.pyimagesearch.com/2018/07/30/opencv-object-tracking/). I am actually working with TLD tracker but it has some problems with false positives. However, it is the best option available for my purpose in the version 3.3.1 of OpenCV (included in JdeRobot). One of the next steps is to test the MOSSE and CSRT trackers that are available in more recent OpenCV versions and look promising, specially MOSSE due to the speed requisites.
At the same time, I am going to include the JdeRobot object detector (https://github.com/JdeRobot/dl-objectdetector) to make use of other networks apart from the actual Mask R-CNN.