[D] Is it common to discard all ‘non-full-time-trajectory’ pedestrian from data in pedestrian prediction domain?
Hello, I’m relatively new to pedestrian trajectory prediction, and I have a question about the preprocessing of pedestrian data. Actually, I sent the question e-mail for the author of SocialGAN paper and no reply was received (yet). If anyone knows anything about this topic, your answer would help me a lot.
It is quite common that may ML pedestrian trajectory prediction employs LSTM in order to capture the temporal correlation. For example, look at the diagram below from the paper ”Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks’ (https://arxiv.org/pdf/1803.10892.pdf ). In the paper, the author said that the model was trained to observe 3.2 seconds (8f) then predict 4.8 seconds (12f).
However, since the common pedestrian dataset (like ETH and UCY) collected from a single still camera, trajectory terminates when the agent moves out of sight. Thus, unlike the conceptual diagram in many papers, we cannot simply gather whole data for 3.2sec (8f) and let them consist of LSTM hidden state.
Therefore, I’ve curious about how other researchers handled this missing value problem.
While following the author’s code on GitHub, I’ve found that the code discarded every individual when their trajectory throughout the given interval is shorter than its whole interval length. (https://github.com/agrimgupta92/sgan/blob/master/sgan/data/trajectories.py, an excerpt from class ‘TrajectoryDataset’)
In result, the final dataset for training does not contain a nonnegligible portion of trajectories despite that some of them could exist near to ‘considered’ individuals and may affect their trajectories.
I want to raise two questions on this point.
– Am I correctly understanding the code and its consequences? I would like to show a small toy example in order to visualize and verify my understandings of SocialGAN code.
Let us assume that seq_len = 4 (obs_len = 2, pred_len = 2) and consider the frame 1~4. Then, according to my understandings, SocialGAN code discards green agent(below) because it does not have a full 4-frame trajectory.
But, in my opinion, I think the green agent would affect red agent’s trajectory a lot compared to blue one does. Although the missing value of the trajectory is an inevitable problem of the dataset itself (since no one can collect an indefinite amount of tracked position), ignoring the whole trajectory because of those value might decrease the overall accuracy.
Furthermore, since this preprocessing handles frame-based trajectory data with a moving window (i.e. it makes frame 1~4 as a first data and 2~5 as second, 3~6 as third, and so on.), an agent like green might affect red agent in some data, while discarded in other data2.
– Secondly, (under the assumption that my understanding of the situation is correct), Is this kind of preprocessing (discarding non-full-trajectory individuals) common for preprocessing of the public pedestrian datasets such as ETH and UCY, or is this preprocessing was particular for SocialGAN and there are other ways to preprocess pedestrian data? I’d like to know that is there another way to deal with this missing value problem.
I’m currently trying to implement a model and preprocessing which can alleviate this problem, but I’m not certain whether this problem had been already addressed and well-known in CV/ML society, or it hadn’t.
Again, any response or shared opinion will be grateful 🙂