1. motivation
irregular asynchronous setting:
- the time stamps of the collaboration messages from other agents are not aligned
- the time interval of two consecutive messages from the same agent is irregular
Problem formulation:
- $t_n^i$: the $i$-th timestamp of agent $n$
- the perception evaluation metric, which compares the predicted perception with the ground-truth perception
- the perception of agent $n$ at time $t_n^i$
- the raw observation of agent $n$ at time $t_n^i$
- the collaboration message sent from agent $m$ at time $t_m^j$
- The perception of agent $n$ is obtained by a function of its own raw observation at $t_n^i$ and the stored $k$ frames of collaboration messages from the other agents.
standard well-synchronized setting:
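- $t_n^i = t_m^i$ for all agent pairs $(n, m)$
- $t_n^i - t_n^{i-1}$ is a constant for all agents
regular asynchronous setting (SyncNet):
- $t_n^i = t_m^i$ cannot be guaranteed
- $t_n^i - t_n^{i-1}$ is a constant for all agents
irregular asynchronous setting:
- $t_n^i = t_m^i$ cannot be guaranteed
- $t_n^i - t_n^{i-1}$ is irregular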
2. contribution
- IRregular V2V (IRV2V): the first synthetic asynchronous collaborative perception dataset with irregular time delays, simulating various real-world scenarios
- CoBEVFlow, an asynchrony-robust collaborative perception system based on bird’s eye view (BEV) flow
3. details
3.1 Overall
3.2 Preparation for transmission
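- $\mathbf{F}_m^{t_m^j} \in \mathbb{R}^{H \times W \times D}$: the BEV perceptual feature map of agent $m$ at time $t_m^j$, with $H$, $W$ the size of the BEV map and $D$ the number of channels
- $\widetilde{\mathbf{F}}_m^{t_m^j}$: the sparse version of $\mathbf{F}_m^{t_m^j}$, which only contains features inside $\mathcal{R}_m^{t_m^j}$ and zero-padding outside
- $\mathcal{R}_m^{t_m^j}$: the set of regions of interest (ROIs)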
3.21 the generation of ROIs
- the ROI generation network has a detection decoder structure
- each detected ROI comes with its class confidence, position, size, and orientation
We threshold the class confidence, apply non-max suppression, and obtain a set of detected boxes, whose occupied spaces form the set of ROIs $\mathcal{R}_m^{t_m^j}$.
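The thresholding and non-max suppression step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the box layout, the function names `iou` and `select_rois`, and both threshold values are assumptions, and the boxes are treated as axis-aligned even though the detected ROIs carry an orientation.

```python
from typing import List, Tuple

# Hypothetical box layout: (confidence, x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    iy = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    inter = ix * iy
    area_a = (a[3] - a[1]) * (a[4] - a[2])
    area_b = (b[3] - b[1]) * (b[4] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_rois(boxes: List[Box], conf_thresh: float = 0.5,
                iou_thresh: float = 0.5) -> List[Box]:
    """Keep confident boxes, then greedily suppress overlapping ones."""
    # Step 1: threshold the class confidence, highest confidence first.
    candidates = sorted((b for b in boxes if b[0] >= conf_thresh),
                        key=lambda b: b[0], reverse=True)
    # Step 2: non-max suppression against the boxes already kept.
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```

The cells covered by the returned boxes would then form the ROI set; a rotated-box IoU would be needed to respect orientation.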
3.3 Collaborative information alignment
3.31 BEV flow map generation
$$ \mathbf{M}_m^{t_m^j\to t_n^i} =f_{\mathrm{flow\_gen}}(t_n^i,\{\mathcal{R}_m^{t_m^q}\}_{q=j-k+1,j-k+2,\cdots,j}) $$
* Adjacent frames’ ROI matching
The goal is to match the ROIs in two consecutive messages sent by the same agent, so that we can track each ROI’s locations across frames.
3 steps:
- cost matrix construction based on the distances between the ROIs of the two frames
- greedy matching: match each ROI to its nearest ROI in the other frame
- post-processing:
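The first two steps can be sketched as below, assuming each ROI is reduced to its centre point and the cost is the Euclidean distance. The names `build_cost_matrix` and `greedy_match` are hypothetical, and the `max_dist` gate is an assumption of this sketch; the notes do not say what the post-processing step does.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # ROI centre (x, y); a simplification of the full ROI


def build_cost_matrix(prev: List[Point], curr: List[Point]) -> List[List[float]]:
    """cost[i][j] = Euclidean distance between prev ROI i and curr ROI j."""
    return [[math.dist(p, c) for c in curr] for p in prev]


def greedy_match(cost: List[List[float]], max_dist: float = 5.0) -> Dict[int, int]:
    """Repeatedly take the cheapest remaining pair; each ROI is used at most once."""
    pairs = sorted((cost[i][j], i, j)
                   for i in range(len(cost))
                   for j in range(len(cost[i])))
    matches: Dict[int, int] = {}
    used_curr = set()
    for dist, i, j in pairs:
        if dist > max_dist:  # assumed gate: pairs this far apart are never matched
            break
        if i in matches or j in used_curr:
            continue
        matches[i] = j
        used_curr.add(j)
    return matches
```

Chaining the matches over the $k$ stored frames gives each ROI a short track, which is the input to the flow estimation.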
* BEV flow estimation
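For each matched ROI, we keep the historical sequence of its attributes sent by the $m$-th agent.
Each element of the sequence is the ROI’s location and orientation.
We need to estimate the ROI’s location and orientation at the ego timestamp $t_n^i$ (a time-series forecasting problem).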
- encoding timestamp
Finally, we can generate the BEV flow map. We calculate the motion vector at each grid cell by an affine transformation of the associated ROI’s motion, constituting the whole BEV flow map $\mathbf{M}_m^{t_m^j\to t_n^i}$.
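A rough sketch of these two steps follows. It swaps the forecasting model for plain constant-velocity extrapolation from the last two irregularly spaced observations, which is an assumption made only to keep the example short; it also ignores yaw wrap-around. The names `extrapolate_pose` and `roi_flow` and the pose layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pose = Tuple[float, float, float]  # (x, y, yaw): grid-cell units and radians
Cell = Tuple[int, int]


def extrapolate_pose(times: List[float], poses: List[Pose], t_query: float) -> Pose:
    """Constant-velocity extrapolation from the two most recent observations.

    Stand-in for a learned forecaster; timestamps may be irregular but must differ.
    """
    (t0, t1), (p0, p1) = times[-2:], poses[-2:]
    scale = (t_query - t1) / (t1 - t0)
    return tuple(b + scale * (b - a) for a, b in zip(p0, p1))


def roi_flow(cells: List[Cell], src: Pose, dst: Pose) -> Dict[Cell, Tuple[float, float]]:
    """Motion vector of each ROI cell under the rigid motion src -> dst."""
    dyaw = dst[2] - src[2]
    c, s = math.cos(dyaw), math.sin(dyaw)
    flow: Dict[Cell, Tuple[float, float]] = {}
    for x, y in cells:
        rx, ry = x - src[0], y - src[1]  # offset from the ROI centre
        nx = dst[0] + c * rx - s * ry    # rotate the offset, then translate
        ny = dst[1] + s * rx + c * ry
        flow[(x, y)] = (nx - x, ny - y)
    return flow
```

Cells outside every ROI keep a zero motion vector, so the union of the per-ROI results forms the full flow map.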
3.32 Feature warp and aggregation
$$ \begin{aligned} \widehat{\mathbf{F}}_m^{t_n^i} &=f_{\mathrm{warp}}(\widetilde{\mathbf{F}}_m^{t_m^j},\mathbf{M}_m^{t_m^j\to t_n^i})\\ \widehat{\mathbf{H}}_n^{t_n^i} &=f_{\mathrm{agg}}(\widetilde{\mathbf{F}}_n^{t_n^i},\{\widehat{\mathbf{F}}_m^{t_n^i}\}_{m\in\mathcal{N}_n}) \end{aligned} $$
$\widehat{\mathbf{F}}_m^{t_n^i}$ is the realigned feature map from the $m$-th agent at timestamp $t_n^i$ after motion compensation.
$\widehat{\mathbf{H}}_n^{t_n^i}$ is the aggregated feature from all of the agents.
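To make the two functions concrete, here is a small sketch that stores a sparse feature map as a dictionary of ROI cells. It assumes a nearest-cell forward warp for $f_{\mathrm{warp}}$ and an element-wise maximum for $f_{\mathrm{agg}}$; both choices, and the names `warp` and `aggregate`, are assumptions of this example rather than the paper's operators.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
# Only cells inside ROIs are stored; every other cell is implicitly empty.
SparseMap = Dict[Cell, List[float]]


def warp(features: SparseMap, flow: Dict[Cell, Tuple[float, float]],
         size: Tuple[int, int]) -> SparseMap:
    """Move each stored feature along its motion vector to the nearest cell."""
    h, w = size
    warped: SparseMap = {}
    for (x, y), feat in features.items():
        dx, dy = flow.get((x, y), (0.0, 0.0))  # no flow entry means no motion
        nx, ny = round(x + dx), round(y + dy)
        if 0 <= nx < w and 0 <= ny < h:        # drop features that leave the map
            warped[(nx, ny)] = feat
    return warped


def aggregate(ego: SparseMap, others: List[SparseMap]) -> SparseMap:
    """Element-wise max where maps overlap; otherwise copy the feature over."""
    fused: SparseMap = {cell: list(feat) for cell, feat in ego.items()}
    for other in others:
        for cell, feat in other.items():
            if cell in fused:
                fused[cell] = [max(a, b) for a, b in zip(fused[cell], feat)]
            else:
                fused[cell] = list(feat)
    return fused
```

A dense tensor implementation would do the same thing with a resampling operator over the whole $H \times W$ grid.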
4. experiment
5. the dataset IRV2V
The ideal sampling interval of the sensor is 100 ms.
- There is a time offset at the sampling starting point of each non-ego vehicle
- All non-ego vehicles’ collaborative messages are sampled with time turbulence
As a result, $t_n^i = t_m^i$ cannot be guaranteed.
We sample the frame intervals of received messages from a binomial distribution to get random irregular time intervals.
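The timing scheme can be simulated roughly as follows. Only the 100 ms ideal interval comes from the notes; the uniform offset and turbulence ranges, the binomial parameters `n_trials` and `p`, and the function name `sample_message_times` are placeholders chosen for illustration.

```python
import random
from typing import List


def sample_message_times(n_msgs: int, base_ms: float = 100.0,
                         max_offset_ms: float = 100.0,
                         turbulence_ms: float = 10.0,
                         n_trials: int = 3, p: float = 0.5,
                         seed: int = 0) -> List[float]:
    """Timestamps (ms) of the messages received from one non-ego vehicle."""
    rng = random.Random(seed)
    # Time offset at the sampling starting point (assumed uniform).
    offset = rng.uniform(0.0, max_offset_ms)
    frame = 0
    times: List[float] = []
    for _ in range(n_msgs):
        # Frame interval = 1 + Binomial(n_trials, p) sensor frames (assumed form).
        skipped = sum(rng.random() < p for _ in range(n_trials))
        frame += 1 + skipped
        # Time turbulence around the ideal sampling instant (assumed uniform).
        jitter = rng.uniform(-turbulence_ms, turbulence_ms)
        times.append(offset + frame * base_ms + jitter)
    return times
```

With these placeholder values consecutive messages are between 80 ms and 420 ms apart, so the sequence is strictly increasing but irregular.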