LittleBug

[CVPR 2023]VAD: Vectorized Scene Representation for Efficient Autonomous Driving

Posted on 2024-11-27 | paperReading, end2end

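Introduction This paper set a new state of the art (SOTA) for end-to-end autonomous driving in 2023. Traditional autonomous driving methods are modular: perception and planning are decoupled into independent modules. The drawback is that the planning module has no access to the raw sensor data, which carries rich semantic information. Because planning relies entirely on the upstream perception results, perception errors can severely affect planning and cannot be detected or corrected at the planning stage, which leads to safety problems. End-to-end autonomous driving methods instead take sensor data as input and output planning results through a single holistic model. This paper proposes Vectorized Scene Representation for Efficient Autonomous Driving (VAD), which models the entire driving scene as a vectorized representation. On the one hand, VAD uses vectorized agent motion and map elements as explicit instance-level planning constraints, which effectively improves planning safety. On the other hand, by dropping computation-intensive rasterized representations and hand-designed post-processing steps, VAD runs much faster than previous end-to-end planning methods. Specifically, VAD-Base,...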

Auto-Encoding Variational Bayes

Posted on 2024-11-22 | paperReading, diffusion | inference • variational inference

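Problem scenario Given the prior distribution of the latent variable and the conditional generative distribution, the related problems in this setting are: Preliminary evidence lower bound (variational lower bound) Inference can be understood as computing the posterior distribution $P(Z|X)$: $P(Z|X)=\frac{P(X,Z)}{\int_z P(X,Z=z)\,dz}$. The denominator (the normalization term) is hard to compute, so computing the exact posterior is difficult, and two kinds of methods are commonly used to approximate it. Sampling methods, e.g. MCMC: MCMC approximates the posterior by sampling from a Markov chain; it is computationally expensive, and its accuracy depends on the samples. Variational methods: approximate the posterior with a simple probability distribution, which turns inference into an optimization problem. KL divergence: $D_{KL}(q||p)=E_{x\sim q}\left[\log\frac{q}{p}\right]=\sum_x q(x)\log\frac{q(x)}{p(x)}$...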
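As a rough illustration of the KL divergence in the excerpt above, here is a minimal sketch for discrete distributions given as lists of probabilities. The function name `kl_divergence` and the example distributions are assumptions made for illustration; they do not come from the post.

```python
import math


def kl_divergence(q, p):
    """Return D_KL(q || p) = sum_x q(x) * log(q(x) / p(x)) for discrete q, p."""
    if len(q) != len(p):
        raise ValueError("q and p must be defined over the same support")
    total = 0.0
    for qx, px in zip(q, p):
        if qx == 0.0:
            continue  # by convention, 0 * log(0 / p) contributes 0
        if px == 0.0:
            return math.inf  # q puts mass where p has none
        total += qx * math.log(qx / px)
    return total


# Hypothetical example distributions over two outcomes.
q = [0.5, 0.5]
p = [0.9, 0.1]

print(kl_divergence(q, p))  # divergence of q from p
print(kl_divergence(p, q))  # a different value: KL is not symmetric
print(kl_divergence(q, q))  # identical distributions give 0
```

The divergence is zero only when the two distributions coincide and it is not symmetric in its arguments, which is why variational inference has to fix a direction, here $D_{KL}(q||p)$ with $q$ as the approximating distribution.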

Some Git experience

Posted on 2024-11-15 | environment-construction | github

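Connection problems Since git push, git pull, and git clone use an HTTP connection by default, we can change git's HTTP proxy settings and reach GitHub through a proxy server. You can use the proxy server's SOCKS port to access GitHub:
git config --global http.proxy socks5://127.0.0.1:10808
git config --global https.proxy socks5://127.0.0.1:10808
Or use the proxy server's HTTP proxy to access GitHub:
git config --global http.proxy 127.0.0.1:10809
git config --global https.proxy 127.0.0.1:10809
Here 127.0.0.1 is the local host, since the proxy server is usually installed on the local machine, and 10808 and 10809 are the ports the proxy server listens on. Both values can be found in the proxy software. With the global configuration above, push, pull, and clone all go through the proxy server by default. The global configuration can be removed with the following commands: git...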

Automatically deploying hexo to GitHub Pages and Vercel

Posted on 2024-11-14 | hexo | github

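Preparation First, you need a local copy of the blog source, and two repositories on GitHub: hexo-source and xxxx.github.io.git. You also need a credential for connecting to the GitHub repositories, which can be either a GitHub token or an SSH key. A GitHub token is used to connect to a repository over HTTPS; an SSH key is used to connect over SSH and consists of a private key, kept locally, and a public key, uploaded to GitHub. The hexo-source repository backs up the local source and is set to private (after all, I don't want others to copy my whole blog setup with a single git clone). Files listed in .gitignore do not need to be backed up, because they are only environment dependencies and generated site output: .DS_Store, Thumbs.db, db.json, *.log, node_modules/, public/, .deploy*/, _multiconfig.yml. xxxx.github.io.git is the GitHub Pages repository and must be public; using hexo...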

Automatic differentiation in PyTorch

Posted on 2024-11-10 | pytorch | pytorch • automatic differentiation

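Automatic differentiation PyTorch supports automatic differentiation, i.e. computing derivatives automatically. In a deep learning framework, we usually compute the partial derivatives of the loss function with respect to the learnable parameters.
import torch
x = torch.arange(4.0)  # x = tensor([0., 1., 2., 3.])
x.requires_grad_(True)  # equivalent to x = torch.arange(4.0, requires_grad=True)
x.grad  # None by default
If we will later need to compute, with respect to some variable,...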
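To make the role of `requires_grad` and `x.grad` concrete, the sketch below backpropagates through two illustrative functions of `x`. The functions `2 * dot(x, x)` and `x.sum()` and the names `grad_first` and `grad_second` are assumptions chosen for this example; the excerpt is cut off before it shows which function the post differentiates.

```python
import torch

# Track gradients with respect to x from the start.
x = torch.arange(4.0, requires_grad=True)  # tensor([0., 1., 2., 3.])

# Illustrative scalar function (an assumption): y = 2 * <x, x>, so dy/dx = 4x.
y = 2 * torch.dot(x, x)
y.backward()                 # populates x.grad
grad_first = x.grad.clone()
print(grad_first)

# Gradients accumulate across backward calls, so reset before reusing x.
x.grad.zero_()
x.sum().backward()           # d(sum(x))/dx is a vector of ones
grad_second = x.grad.clone()
print(grad_second)
```

Calling `x.grad.zero_()` between the two backward passes matters because PyTorch adds new gradients to the existing `x.grad` by default.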

Diffusion basics

Posted on 2024-11-10 | paperReading, diffusion | diffusion

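Comparison of generative models A GAN consists of a discriminator and a generator. The discriminator tries to distinguish x from x', while the generator tries to produce samples that get past the discriminator. After many iterations, the generator's samples look more and more like x, which gives the generated samples we want. A VAE is a network that learns a distribution function, but here the distribution function maps from the sample space to a semantic space. Flow-based models are the network architectures that truly begin to learn the distribution. Overall forward process (diffusion process): from right to left, $X_0 \rightarrow X_T$ reverse process (denoising process): from left to right, $X_T \rightarrow X_0$ Both the diffusion process and the denoising process are treated as Markov processes. $x_0 \sim q(x_0)$ The task is to learn a distribution...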
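The forward process $X_0 \rightarrow X_T$ can be sketched as a simple Markov chain in which each state depends only on the previous one. The Gaussian transition $x_t=\sqrt{1-\beta_t}\,x_{t-1}+\sqrt{\beta_t}\,\epsilon$ and the linear `betas` schedule used here are assumptions borrowed from the common DDPM formulation, not something the excerpt above states; `forward_diffusion` is a hypothetical helper name.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Run the Markov chain x_0 -> x_T and return every intermediate state.

    Assumed transition: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,
    with eps drawn from a standard normal distribution.
    """
    states = [list(x0)]
    for beta in betas:
        prev = states[-1]  # the next state depends only on the previous one
        nxt = [
            math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for v in prev
        ]
        states.append(nxt)
    return states


rng = random.Random(0)
T = 1000
# Assumed linear noise schedule from 1e-4 to 0.02.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
x0 = [1.0] * 500  # toy "data": 500 identical values

states = forward_diffusion(x0, betas, rng)
x_T = states[-1]
mean = sum(x_T) / len(x_T)
var = sum((v - mean) ** 2 for v in x_T) / len(x_T)
print(len(states), mean, var)
```

Under these assumptions the final state is close to standard Gaussian noise regardless of the starting data, which is the starting point the reverse (denoising) process works back from.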

Attention is all you need

Posted on 2024-11-06 | paperReading, transformer | transformer

model architecture Inputs: A paragraph of English consisting of $B$ (i.e. batch_size) sentences, each with at most $N$ (i.e. seq_length) words. Outputs: A paragraph of Chinese translated from Inputs. $(B,N)$ Encoder outcome: the feature matrix containing position, context, and semantic information. Decoder: auto-regressive, consuming the previously generated symbols as additional input when generating the next. For example: Inputs: I love u. ($B=1$) Learning feature from...

[CVPR2024]Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow

Posted on 2024-11-04 | paperReading, multimodal | multimodal • lidar • camera

Motivation Scene flow aims to model the correspondence between adjacent visual RGB or LiDAR features to estimate 3D motion features. RGB and LiDAR are intrinsically heterogeneous, so fusing them directly is inappropriate. We discover that events are homogeneous with both RGB and LiDAR in the visual and motion spaces. visual space complementarity RGB camera: absolute value of luminance event camera: relative change of luminance LiDAR: global shape event camera: local boundary motion...

[CVPR2024]Point Transformer V3: Simpler, Faster, Stronger

Posted on 2024-10-27 | paperReading, point cloud | transformer • point cloud

Motivation Scaling up is all you need. scale: size of datasets, the number of model parameters, the range of effective receptive field, and the computing power. scale principle: efficiency (simplicity, scalability) vs. accuracy. Unlike the advances made in the 2D or NLP fields, previous work in 3D vision had to focus on improving model accuracy because of the limited size and diversity of the point cloud data available in separate domains. The time consumption of point transformer V1...

[CVPR2021]Point Cloud Transformer

Posted on 2024-10-20 | paperReading, point cloud | transformer • point cloud

Motivation Point Cloud: disordered (permutation-invariant) and unstructured, which makes it difficult to design neural networks to process it. All operations of the Transformer are parallelizable and order-independent, which makes it suitable for point cloud feature learning. In NLP, the classical Transformer uses positional encoding to deal with this order-independence. The input words are ordered and each word has basic semantics, whereas point clouds are unordered, and individual points have no semantic...