[ACM MM 2024] Lite-Mind: Towards Efficient and Robust Brain Representation Learning
Paper page: Lite-Mind: Towards Efficient and Robust Brain Representation Learning | Proceedings of the 32nd ACM International Conference on Multimedia
The English is typed entirely by hand! It is my summarizing and paraphrasing of the original paper. Unavoidable spelling and grammatical errors may appear; if you spot any, feel free to point them out in the comments. This post is note-style, so read with caution.
Table of Contents
1. Takeaways
2. Close Reading of the Paper
2.1. Abstract
2.2. Introduction
2.3. Related Work
2.3.1. Brain Visual Decoding
2.3.2. Fourier Transform in Deep Learning
2.4. Lite-Mind
2.4.1. Overview
2.4.2. DFT Backbone
2.4.3. Retrieval Pipeline
2.5. Experiments
2.5.1. Dataset
2.5.2. Implementation details
2.6. Results
2.6.1. fMRI/image retrieval
2.6.2. LAION-5B retrieval
2.6.3. GOD zero-shot classification
2.6.4. Ablations and visualization
2.7. Limitations
2.8. Conclusion
1. Takeaways
(1)~If reconstruction doesn't work out, retrieval is also a viable path~
2. Close Reading of the Paper
2.1. Abstract
①Limitations of fMRI-based image-retrieval decoding: scarce data, low signal-to-noise ratio, and individual variations
2.2. Introduction
①Page-constrained conferences and long-form journal papers both like to put a bit of related work in the introduction
②The authors aim to design a specific lightweight model for each subject:
2.3. Related Work
2.3.1. Brain Visual Decoding
①Lists Mindreader, BrainClip, Mind-Vis, and MindEye, pointing out that none of them considered lightweight networks
2.3.2. Fourier Transform in Deep Learning
①Introduces how the Fourier Transform is used in the digital signal processing field
2.4. Lite-Mind
2.4.1. Overview
①Overview of Lite-Mind (figure): (a) is the backbone of MindEye, (b) is Lite-Mind
2.4.2. DFT Backbone
①fMRI-image pair:
②Dataset:
(1)fMRI Spectrum Compression
①Divide the fMRI voxels into non-overlapping patches, with zero padding
②Employ positional encoding on the patches to obtain tokens $x_n$. The spectrum of these tokens is computed by the 1D Discrete Fourier Transform (DFT):

$$X_k=\sum_{n=0}^{N-1}x_n e^{-j\frac{2\pi}{N}kn},\quad k=0,1,\dots,N-1$$

where $X_k$ denotes the complex spectrum tensor, $k$ denotes the frequency index, $n$ is the index of the token, and $j$ is the imaginary unit
③For learnable filters $K$, the features can be extracted by

$$\tilde{X}=K\odot X$$

where $\tilde{X}$ is the filtered spectrum, $\odot$ is element-wise multiplication, and $|X|^2$ denotes the power spectrum of $X$
④Convert the spectrum back into the spatial domain by the Inverse Discrete Fourier Transform (IDFT):

$$x_n=\frac{1}{N}\sum_{k=0}^{N-1}X_k e^{j\frac{2\pi}{N}kn}$$
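The spectrum-compression steps above (patchify with zero padding, positional encoding, 1D DFT, learnable filtering, IDFT) can be sketched as follows; the function name, patch size, and the toy additive positional encoding are my own illustrative assumptions, not the paper's exact implementation:

```python
import torch

def spectrum_compress(voxels, patch_size, filters):
    """Sketch of the DFT backbone's spectrum compression (illustrative only)."""
    n = voxels.shape[-1]
    pad = (-n) % patch_size                                # zero padding so the length divides evenly
    x = torch.nn.functional.pad(voxels, (0, pad))
    tokens = x.view(*x.shape[:-1], -1, patch_size)         # non-overlapping patches
    pos = torch.arange(tokens.shape[-2]).unsqueeze(-1)     # toy positional encoding (assumption)
    tokens = tokens + 0.01 * pos
    spec = torch.fft.fft(tokens, dim=-1)                   # 1D DFT over each token
    spec = spec * filters                                  # learnable element-wise filters
    return torch.fft.ifft(spec, dim=-1).real               # IDFT back to the spatial domain
```

Here `filters` would be a learnable complex parameter of the same shape as the token spectrum, in the spirit of frequency-domain filtering networks.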
(2)Frequency Projector
①Align voxel and image embeddings by FreMLP:

$$Y=\sigma(XW+B)$$

where $W\in\mathbb{C}^{d\times d}$ denotes the complex weight matrix, $B\in\mathbb{C}^{d}$ is the complex bias, $Y$ is the final output, and $\sigma$ denotes the activation function. It can be expanded into real and imaginary parts:

$$\mathrm{Re}(Y)=\sigma(\mathrm{Re}(X)\mathrm{Re}(W)-\mathrm{Im}(X)\mathrm{Im}(W)+\mathrm{Re}(B))$$
$$\mathrm{Im}(Y)=\sigma(\mathrm{Re}(X)\mathrm{Im}(W)+\mathrm{Im}(X)\mathrm{Re}(W)+\mathrm{Im}(B))$$

where $\mathrm{Re}(\cdot)$ is the real part and $\mathrm{Im}(\cdot)$ is the imaginary part
②Employ the IDFT again to map the output back to the spatial domain; the result is the voxel embedding
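A minimal sketch of a FreMLP-style complex-valued linear layer, assuming the standard expansion of complex multiplication into real and imaginary parts; the class name, dimensions, and initialization are illustrative, not the paper's exact code:

```python
import torch

class FreMLP(torch.nn.Module):
    """Frequency-domain MLP with a complex weight W and bias B (sketch)."""
    def __init__(self, dim):
        super().__init__()
        self.w_r = torch.nn.Parameter(0.02 * torch.randn(dim, dim))  # Re(W)
        self.w_i = torch.nn.Parameter(0.02 * torch.randn(dim, dim))  # Im(W)
        self.b_r = torch.nn.Parameter(torch.zeros(dim))              # Re(B)
        self.b_i = torch.nn.Parameter(torch.zeros(dim))              # Im(B)

    def forward(self, x):
        xr, xi = x.real, x.imag
        # complex product (xr + j*xi)(Wr + j*Wi) + (Br + j*Bi), activation per part
        yr = torch.relu(xr @ self.w_r - xi @ self.w_i + self.b_r)
        yi = torch.relu(xr @ self.w_i + xi @ self.w_r + self.b_i)
        return torch.complex(yr, yi)
```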
2.4.3. Retrieval Pipeline
①Optimization objective: find the DFT-backbone weights $\theta$ that maximize the cosine similarity $\cos(\cdot,\cdot)$ between the voxel embedding and the corresponding image embedding
②They perform retrieval over LAION-5B with the aligned embeddings:
③Contrastive loss (bidirectional InfoNCE):

$$\mathcal{L}_{CL}=-\frac{1}{2N}\sum_{i=1}^{N}\left(\log\frac{\exp(\cos(v_i,e_i)/\tau)}{\sum_{j=1}^{N}\exp(\cos(v_i,e_j)/\tau)}+\log\frac{\exp(\cos(v_i,e_i)/\tau)}{\sum_{j=1}^{N}\exp(\cos(v_j,e_i)/\tau)}\right)$$

where $N$ denotes the batch size, $\tau$ is the temperature factor, $v_i$ is the voxel embedding, and $e_i$ is the image embedding
④MSE loss to constrain the generated embeddings: $\mathcal{L}_{MSE}=\|v_i-e_i\|_2^2$
⑤Final loss: the weighted sum of the contrastive loss and the MSE loss
⑥Tasks: test set retrieval, LAION-5B retrieval, zero-shot classification
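The contrastive objective in ③ can be sketched as a standard bidirectional InfoNCE (CLIP-style) loss; the function name and default temperature here are my assumptions:

```python
import torch
import torch.nn.functional as F

def clip_loss(fmri_emb, img_emb, temperature=0.05):
    """Bidirectional InfoNCE between fMRI and image embeddings (sketch)."""
    f = F.normalize(fmri_emb, dim=-1)
    g = F.normalize(img_emb, dim=-1)
    logits = f @ g.t() / temperature           # cosine similarities scaled by temperature
    labels = torch.arange(len(f))              # matched pairs lie on the diagonal
    # average the fMRI-to-image and image-to-fMRI cross-entropies
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```

At test time, the same normalized similarity matrix can rank candidate images for retrieval.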
2.5. Experiments
2.5.1. Dataset
①Dataset: Natural Scenes Dataset (NSD)
②Samples: subjects 1, 2, 5, and 7, each with 10000 images
③Data split: 8859 image stimuli (24980 trials) for training; 982 image stimuli (2770 trials) for testing
④Voxel counts per subject: 15724, 14278, 13039, and 12682
2.5.2. Implementation details
①V100 32GB GPU
2.6. Results
2.6.1. fMRI/image retrieval
①Retrieval performance:
2.6.2. LAION-5B retrieval
①Retrieval performance on LAION-5B:
②Retrieval results on LAION-5B:
2.6.3. GOD zero-shot classification
①Performance:
2.6.4. Ablations and visualization
①Ablation on the depth of the DFT backbone:
②Module ablation:
③Retrieval performance with different cerebral cortex regions for Subject 1 on the NSD dataset:
④t-SNE for embedding visualization:
2.7. Limitations
①The limited amount of training data
2.8. Conclusion