
[ACM MM 2024] Lite-Mind: Towards Efficient and Robust Brain Representation Learning

Paper link: Lite-Mind: Towards Efficient and Robust Brain Representation Learning | Proceedings of the 32nd ACM International Conference on Multimedia

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This article is written as reading notes, so take it with a grain of salt.

目录

1. Takeaways

2. Section-by-Section Reading of the Paper

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Brain Visual Decoding

2.3.2. Fourier Transform in Deep Learning

2.4. Lite-Mind

2.4.1. Overview

2.4.2. DFT Backbone

2.4.3. Retrieval Pipeline

2.5. Experiments

2.5.1. Dataset

2.5.2. Implementation details

2.6. Results

2.6.1. fMRI/image retrieval

2.6.2. LAION-5B retrieval

2.6.3. GOD zero-shot classification

2.6.4. Ablations and visualization

2.7. Limitations

2.8. Conclusion

1. Takeaways

(1)~If reconstruction doesn't work out, retrieval is still a viable path~

2. Section-by-Section Reading of the Paper

2.1. Abstract

        ①Limitations of image retrieval from fMRI decoding: scarce data, low signal-to-noise ratio, and individual variations

2.2. Introduction

        ①Page-limited conferences and long-form journal papers alike tend to fold a bit of related work into the introduction

        ②The authors aim to design a specific lightweight model for each subject:

2.3. Related Work

2.3.1. Brain Visual Decoding

        ①Lists Mind Reader, BrainCLIP, Mind-Vis, and MindEye, pointing out that none of them considers a lightweight network

2.3.2. Fourier Transform in Deep Learning

        ①Introduces how the Fourier Transform is used in the digital signal processing field

2.4. Lite-Mind

2.4.1. Overview

        ①The overview of Lite-Mind:

where (a) is the backbone of MindEye and (b) is Lite-Mind

2.4.2. DFT Backbone

        ①fMRI-image pair: (x,y)

        ②Dataset: D

(1)fMRI Spectrum Compression

        ①Divide the fMRI signal x (at the voxel level) into n non-overlapping patches x=\left [ x_1,x_2,...,x_n \right ], with zero padding

        ②Apply positional encoding to the patches to obtain tokens t=\left [ t_1,t_2,...,t_n \right ], then compute their spectrum with a 1D Discrete Fourier Transform (DFT):

X[k]=F(t)=\sum_{i=1}^{n}t_{i}e^{-j(2\pi k/n)i}

where X\in\mathbb{C}^{n\times d} is the complex spectrum tensor, 2\pi k/n is the frequency, i is the token index, and j is the imaginary unit

        ③For M filters \mathbf{K}=[\mathbf{k}_{1},\mathbf{k}_{2},...,\mathbf{k}_{M}], the features can be extracted by:

\hat{X}=\sum_{m=1}^{M}\frac{1}{n}|X|^{2}\odot\mathbf{k}_{m}\cos\left(\frac{(2m-1)\pi}{2M}\right)

where \hat{X}\in\mathbb{C}^{n\times d}, \odot denotes element-wise multiplication, and |X|^{2} denotes the power spectrum of X

        ④Convert the spectrum back into the spatial domain by Inverse Discrete Fourier Transform (IDFT):

\hat{t}\leftarrow F^{-1}(\hat{X})
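
A minimal PyTorch sketch of steps ① to ④ above, assuming hypothetical shapes (a batch of n tokens of dimension d), real-valued learnable filters, and a small random initialization; it illustrates the described operations rather than the authors' implementation:

```python
import math
import torch
import torch.nn as nn

class SpectrumCompression(nn.Module):
    """Sketch of steps ①-④: patch tokens -> 1D DFT -> learnable filters -> IDFT.
    Shapes, filter initialization, and real-valued filters are assumptions."""

    def __init__(self, n_tokens: int, dim: int, n_filters: int = 4):
        super().__init__()
        # M learnable filters k_m, each matching the token spectrum shape (n x d)
        self.filters = nn.Parameter(0.02 * torch.randn(n_filters, n_tokens, dim))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, n, d) positionally encoded fMRI patch tokens
        n = t.shape[1]
        X = torch.fft.fft(t, dim=1)                  # 1D DFT along the token axis
        power = X.abs() ** 2 / n                     # (1/n)|X|^2, the scaled power spectrum
        M = self.filters.shape[0]
        X_hat = torch.zeros_like(X)
        for m in range(1, M + 1):                    # m is 1-based, as in the formula
            weight = math.cos((2 * m - 1) * math.pi / (2 * M))
            X_hat = X_hat + power * self.filters[m - 1] * weight
        return torch.fft.ifft(X_hat, dim=1).real     # IDFT back to the spatial domain
```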

(2)Frequency Projector

        ①Align the voxel representation with the image representation via FreMLP:

X^{\prime}=\sigma(\hat{X}^{T}\mathcal{W}+\mathcal{B})^{T}

where \mathcal{W}\in\mathbb{C}^{n\times n^{\prime}} is a complex weight matrix, \mathcal{B}\in\mathbb{C}^{n^{\prime}} is a complex bias, X^{\prime}\in\mathbb{C}^{n^{\prime}\times d} is the final output, and \sigma denotes the activation function. It can be expanded to:

\begin{aligned} X^{\prime} & =(\sigma(Re(\hat{X}^{T})\mathcal{W}_{r}-Im(\hat{X}^{T})\mathcal{W}_{i}+\mathcal{B}_{r}) \\ & +j\sigma(Re(\hat{X}^{T})\mathcal{W}_{i}+Im(\hat{X}^{T})\mathcal{W}_{r}+\mathcal{B}_{i}))^{T} \end{aligned}

where Re\left ( \cdot \right ) and Im\left ( \cdot \right ) denote the real and imaginary parts, \mathcal{W}=\mathcal{W}_{r}+j\mathcal{W}_{i}, and \mathcal{B}=\mathcal{B}_{r}+j\mathcal{B}_{i}

        ②Employ IDFT again:

t^{\prime}\leftarrow F^{-1}(X^{\prime})

and the voxel embedding f is obtained from t^{\prime}
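
Below is a sketch of the FreMLP step under the expanded real/imaginary formulation above; the ReLU activation, initialization scale, and shapes are my assumptions, not the paper's exact choices:

```python
import torch
import torch.nn as nn

class FreMLP(nn.Module):
    """Frequency-domain MLP: sigma(X_hat^T W + B)^T with complex W, B expanded
    into real and imaginary parts. ReLU and the 0.02 init scale are assumptions."""

    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        self.Wr = nn.Parameter(0.02 * torch.randn(n_in, n_out))  # Re(W)
        self.Wi = nn.Parameter(0.02 * torch.randn(n_in, n_out))  # Im(W)
        self.Br = nn.Parameter(torch.zeros(n_out))                # Re(B)
        self.Bi = nn.Parameter(torch.zeros(n_out))                # Im(B)
        self.act = nn.ReLU()                                      # sigma

    def forward(self, X_hat: torch.Tensor) -> torch.Tensor:
        # X_hat: (batch, n, d) complex spectrum; transpose to (batch, d, n) for X_hat^T
        Xt = X_hat.transpose(1, 2)
        re, im = Xt.real, Xt.imag
        out_re = self.act(re @ self.Wr - im @ self.Wi + self.Br)  # real part of the product
        out_im = self.act(re @ self.Wi + im @ self.Wr + self.Bi)  # imaginary part
        return torch.complex(out_re, out_im).transpose(1, 2)      # X': (batch, n', d)
```

A second IDFT, e.g. torch.fft.ifft(X_prime, dim=1).real, would then map X^{\prime} back to the spatial domain to give t^{\prime} and, from it, the voxel embedding f.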

2.4.3. Retrieval Pipeline

        ①Optimization objective:

\omega^{*}=\underset{\omega}{\arg\max}\sum_{(x,y)\in D}\mathrm{SIM}(\mathrm{DFT}(x;\omega),\mathrm{CLIP}(y))

where \omega denotes the weights of the DFT backbone and SIM\left ( \cdot \right ) denotes cosine similarity

        ②They map f to a predicted CLIP image embedding with a diffusion prior (used for LAION-5B retrieval):

\mathcal{V}^{\prime}=Diffusion(f)

        ③Contrastive loss:

L_{contr}=-\frac{1}{|B|}\sum_{s=1}^{|B|}\log\frac{\exp(f_{s}^{\top}\cdot V_{s}/\tau)}{\sum_{i=1}^{|B|}\exp(f_{s}^{\top}\cdot V_{i}/\tau)}

where B denotes the batch (|B| the batch size) and \tau is the temperature factor
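
A sketch of this contrastive term as a batch-wise InfoNCE loss; the L2 normalization and the temperature value 0.07 are assumptions:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f: torch.Tensor, V: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss pairing each fMRI embedding f_s with its CLIP image
    embedding V_s within a batch. f, V: (batch, dim)."""
    f = F.normalize(f, dim=-1)                    # make dot products cosine similarities
    V = F.normalize(V, dim=-1)
    logits = f @ V.t() / tau                      # (|B|, |B|) similarity matrix
    labels = torch.arange(f.size(0), device=f.device)
    return F.cross_entropy(logits, labels)        # -log softmax over the batch, averaged
```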

        ④MSE loss to constrain the predicted embedding V^{\prime} toward the true CLIP image embedding V:

L_{mse}=\frac{1}{|B|}\sum_{s=1}^{|B|}\|V_{s}-V^{\prime}_{s}\|_{2}^{2}

        ⑤Final loss:

L=L_{contr}+\alpha L_{mse}
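
Putting the two terms together, a minimal sketch of the total objective; the \alpha and \tau values here are assumptions:

```python
import torch
import torch.nn.functional as F

def training_loss(f: torch.Tensor, V: torch.Tensor, V_prime: torch.Tensor,
                  alpha: float = 0.3, tau: float = 0.07) -> torch.Tensor:
    """Total objective L = L_contr + alpha * L_mse (alpha and tau are assumed values)."""
    # Contrastive term: same InfoNCE form as the sketch above
    fn, Vn = F.normalize(f, dim=-1), F.normalize(V, dim=-1)
    logits = fn @ Vn.t() / tau
    labels = torch.arange(f.size(0), device=f.device)
    l_contr = F.cross_entropy(logits, labels)
    # MSE term: squared L2 distance between true and predicted CLIP embeddings, batch-averaged
    l_mse = ((V - V_prime) ** 2).sum(dim=-1).mean()
    return l_contr + alpha * l_mse
```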

        ⑥Tasks: test set retrieval, LAION-5B retrieval, zero-shot classification
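
For the retrieval tasks themselves, a simple sketch of cosine-similarity ranking against a bank of CLIP image embeddings (the function name and shapes are hypothetical); for LAION-5B the query would instead be the diffusion-prior output V^{\prime}, searched against a large-scale embedding index:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_top_k(f: torch.Tensor, image_bank: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Rank candidate CLIP image embeddings by cosine similarity to one fMRI embedding.
    f: (dim,) voxel embedding; image_bank: (num_images, dim) CLIP embeddings."""
    f = F.normalize(f, dim=-1)
    bank = F.normalize(image_bank, dim=-1)
    sims = bank @ f                                # cosine similarity to each candidate
    return sims.topk(k).indices                    # indices of the top-k matching images
```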

2.5. Experiments

2.5.1. Dataset

        ①Dataset: Natural Scenes Dataset (NSD)

        ②Samples: subjects 1, 2, 5, and 7, each with 10,000 images

        ③Data split: 8859 image stimuli (24980 trials) for training and 982 image stimuli (2770 trials) for testing

        ④Number of voxels for each subject: 15724, 14278, 13039, and 12682

2.5.2. Implementation details

        ①V100 32GB GPU

2.6. Results

2.6.1. fMRI/image retrieval

        ①Retrieval performance:

2.6.2. LAION-5B retrieval

        ①Retrieval performance on LAION-5B:

        ②Retrieval results on LAION-5B:

2.6.3. GOD zero-shot classification

        ①Performance:

2.6.4. Ablations and visualization

        ①Ablation of different depth of DFT backbone:

        ②Module ablation:

        ③Retrieval performance with different cerebral cortex for Subject 1 on the NSD dataset:

        ④t-SNE for embedding visualization:

2.7. Limitations

        ①The limited amount of training data

2.8. Conclusion

        ~
