当前位置: 首页 > ds >正文

【LCMM】纵向轨迹模型,组轨迹模型

latent_class_mixed_models

基础知识

增长混合模型(GMM)和潜在类别增长模型(LCGA)的核心区别确实主要在于是否允许类别内存在随机效应但两者的差异还涉及模型灵活性、假设和应用场景等方面。以下是详细对比:

  1. 随机效应的处理
    GMM(增长混合模型)

允许类别内随机效应:每个潜在类别中,个体的增长参数(如截距、斜率)可以围绕类别均值存在随机变异。例如,同一类别的个体可能有不同的截距和斜率,服从某种分布(如正态分布)。

更灵活:适用于数据中存在“类别内异质性”的情况,即同一类别内个体的轨迹不完全一致。

LCGA(潜在类别增长分析)

无随机效应:每个潜在类别内的所有个体具有完全相同的增长参数(固定效应),即类别内的轨迹是严格同质的(无个体变异)。

更严格:假设同一类别的个体轨迹完全相同,可能忽略实际存在的个体差异。

  1. 模型复杂度与假设
    GMM

参数更多:需要估计随机效应的方差-协方差矩阵,模型更复杂。

样本量需求更大:由于参数更多,通常需要更大的样本量以保证估计稳定性。

适用性:适合探索“类别内异质性”的复杂场景,例如研究不同亚群体的发展轨迹时,同时考虑群体间和群体内的差异。

LCGA

参数更少:仅需估计类别均值和残差方差,模型更简单。

收敛更容易:计算复杂度低,适合小样本或初步探索性分析。

适用性:适用于假设类别内个体高度同质的情况,例如明确划分的干预组或自然分组。

  1. 实际应用中的选择
    模型选择依据:

通过信息准则(如BIC、AIC)或似然比检验(需注意模型嵌套关系)比较。

若GMM中随机效应的方差接近零,则可简化为LCGA。

权衡:

GMM灵活性高,但可能过拟合;LCGA更简洁,但可能忽略重要变异。

实践建议:先尝试LCGA,若模型拟合不佳(如残差变异大),再考虑GMM。

  1. 其他差异
    协变量纳入:两者均可纳入协变量预测类别成员资格(通过多项Logit模型),但GMM还可分析协变量对类别内随机效应的影响。

轨迹形状:LCGA通常假设线性或多项式增长,而GMM可结合更复杂的非线性随机效应。

示例1-GROW

TH MIXTURE MODEL

This repository holds code associated with the publication titled: “Identifying typical trajectories in longitudinal data: modelling strategies and interpretations” (DOI: https://doi.org/10.1007/s10654-020-00615-6 ).

# Load the necessary packages for latent class mixed models and spline functions
library(lcmm)
library(splines)# input dataset df: 
# data frame with columns pseudonymised patient episode identifier,
# time from blood culture collection to clinical parameter measurements,
# and C-reactive protein  values (Box-Cox transformed).# Define the natural spline basis for 'time'
# using the 25th, 50th, and 75th percentiles as knots and
# 1st and 99th percentiles as boundary knots
timeSplineKnots <- quantile(df$time, c(0.25, 0.5, 0.75))
timeSplineBoundary <- quantile(df$time, c(0.01, 0.99))
timeSpline <- ns(df$time, knots = timeSplineKnots, Boundary.knots = timeSplineBoundary)# Renaming columns for interpretability
timeSplineColNames <- paste0("T", 1:ncol(timeSpline))
colnames(timeSpline) <- timeSplineColNames# Binding the natural spline terms to the original dataframe
df <- cbind(df, timeSpline)# Fitting a series of latent class mixed models with varying number of classes (ng)
# Beginning with a standard linear mixed model (LCMM_CRP_1)
LCMM_CRP_1 <- lcmm(CRP_trans ~ T1 + T2 + T3 + T4,random = ~ T1 + T2 + T3 + T4,subject = "EpisodeID",nproc = 20, # Number of cores to use for parallel processingdata = df
)m2d <- gridsearch(hlme(normMMSE ~ age65+I(age65^2)+CEP,random =~ age65+I(age65^2), # 随机效应部分subject = 'ID',data=paquid, ng = 2, mixture=~age65+I(age65^2)),rep=100, maxiter=30, minit=m1)# Fitting latent class mixed models with multiple classes
# using LCMM_CRP_1 as the starting values (B = LCMM_CRP_1)
LCMM_CRP_2 <- lcmm(CRP_trans ~ T1 + T2 + T3 + T4,mixture = ~ T1 + T2 + T3 + T4,random = ~ T1 + T2 + T3 + T4, maxiter = 10000,subject = "EpisodeID", ng = 2, B = LCMM_CRP_1,nproc = 20, # Number of cores to use for parallel processingdata = df
)# The process continues, increasing ng for each model to test for better fitting class numbers
# The `update` function is used to change the number of classes while keeping the rest of the model the same.
LCMM_CRP_3 <- update(LCMM_CRP_2, ng = 3)
LCMM_CRP_4 <- update(LCMM_CRP_3, ng = 4)
LCMM_CRP_5 <- update(LCMM_CRP_4, ng = 5)
LCMM_CRP_6 <- update(LCMM_CRP_5, ng = 6)postprob(d4)conv<-ifelse(d4$conv==1, "TRUE", "FALSE")k<-paste(d4$AIC, d4$BIC, exp(d4$loglik), conv)k<-gsub(" ", ",", k)pp<-d4$pprobpp<-cbind(pp[,3:(ncol(pp))], d4$pprob$class)postprob(mod4)

示例2

## Package lcmm 
##  
## Cecile Proust-Lima, Viviane Philipps, Amadou Diakite & Benoit Liquet: lcmm: Extended Mixed Models Using Latent Classes and Latent Processes. 
## R package version 1.81. https://CRAN.R-project.org/package=lcmm 
# Note: Data needs to be in long format  library(lcmm)#load data
data <- read.table("Data.csv",sep=",", header= TRUE, na.strings = NA)#recenter and scale age
data$age12 <- (data$age - 12)/10### Estimation of mixed-effect models and latent class mixed-effect models for continuous Gaussian outcomes ### #Estimate the model with only one class (ng = 1)
m1 <- hlme(bmi ~ age12+I(age12^2),random =~ age12+I(age12^2),subject = 'ID', data = data) #estimate the model with two classes (ng = 2): 
m2 <- hlme(bmi ~ age12+I(age12^2),random =~ age12+I(age12^2), mixture=~age12+I(age12^2), subject = 'ID', data = data, ng = 2, B=m1) # ng=2
postprob(m2)# Estimate the model with two classes using gridsearch:
m2_gridsearch <- gridsearch(hlme(bmi ~ age12+I(age12^2),random =~ age12+I(age12^2), subject = 'ID', data = data, ng = 2, mixture=~age12+I(age12^2)),rep=100, maxiter=50, minit=m1) #ng=2
postprob(m2_gridsearch)#get summary of the models
summarytable(m1, m2, m2_gridsearch, which = c("G", "loglik", "conv", "npm", "AIC", "BIC", "SABIC", "entropy", "%class"))### Estimation of mixed-effect models and latent class mixed-effect models 
###   for continuous non-Gaussian or ordinal outcomes ### 
m1_lcmm <- lcmm(fussy_eating ~ age12+I(age12^2),random =~ age12+I(age12^2), subject = 'ID', data = data, link='thresholds') 
summary(m1_lcmm) ### Estimation of mixed-effect models and latent class mixed-effect models 
###   for multiple longitudinal markers measuring the same underlying latent process ### 
m1_mult_lin <- multlcmm(BMI + fussy_eating ~ age12+I(age12^2) + time + I(time^2/10), random =~ time +I(time^2 / 10), subject= 'ID', data = data, randomY = TRUE, cor = BM(time), link = c('linear','thresholds'))
summary(m1_mult_lin)### Estimation of mixed-effect models and latent class mixed-effect models 
###  for longitudinal markers and time to event (possibly with competing risks) ### 
m1_jointlcmm <- Jointlcmm(BMI~ age12+I(age12^2),random=~age12+I(age12^2),subject='ID', data=data, ng=1, survival = Surv(Tevent,Event)~ time+event, hazard="3-quant-splines", hazardtype="PH") #ng=1
summary(m1_jointlcmm)

示例3

############################################################################################################
############# Growth modeling for ECHO data
############################################################################################################library(tidyverse)
library(config)
library(here)
library(lcmm)
Sys.setenv(R_CONFIG_ACTIVE = "mike") # 'default')#
config <- config::get()# gm_df <- read.csv(here(config$ECHO_rolling_window_8_LIWC_path,config$ECHO_rolling_window_8_LIWC_name))############# rLSM.D modelsgm_df <- ECHO_smoothed_rLSM %>% # This df comes from runing the rLSM script trhough line 124 (stops before aggregating to file level)mutate(rLSM = rowMeans(select(.,contains('.rLSM')),na.rm = TRUE)) %>%drop_na(text_agg) %>%filter(Speaker == 'D') %>%select(File, rLSM) %>%drop_na() %>%# mutate(#   rLSM = scale(rLSM,center = TRUE)# ) %>%group_by(File) %>%mutate(id = row_number(),# t = case_when(id < (max(id)/5) ~ 0,#           id < (max(id/5))*2 ~ 1,#           id < (max(id/5))*3 ~ 2,#           id < (max(id/5))*4 ~ 3,#           TRUE ~ 4)t = case_when(id < (max(id)/10) ~ 0,id < (max(id/10))*2 ~ 1,id < (max(id/10))*3 ~ 2,id < (max(id/10))*4 ~ 3,id < (max(id/10))*5 ~ 4,id < (max(id/10))*6 ~ 5,id < (max(id/10))*7 ~ 6,id < (max(id/10))*8 ~ 7,id < (max(id/10))*9 ~ 8,TRUE ~ 9)) %>%ungroup() %>%select(-id) %>%group_by(File,t) %>%summarize(rLSM = mean(rLSM)) %>%ungroup() %>% mutate(group_id = group_indices(., File))m1.rLSM.D.v2 <- lcmm::hlme(rLSM ~ t,# random = ~ t,subject = 'group_id',data = gm_df
)m2.rLSM.D.v2 <- lcmm::hlme(rLSM ~ t,# random = ~ t,mixture = ~ t,ng = 2,subject = 'group_id',data = gm_df,B = random(m1.rLSM.D.v2))# m2g.rLSM.D.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.D.v2, 
#   hlme(rLSM ~ t,
#     # random = ~ 1 + t,
#     mixture = ~ t,
#     ng = 2,
#     subject = 'group_id',
#     data = gm_df))m3.rLSM.D.v2 <- lcmm::hlme(rLSM ~ t,# random = ~ 1 + t,mixture = ~ t,ng = 3,subject = 'group_id',data = gm_df,B = random(m1.rLSM.D.v2)
)# m3g.rLSM.D.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.D.v2, 
#   hlme(rLSM ~ t,
#        random = ~ 1 + t,
#        mixture = ~ t,
#        ng = 3,
#        subject = 'group_id',
#        data = gm_df))m4.rLSM.D.v2 <- lcmm::hlme(rLSM ~ t,# random = ~ 1 + t,mixture = ~ t,ng = 4,subject = 'group_id',data = gm_df,B = random(m1.rLSM.D.v2)
)# m4g.rLSM.D.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.D.v2, 
#   hlme(rLSM ~ t,
#        random = ~ 1 + t,
#        mixture = ~ t,
#        ng = 4,
#        subject = 'group_id',
#        data = gm_df))m5.rLSM.D.v2 <- lcmm::hlme(rLSM ~ t,# random = ~ 1 + t,mixture = ~ t,ng = 5,subject = 'group_id',data = gm_df,B = random(m1.rLSM.D.v2)
)
# 
# m5g.rLSM.D.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.D.v2, 
#   hlme(rLSM ~ t,
#        random = ~ 1 + t,
#        mixture = ~ 1 + t,
#        ng = 5,
#        subject = 'group_id',
#        data = gm_df))summarytable(m1.rLSM.D.v2,m2.rLSM.D.v2,m3.rLSM.D.v2, m4.rLSM.D.v2,m5.rLSM.D.v2,# m4, m4g,#m5,m5g,which = c("G", "loglik", "conv", "npm", "AIC", "BIC", "SABIC", "entropy","ICL", "%class"))
# summaryplot(m1.rLSM.D,m2.rLSM.D,m2g.rLSM.D,m3.rLSM.D,m3g.rLSM.D,
#             # m4, m4g,#m5,m5g,
#             which = c("BIC", "entropy","ICL"))summary(m2.rLSM.D.v2)data_pred0 <- data.frame(t = seq(0,9,length.out = 50), CEP = 0)
data_pred1 <- data.frame(t = seq(0,9,length.out = 50), CEP = 1)
pred0 <- predictY(m2.rLSM.D, data_pred0, var.time = "t")
pred1 <- predictY(m2.rLSM.D, data_pred1, var.time = "t")plot(pred0, col=c("red","navy"), lty=1,lwd=5,ylab="normMMSE",legend=NULL,  main="Predicted trajectories for normMMSE ",ylim=c(.6,.8))
plot(pred1, col=c("red","navy"), lty=2,lwd=3,legend=NULL,add=TRUE)
legend(x="topright",legend=c("class1 :","CEP-","CEP+","class2:","CEP-","CEP+"), col=c(rep("red",3),rep("navy",3)), lwd=2, lty=c(0,1,2,0,1,2), ncol=2, bty="n", cex = 0.7)rLSM.D.2Class <- m2.rLSM.D.v2$pprob %>%full_join(gm_df, by = 'group_id') %>%group_by(class,t) %>%mutate(t = t+1) %>%summarize(sd = sd(rLSM,na.rm = TRUE),rLSM = mean(rLSM)) %>%ungroup() %>%mutate(class = as.factor(class)) %>%ggplot(aes(x = as.factor(t), y = rLSM, color = class, group = class)) + geom_point() + geom_line() + ggthemes::theme_tufte() +# geom_errorbar(aes(ymin=rLSM-sd, ymax=rLSM+sd), width=.2,#               position=position_dodge(.9)#               ) +scale_fill_brewer(palette = "Set2") +labs(title = 'Mean rw.rLSM.D over time within encounters by growth class',x = 'Decile of turns within encounter',y = 'Mean rw.LSM.D')############# rLSM.P modelsgm_df_p <- ECHO_smoothed_rLSM %>% # This df comes from runing the rLSM script trhough line 124 (stops before aggregating to file level)mutate(rLSM = rowMeans(select(.,contains('.rLSM')),na.rm = TRUE)) %>%drop_na(text_agg) %>%filter(Speaker == 'P') %>%select(File, rLSM) %>%drop_na() %>%group_by(File) %>%mutate(id = row_number(),# t = case_when(id < (max(id)/5) ~ 0,#           id < (max(id/5))*2 ~ 1,#           id < (max(id/5))*3 ~ 2,#           id < (max(id/5))*4 ~ 3,#           TRUE ~ 4)t = case_when(id < (max(id)/10) ~ 0,id < (max(id/10))*2 ~ 1,id < (max(id/10))*3 ~ 2,id < (max(id/10))*4 ~ 3,id < (max(id/10))*5 ~ 4,id < (max(id/10))*6 ~ 5,id < (max(id/10))*7 ~ 6,id < (max(id/10))*8 ~ 7,id < (max(id/10))*9 ~ 8,TRUE ~ 9)) %>%ungroup() %>%select(-id) %>%group_by(File,t) %>%summarize(rLSM = mean(rLSM)) %>%ungroup() %>% mutate(group_id = group_indices(., File))m1.rLSM.P.v2 <- lcmm::hlme(rLSM ~ t,random = ~ 1 + t,subject = 'group_id',data = gm_df_p
)m2.rLSM.P.v2 <- lcmm::hlme(rLSM ~ t,random = ~ 1 + t,mixture = ~ t,ng = 2,subject = 'group_id',data = gm_df_p,B = random(m1.rLSM.P.v2)
)# m2g.rLSM.P.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.P.v2, 
#   hlme(rLSM ~ t,
#        random = ~ 1 + t,
#        mixture = ~ t,
#        ng = 2,
#        subject = 'group_id',
#        data = gm_df_p))m3.rLSM.P.v2 <- lcmm::hlme(rLSM ~ t,random = ~ 1 + t,mixture = ~ t,ng = 3,subject = 'group_id',data = gm_df_p,B = random(m1.rLSM.P.v2)
)# m3g.rLSM.P.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.P.v2, 
#   hlme(rLSM ~ t,
#        random = ~ 1 + t,
#        mixture = ~ t,
#        ng = 3,
#        subject = 'group_id',
#        data = gm_df_p))m4.rLSM.P.v2 <- lcmm::hlme(rLSM ~ t,random = ~ 1 + t,mixture = ~ t,ng = 4,subject = 'group_id',data = gm_df_p,B = random(m1.rLSM.P.v2)
)# m4g.rLSM.P.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.P.v2, 
#   hlme(rLSM ~ t,
#        random = ~ 1 + t,
#        mixture = ~ t,
#        ng = 4,
#        subject = 'group_id',
#        data = gm_df_p))m5.rLSM.P.v2 <- lcmm::hlme(rLSM ~ t,random = ~ 1 + t,mixture = ~ t,ng = 5,subject = 'group_id',data = gm_df_p,B = random(m1.rLSM.P.v2)
)# m5g.rLSM.P.v2 <- lcmm::gridsearch(
#   rep = 100, maxiter = 30, minit = m1.rLSM.P.v2, 
#   hlme(rLSM ~ t,
#        # random = ~ 1 + t,
#        mixture = ~ 1 + t,
#        ng = 5,
#        subject = 'group_id',
#        data = gm_df))summarytable(m1.rLSM.P.v2,m2.rLSM.P.v2,m3.rLSM.P.v2,m4.rLSM.P.v2,m5.rLSM.P.v2,which = c("G", "loglik", "conv", "npm", "AIC", "BIC", "SABIC", "entropy","ICL", "%class"))
# summaryplot(m1.rLSM.P,m2.rLSM.P,m2g.rLSM.P,m3.rLSM.P,m3g.rLSM.P,
#             # m4, m4g,#m5,m5g,
#             which = c("BIC", "entropy","ICL"))summary(m2.rLSM.P)data_pred0 <- data.frame(t = seq(0,9,length.out = 50), CEP = 0)
data_pred1 <- data.frame(t = seq(0,9,length.out = 50), CEP = 1)
pred0 <- predictY(m2.rLSM.P, data_pred0, var.time = "t")
pred1 <- predictY(m2.rLSM.P, data_pred1, var.time = "t")plot(pred0, col=c("red","navy"), lty=1,lwd=5,ylab="normMMSE",legend=NULL,  main="Predicted trajectories for normMMSE ",ylim=c(0,1))
plot(pred1, col=c("red","navy"), lty=2,lwd=3,legend=NULL,add=TRUE)
legend(x="topright",legend=c("class1 :","CEP-","CEP+","class2:","CEP-","CEP+"), col=c(rep("red",3),rep("navy",3)), lwd=2, lty=c(0,1,2,0,1,2), ncol=2, bty="n", cex = 0.7)rLSM.P.2Class <- m2.rLSM.P.v2$pprob %>%full_join(gm_df, by = 'group_id') %>%mutate(t = t+1) %>%group_by(class,t) %>%summarize(sd = sd(rLSM,na.rm = TRUE),rLSM = mean(rLSM)) %>%# ungroup() %>%mutate(class = as.factor(class)) %>%ggplot(aes(x = as.factor(t), y = rLSM, color = class, group = class)) + geom_point() + geom_line() + ggthemes::theme_tufte() +scale_fill_brewer(palette = "Set2") +labs(title = 'Mean rw.rLSM.P over time within encounters by growth class',x = 'Decile of turns within encounter',y = 'Mean rw.LSM.P')############# classes to outcomesclass_df <- m2.rLSM.P$pprob %>%full_join(gm_df, by = 'group_id') %>%rename(class.P = class) %>%select(group_id, class.P, File) %>%full_join(m2.rLSM.D$pprob, by = 'group_id') %>%rename(class.D = class) %>%# select(-c(prob1,prob2,group_id)) %>%distinct()
table(class_df$class.D,class_df$class.P)#open survey data
ECHO_survey_data <- haven::read_dta(here(config$ECHO_survey_data_path, config$ECHO_survey_data_name))#Open ID key file
ECHO_ID_key <- read_csv('ECHO_ID_key.csv')#merge GROWTH_MODELING_FILE and ECHO_ID_key by "File"
class_df <- left_join(class_df, ECHO_ID_key, by = "File" ) #merge GROWTH_MODELING_FILE and ECHO_survey_data by "tapeid" to include survey data
class_df <- left_join(class_df, ECHO_survey_data, by = "tapeid")# Transforming cdspeak from continuous to dichotomous variable "cdspeakhigh"
class_df <- class_df %>%rowwise() %>%mutate(cdspeakhigh = ifelse(cdspeak >= 3, 1, 0))#Making the provider id variable
class_df  <- class_df %>%mutate(provider_id = str_sub(File, 1, 4))#Making the site_name variable: only three sites in data set: JC, OC, SC
class_df <- class_df %>%mutate(site_name = str_sub(provider_id, 1, 2))#creating site_id variable by converting JC==0, OC==1, SC==2
class_df <- class_df %>%mutate(site_id = ifelse(site_name == "JC", 0,ifelse(site_name == "OC", 1,2)))#changing site_id to factor
class_df <- class_df %>%mutate(site_id = factor(site_id))class_df <- class_df %>%mutate(provider_id = factor(provider_id))table(class_df$class.D,class_df$cdspeakhigh)
chisq.test(class_df$class.D,class_df$cdspeakhigh, correct = FALSE)# gginference::ggchisqtest(
#   chisq.test(class_df$class.D,class_df$cdspeakhigh, correct = FALSE)
# )
table(class_df$class.P,class_df$cdspeakhigh)
chisq.test(class_df$class.P,class_df$cdspeakhigh, correct = FALSE)class_df <- class_df %>% mutate(x = paste0(class.D,'D_',class.P,'P'))
table(class_df$x,class_df$cdspeakhigh)
chisq.test(class_df$x,class_df$cdspeakhigh, correct = FALSE)library(gtsummary)
class_df %>%select(cdspeakhigh, x) %>%mutate(cdspeakhigh = as.factor(cdspeakhigh),x = as.factor(x)) %>%drop_na() %>%gtsummary::tbl_summary(by = x) %>%gtsummary::add_p(pvalue_fun = ~style_pvalue(.x, digits = 2))### Pulling pprob
pprob_df <- m2.rLSM.P$pprob %>%full_join(gm_df, by = 'group_id') %>%rename(class.P = class, P.prob1 = prob1, P.prob2 = prob2) %>%select(-t,-rLSM) %>%full_join(m2.rLSM.D$pprob, by = 'group_id') %>%rename(class.D = class, D.prob1 = prob1, D.prob2 = prob2) %>%select(-c(group_id)) %>%distinct()write_csv(pprob_df, 'ECHO_pprob.csv')

lcmm::gridsearch

第一段代码是单次拟合,计算速度较快,但可能陷入局部最优。

第二段代码通过网格搜索(rep = 100)和更宽松的迭代限制(maxiter = 30)提高找到全局最优解的可能性,但计算时间显著增加。

第一段代码可能是为了快速测试模型结构。

第二段代码更适用于正式分析,通过网格搜索确保参数估计的稳定性。

测试

以下是基于R语言的组轨迹模型(GBTM)或去掉随机效应变为LCMM,分析步骤,适用于纵向血压、体重数据,并纳入年龄、性别作为协变量,分析心血管疾病发病风险的示例代码:

# 1. 准备数据与加载包
# 加载所需包
library(lcmm)
library(tidyverse)
library(traj)# 模拟数据集(假设数据结构)
set.seed(123)
n <- 500  # 样本量
data <- data.frame(id = rep(1:n, each=4),time = rep(0:3, n),  # 0-3表示4个时间点age = rep(rnorm(n, 50, 5), each=4),sex = rep(sample(c("Male", "Female"), n, replace=TRUE), each=4),sbp = rnorm(n*4, 120 + 0.5*time, 10),  # 收缩压随时间轻微上升weight = rnorm(n*4, 70 - 0.3*time, 5), # 体重随时间轻微下降cvd = rep(rbinom(n, 1, 0.2), each=4)   # 结局:是否发生心血管疾病
)
# 2. 拟合血压的组轨迹模型
# 使用lcmm包拟合潜类别混合模型
# 模型设定:时间固定效应,随机截距+斜率,按类别分组
bp_model <- hlme(fixed = sbp ~ time + age + sex,   # 固定效应:时间、年龄、性别mixture = ~ time,                 # 类别间时间效应不同random = ~ time,                  # 随机效应:截距和斜率ng = 3,                           # 假设3个类别subject = "id",                   # 个体IDdata = data,B = random                        # 随机初始值
)# 查看模型结果
summary(bp_model)
plot(bp_model, which = "trajectory")  # 绘制轨迹图
# 3. 选择最佳类别数(K)
# 比较不同K值的BIC
bic_values <- sapply(2:4, function(k) {model <- hlme(sbp ~ time + age + sex, mixture = ~time, random = ~time, ng = k, data = data)return(model$BIC)
})
plot(2:4, bic_values, type="b", xlab="Number of Classes", ylab="BIC")
# 选择BIC最小的K值
# 4. 分配个体到轨迹组
# 提取类别概率并分配最可能类别
data$bp_class <- as.factor(bp_model$pprob$class)# 查看各类别占比
table(data$bp_class) / n
# 5. 分析轨迹组与心血管疾病的关系
# 逻辑回归:轨迹组预测CVD发病(需每条个体保留唯一结局)
data_unique <- data %>% group_by(id) %>% slice(1) %>% ungroup()
glm_model <- glm(cvd ~ bp_class + age + sex, family = binomial, data = data_unique
)
summary(glm_model)
# 6. 对体重进行类似分析
# 重复步骤2-5,将因变量替换为weight。

结果比较

后续分析

http://www.xdnf.cn/news/2407.html

相关文章:

  • c++学习小结
  • AUTOSAR图解==>AUTOSAR_SWS_StandardTypes
  • PotPlayer,强大的高清视频播放器
  • 使用 Spring Boot 进行开发
  • TypeScript基础数据类型详解
  • 多数元素(简单)
  • VSCode远程登录云服务器并设置免密登录全攻略
  • java每日精进 4.26【多租户之过滤器及请求处理流程】
  • llama factory怎么命令行推理图片
  • java—基础
  • A. Everybody Likes Good Arrays!
  • Java 程序运行和类路径处理
  • map和set的应用总结
  • MySQL 常用语句教程
  • Python数值类型修炼手册:从青铜到王者的进阶之路
  • Buffer Pool是什么,有什么作用
  • 【MATLAB第118期】基于MATLAB的双通道CNN多输入单输出分类预测方法
  • Android学习总结之协程对比优缺点(协程一)
  • 腾讯云智三道算法题
  • 侵水防触电的原理是什么? 侵水防触电算先进技术吗?-优雅草卓伊凡
  • 【Redis——通用命令】
  • 写时拷贝讲解
  • SQL:MySQL 函数
  • Eigen库入门
  • 博客文章格式更新2.0
  • N维漂洛界的定义和参数方程
  • 算法设计课作业
  • 【概念】什么是 JWT Token?
  • JAVA多线程(8.0)
  • matlab实现稀疏低秩去噪