AB-Amy 2.0 builds upon its predecessor, AB-Amy, by evaluating the efficacy of different protein language models for predicting the risk of light chain amyloidosis. Notably, AB-Amy 2.0 outperforms its previous version, marking a clear advance in this field.
In this new study, the training and test datasets consist of 1400 sequences (700 positive vs. 700 negative) and 532 sequences (337 positive vs. 195 negative), respectively. We utilized pretrained protein language models to acquire light chain embeddings: each sequence was encoded into feature vectors of different dimensions using four models, AbLang (768 features), antiBERTy (512 features), ESM-2 (1280 features), and ProtT5 (1024 features). Among the resulting classifiers, the SVM model trained on antiBERTy embeddings showed the best performance in five-fold cross-validation, with a mean AUC of 0.9848.
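As a minimal sketch of this pipeline, the snippet below mean-pools per-residue antiBERTy embeddings into fixed 512-dimensional vectors and evaluates an RBF-kernel SVM with stratified five-fold cross-validation scored by ROC AUC. The AntiBERTyRunner.embed interface, the mean-pooling step, and the SVM hyperparameters are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from antiberty import AntiBERTyRunner          # assumed antiBERTy package API
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def embed_light_chains(sequences):
    """Mean-pool antiBERTy per-residue embeddings into fixed 512-d vectors."""
    runner = AntiBERTyRunner()
    per_residue = runner.embed(sequences)      # assumed: list of [len, 512] tensors
    return np.stack([t.detach().cpu().numpy().mean(axis=0) for t in per_residue])


def evaluate_svm(X, y, seed=42):
    """Five-fold stratified cross-validation of an RBF-kernel SVM (ROC AUC)."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    svm = SVC(kernel="rbf")                    # decision_function supports AUC scoring
    return cross_val_score(svm, X, y, cv=cv, scoring="roc_auc")


if __name__ == "__main__":
    # Placeholder light-chain fragments and labels for illustration only;
    # the study used 1400 curated training sequences (700 positive, 700 negative).
    sequences = ["DIQMTQSPSSLSASVGDRVTITC", "EIVLTQSPGTLSLSPGERATLSC"] * 5
    labels = np.array([1, 0] * 5)
    X = embed_light_chains(sequences)
    scores = evaluate_svm(X, labels)
    print(f"Mean 5-fold AUC: {scores.mean():.4f}")
```

Mean-pooling over residues is one common way to reduce variable-length per-residue embeddings to a single fixed-size feature vector per sequence; other pooling choices (e.g., the CLS token) are possible and may match the study's setup more closely.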