Abstract—Automatic bipolar disorder (BD) classification is a
challenging task. In this paper, we focus on BD classification
from acoustic, visual, and textual modalities. We highlight
three aspects of our method: 1) beyond the baseline features,
we explore and fuse hand-crafted and deep-learned features
from all available modalities, i.e., acoustic, visual, and textual;
note that we obtained the textual modality by transcribing the
acoustic recordings with a speech-to-text tool; 2) since each
video carries only a single video-level label while individual
frames are unlabeled, we train an unsupervised Convolutional
Auto-Encoder (CAE) and use it for feature extraction; 3)
because the dataset is too small to train a Convolutional
Neural Network (CNN) from scratch, we pre-train the CNN on
other emotion datasets. Experimental results show that our
model outperforms the baseline system, achieving a final
unweighted average recall (UAR) of 93.12%.
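As a point of reference for the metric reported above, the following minimal sketch computes unweighted average recall: the mean of per-class recalls, so each class contributes equally regardless of how many samples it has. The function name and label encoding are illustrative, not taken from the paper's code.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: average the recall of each class, weighting all classes equally."""
    correct = defaultdict(int)  # per-class count of correct predictions
    total = defaultdict(int)    # per-class count of true samples
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

For example, with true labels `[0, 0, 0, 1]` and predictions `[0, 0, 1, 1]`, class 0 has recall 2/3 and class 1 has recall 1, giving a UAR of about 0.833 even though overall accuracy is 0.75.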
Index Terms—Bipolar disorder classification, CNN, CAE, multimodal features.
The authors are with the Computer Science Department, Beijing Normal University, Beijing, China (Corresponding author: Yongkang Xiao).
Cite: Bo Sun, Siming Cao, Penghao Rao, Jun He, Lejun Yu, and Yongkang Xiao, "Bipolar Disorder Classification Based on Multimodal Recordings," International Journal of Machine Learning and Computing, vol. 11, no. 1, pp. 55-60, 2021. Copyright © 2021 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.