Abstract: Although there have been many breakthroughs in the use of convolutional neural networks (CNN) for image classification, facial expression recognition (FER) in real-life is still a challenge in this research area. This paper proposes a method to leverage state-of-the-art multi-deep CNN encoders with support vector machines (SVM) to classify facial expression. We conducted experiments to show that combining features from multi-deep CNN is better than using a single deep CNN model. As well as combining multiple CNN models, we show that using rules to remove noise images from the training dataset improves the performance of the FER system. The FER2013 dataset was used to evaluate the proposed approach, which achieved 73.78% accuracy.