Most vibration-based models for diagnosing mechanical faults in on-load tap changers (OLTCs) have limited applicability for three reasons: the short-time high-amplitude data in each sample must be selected manually, the feature extraction methods are designed from subjective experience, and the information in the whole signal is not used. To solve these problems, a mechanical fault diagnosis model for OLTCs based on same-source heterogeneous data fusion is proposed. First, two detection algorithms are proposed to detect the short-time high-amplitude data in each sample and transform them into time-acceleration (TA) images. Second, an improved convolutional neural network (CNN) is trained on the images, and features are extracted from the last pooling layer of the network. Next, four auxiliary features are proposed according to the characteristics of the whole vibration signal. Finally, the image features and the auxiliary features are fused to form feature fusion data, which are used to train a support vector machine (SVM) for fault diagnosis. Experiments on single-channel signals verify that the proposed model performs best among the different CNNs and models compared, and that the auxiliary features can also be fused with the features of other CNNs or models to improve their accuracy.
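The fusion step can be illustrated with a minimal sketch. The abstract does not say how the image features and the auxiliary features are combined, so the code assumes plain concatenation per sample, preceded by column-wise z-score standardization so that neither feature group dominates by scale. The function names `zscore_columns` and `fuse_features`, the standardization step, and the toy values are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance.

    Constant columns (zero standard deviation) are mapped to 0.0.
    This normalization is an assumption, not a step stated in the abstract.
    """
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, mus, sigmas)]
        for row in rows
    ]


def fuse_features(image_feats, aux_feats):
    """Concatenate per-sample image features with per-sample auxiliary features."""
    if len(image_feats) != len(aux_feats):
        raise ValueError("both feature sets must describe the same samples")
    img = zscore_columns(image_feats)
    aux = zscore_columns(aux_feats)
    return [i + a for i, a in zip(img, aux)]


# Toy data: 3 samples, 2 made-up "CNN pooling-layer" features each,
# plus 4 made-up auxiliary features computed from the whole signal.
image_feats = [[0.1, 0.9], [0.4, 0.5], [0.7, 0.1]]
aux_feats = [
    [1.0, 10.0, 0.2, 5.0],
    [2.0, 20.0, 0.2, 7.0],
    [3.0, 30.0, 0.2, 9.0],
]

fused = fuse_features(image_feats, aux_feats)
print(len(fused), len(fused[0]))  # prints: 3 6
```

Each row of `fused` is one sample's feature fusion vector. In a full pipeline these rows, together with the fault labels, would be passed to an SVM classifier (for example `sklearn.svm.SVC` in scikit-learn), which is left out here to keep the sketch free of dependencies.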