SPEECH EMOTION RECOGNITION WITH MULTISCALE AREA ATTENTION AND DATA AUGMENTATION
SKIVE PROJECTS
In Speech Emotion Recognition (SER), emotional characteristics often appear as diverse energy patterns in spectrograms. Typical attention-based neural network classifiers for SER are usually optimized at a single, fixed attention granularity.
We designed an attention-based convolutional neural network with five convolution layers, an attention layer, and a fully connected layer. The result of the first convolution stage is fed into four consecutive convolution layers, which generate an 80-channel representation. The attention layer then attends to this representation and sends its output to the fully connected layer for classification. Batch normalization is applied after each convolution layer.
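To make the attention step more concrete, here is a minimal pure-Python sketch of area attention over a small 2D grid of feature vectors, such as the time-frequency positions of a convolutional feature map. It is an illustration under stated assumptions, not the project's actual implementation: the function names `enumerate_areas` and `area_attention`, the maximum area size, and the scaled dot-product scoring are all chosen here for the example. It follows the commonly described formulation in which an area's key is the mean of its members' keys and its value is the sum of its members' values.

```python
import math
from itertools import product


def enumerate_areas(height, width, max_h, max_w):
    """Yield (top, left, h, w) for every rectangle up to max_h x max_w."""
    for h, w in product(range(1, max_h + 1), range(1, max_w + 1)):
        for top in range(height - h + 1):
            for left in range(width - w + 1):
                yield top, left, h, w


def area_attention(query, keys, values, max_h=2, max_w=2):
    """Attend over all rectangular areas of a 2D grid of feature vectors.

    keys and values are grids indexed as grid[row][col] -> list of floats.
    Assumption: area key = mean of member keys, area value = sum of
    member values, scored with a scaled dot product.
    """
    height, width = len(keys), len(keys[0])
    dim_k, dim_v = len(query), len(values[0][0])
    area_keys, area_values = [], []
    for top, left, h, w in enumerate_areas(height, width, max_h, max_w):
        cells = [(r, c)
                 for r in range(top, top + h)
                 for c in range(left, left + w)]
        area_keys.append(
            [sum(keys[r][c][d] for r, c in cells) / len(cells)
             for d in range(dim_k)])
        area_values.append(
            [sum(values[r][c][d] for r, c in cells)
             for d in range(dim_v)])

    # Scaled dot-product scores, then a numerically stable softmax.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim_k)
              for key in area_keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of area values gives the attended representation.
    output = [sum(wt * val[d] for wt, val in zip(weights, area_values))
              for d in range(dim_v)]
    return output, weights
```

With `max_h=1` and `max_w=1` every area is a single grid cell, so the function reduces to ordinary attention at one fixed granularity. Larger limits let the same query weigh patches of several sizes at once, which is the multiscale behaviour the title refers to.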