Question

This question relates to our graduation project. We are using speech recognition algorithms in our system to detect errors in the pronunciation of letters. What confuses us is the step that comes after completing the acoustic model and the data processing pipeline.

Once we have extracted the results from the acoustic model and converted our audio data to the format required by the CRNN model, do we need to retrain the same model from scratch to achieve more accurate and faster results? After that step, how do we integrate the model into our program so that it can receive audio data from users? We only need to understand how to achieve this part. We have already documented how the DNNs will work; this is the only part that confuses us. To clarify further: we are not in the implementation phase yet. We are still in the documentation phase, which involves describing and explaining how these algorithms will work.
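For context on the integration step being asked about, here is a minimal, purely illustrative sketch of what "receiving audio from users and running it through a trained model" could look like. All names (`frame_audio`, `extract_features`, `score_pronunciation`) and the threshold-based "model" are hypothetical stand-ins; a real deployment would load the trained CRNN weights and compute real features such as MFCCs instead.

```python
def frame_audio(samples, frame_len=4):
    """Split raw samples into fixed-length frames (zero-padding the tail)."""
    frames = []
    for i in range(0, len(samples), frame_len):
        frame = list(samples[i:i + frame_len])
        frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
    return frames

def extract_features(frames):
    """Stand-in feature extractor: per-frame energy.
    A real pipeline would compute MFCCs or spectrograms here."""
    return [sum(s * s for s in frame) for frame in frames]

def score_pronunciation(features, threshold=1.0):
    """Stand-in 'acoustic model': flag low-energy frames as errors.
    A real deployment would call the trained CRNN's predict method."""
    return ["ok" if f >= threshold else "error" for f in features]

# Simulated user audio arriving from the application front end.
samples = [0.9, 1.1, 0.8, 1.0, 0.1, 0.0, 0.1, 0.0]
labels = score_pronunciation(extract_features(frame_audio(samples)))
print(labels)  # prints ['ok', 'error']
```

The point of the sketch is only the shape of the integration: the program frames incoming audio, converts it to the model's expected input format, and passes it to inference, with no retraining happening at serving time.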

Fig. 1