Not impossible, but it's a huge project that, like I say, hasn't been solved in the 20 years that I know of and probably longer, and in fact hasn't been completely solved by humans either. By all means have a go and see how far you get.

The project you outline is simpler than the general problem of interpreting speech, because you know what you should be hearing, but there's still a lot to do. You can't keep a fixed recording of a sentence and do a simple byte-by-byte comparison, because you'll never get the same recording twice, even from the same person (try it!). You'll still need to do phoneme analysis on the input sound, and you'll also need to know a wide range of pronunciations for the same phonemes, so that the program has some means of determining, for instance, that both "buk" and "boooook" are correct pronunciations of the same word (book).
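To give a feel for why a straight comparison fails and what a more forgiving one might look like, here's a rough Python sketch using dynamic time warping (DTW), which is one common way to compare sequences that are stretched differently in time. It is only an illustration, not a recipe: it assumes some front end (not shown, and the hard part) has already turned the audio into one phoneme label per time frame, and the names `dtw_distance`, `reference`, `slow` and `other` are made up for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two label sequences.

    Cost is 0 where labels match and 1 where they differ, so a sound
    that is merely held for longer adds nothing to the distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cheapest alignment of the first i labels of a
    # with the first j labels of b.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = cost + min(
                d[i - 1][j],      # a moves on, b holds its label
                d[i][j - 1],      # b moves on, a holds its label
                d[i - 1][j - 1],  # both move on together
            )
    return d[n][m]


# Hypothetical per-frame labels (assumed output of a phoneme front end).
reference = ["b", "u", "k"]                     # a quick "book"
slow = ["b", "u", "u", "u", "u", "u", "k"]      # a drawn-out "book"
other = ["b", "a", "t"]                         # a different word

print(reference == slow)               # False: exact comparison rejects it
print(dtw_distance(reference, slow))   # 0: same sounds, different timing
print(dtw_distance(reference, other))  # 2: two of the three sounds differ
```

Real recordings won't give you clean labels like these, and different accents will produce different labels for the same phoneme, so this only covers the timing part of the problem.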

One thing is certain: if you attempt this project you will learn a heck of a lot even if you fail, and on that basis I would encourage you to give it a go, although don't make any promises about delivery dates! I recently came across an inspiring quote along the lines that the problem today is not that people miss their targets, but that they set their targets too low and hit them. Aim high, and even if you miss you'll hit a lot higher than if you had aimed low.