Welcome to the second post in our dedicated Natural Language Processing Blog Series. Natural Language Processing — otherwise known as NLP — is a fascinating area of the data space defined by IBM as a branch of AI that enables computers ‘to understand text and spoken words in much the same way human beings can.’
Studies in the space continue to reveal new uses of NLP. Here we chat to Jacob Webber, a Speech Technology PhD student at the University of Edinburgh, to learn more about his NLP-related field of study as well as his experience of working and studying in this niche area of the data space. Read his thoughts below.
Jacob, please can you talk us through your current PhD thesis?
I am working on speech synthesis. We are trying to make speech synthesis systems more controllable, so you can choose exactly what the voice you are generating sounds like. Most systems at the moment learn from data, so they default to replicating voices that already exist and that we have good data for.
This work has lots of important applications, but it is particularly important that people with speech disabilities can use voices that are representative of them. Control can mean lots of things, from the mood of the speaker to voice quality (e.g. gravelly or smooth).
Is there a relation to NLP?
There is a very strong relation to NLP. Speech technology research might be seen as somewhat separate from NLP, which generally applies to text processing, but speech is certainly language, and we process it. The division is becoming less clear with things like machine translation, which is a standard NLP task. Speech-to-speech machine translation, with no text-based intermediate representations, is a very difficult task, but it’s one that people are working on. Some aspects of speech technology (like audio quality enhancement) have less of a relation to NLP, but the divisions are not well-defined and are highly permeable.
What, in your words, would you describe NLP as?
I think I would define the field very broadly as any computational effort to process language. Language might be speech, text, or singing. This ranges from very simple programs, such as those that search through a document or web page to find a particular word, to very sophisticated machine learning models that will translate between languages or transcribe recorded speech. It’s a very broad, fast-moving and exciting field.
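To make the simple end of that spectrum concrete, here is a hypothetical few-line Python sketch (not from the interview) of the kind of program Jacob describes: searching a document for a particular word.

```python
import re

def find_word(text, word):
    """Return the character offsets where `word` appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]

document = "Speech is language, and we process it. Speech technology is broad."
print(find_word(document, "speech"))  # → [0, 39]
```

Even this trivial program makes a linguistic choice (matching whole words, ignoring case), which hints at why the field scales up so quickly in sophistication.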
Is there anything over the past year or so that has happened in your area of study that particularly stood out to you?
There is very fast progress being made in this area right now. There are a lot of companies, large and small, working on text-to-speech systems at the moment. Many of them, like universities, are publishing their ideas in academic conferences and journals, which is accelerating progress further.
If I had to pick one idea that has stood out, it would probably be WaveNet. This was a paper from Google that showed it was possible to generate audio directly using machine learning. The paper is from a few years ago, but recent work has kept improving the performance of this sort of system.
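The core idea behind WaveNet is autoregressive generation: each audio sample is predicted from the samples generated so far. The toy Python sketch below illustrates only that generation loop; the hand-written `toy_model` function is a stand-in for the deep neural network a real system would use, and all names here are illustrative.

```python
import math

def toy_model(context):
    """Predict the next sample from recent history (stand-in for a trained net)."""
    recent = context[-3:]
    # A damped average of recent samples plus a small oscillation.
    return 0.5 * sum(recent) / len(recent) + 0.1 * math.sin(len(context) * 0.25)

def generate(num_samples, seed=(0.0,)):
    """Generate audio one sample at a time, feeding each output back as input."""
    samples = list(seed)
    for _ in range(num_samples):
        samples.append(toy_model(samples))
    return samples[len(seed):]

audio = generate(16)
print(len(audio))  # 16 generated samples
```

The feedback loop (each output becomes part of the next input) is what made WaveNet-style generation both powerful and, originally, slow; much of the follow-up work Jacob mentions targets exactly that cost.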
How did your academic journey so far lead you to this current niche?
My first degree was in Physics and Music, and between the two I got interested in audio processing and physical models of sound. It became clear how much of science is becoming dominated by computation, and specifically processing huge amounts of data.
I then did a Master’s and learnt how to program GPUs and supercomputers. It was a great experience, and at the end I got to apply it all to physics-based room acoustics. It was during this that I realised that speech processing was another application of these audio processing and GPU programming skills. It’s a fascinating area – I have a lot to learn about linguistics, but it’s endlessly interesting.
What piqued your interest in this specific part of the data space in particular?
I really like the range of expertise it requires, from computing to audio processing and to linguistics. This range makes it really interesting, but it also makes the field more diverse than others in computer science.
Before starting your PhD, you were a Software Engineer in Cambridge. What did you focus on during your time there?
I worked on programming embedded wireless communication systems, in low-level C, which was a great educational experience for me. I really felt like I was understanding how computers work, which is important to me. I like to know how everything is working.
What types of careers have other students on your courses gone on to pursue, from your Master’s computing course and your current PhD study?
Many others from the MSc have gone on to do PhD studies too. A few have gone into industry. Many have ended up working in Cambridge, which seems to be a hotspot for British computing.
Those studying for PhDs in speech technology often end up at large companies that are working on this sort of technology. A well-known search engine provider and a well-known online book retailer have large research teams working on this stuff, to name a couple. Recently I have noticed more start-ups having the resources to research and develop their own speech technology, which is great.
What’s next for you, will you be entering the world of data or do you plan to pursue further academic studies post PhD?
I’m not sure! I hope to be in academia, but it’s very competitive. I would be very happy working as a researcher at a company too.