A new era of immersive experiences is here, led in no small part by advances in computer vision technologies. Whether it's a virtual call that blurs out your background or your car successfully driving itself down a highway, advances in computer vision are transforming how we live our lives.
Qualcomm Technologies, Inc.’s Senior Director of Engineering in Multimedia Research and Development, Ananth Kandhadai, has been a leader in inventing a wide range of computer vision and artificial intelligence technologies for more than two decades. Since joining the company in 1996, Ananth’s research interests have included speech coding, image processing, deep learning, vision hardware acceleration, power-constrained system design, and AR/VR system solutions. Currently, he leads a group of R&D engineers focused on computer vision and camera systems.
Originally from India, Ananth obtained his bachelor's degree in electrical engineering from the National Institute of Technology Calicut before moving to the United States to complete his master's degree in electrical engineering at Virginia Tech. After that, he was left with a choice: continue his studies and pursue a Ph.D. or begin his career as an engineer with Qualcomm. He chose Qualcomm and began his research on mobile speech coding and standardization. Ananth believes he made the right decision, and says he learned and grew more with Qualcomm than he would have if he had chosen academia.
And, indeed, his research is paying off. Some innovative inventions spearheaded by Ananth over the years include the ability for smartphone cameras to launch applications based on changes in a person’s environment, as well as an abundance of work in speech coding and image and signal processing. Without Ananth’s technological breakthroughs, we might not have some of the intelligent devices and rich multimedia experiences we have today.
We sat down with Ananth for a conversation on his distinguished career at the intersection of signal processing and computer vision with Qualcomm Technologies, Inc.
This interview has been lightly edited for clarity and length.
What are the main technologies you've worked with over your 25 years with Qualcomm?
More than half of my time at Qualcomm has been in the specific area of speech coding and standardization — basically, speech compression techniques for cell phones. Cellular and satellite communications are something I worked on from 1996 until I migrated to camera processing and computer vision.
Fundamentally, I’ve been working on signal processing as the core area for a long time, applying it to speech coding, speech compression, and speech processing. Then I moved on to camera processing and computer vision. Now, I’m working on specific computer vision applications for XR as well as some other display and rendering aspects.
My movement has mirrored that of our company. The number of phone users on wireless systems was much smaller when I started, so I participated in increasing the capacity over time. When that became big, we slowly started saying, "Okay, let's add some camera phones." If you remember those early flip phones, they had little cameras. I worked in the early days when we started seeing the explosive growth of camera phones, prompting many different technology challenges.
After a few years, when those cameras needed to become smarter, I focused on using computer vision — the ability for computers to detect and react to objects in the real world. And as we move toward a new world of applied computer vision, I’ve moved on to working on technologies that can be used for XR applications going forward. It’s not a coincidence that my areas of technology focus have shifted in lockstep with those of Qualcomm.
For those who may not know a lot about computer vision, can you give us a layman's explanation of what it is and why it's important?
Imagine you’re wearing your glasses — like corrective lenses, everybody understands corrective lenses. You do it because you want to have a sharper view of the world, right? At a simple level, you’re wearing something on your head to see and perceive the world better — something that, in some way, enhances or extends your reality.
Computer vision is a digitized way of perceiving, recording, and understanding the visual data that a camera or series of cameras can generate — finding patterns that our eyes pick out naturally, for example, or even patterns in the data that are too subtle for us to perceive. Basically, it's creating computerized methods of interpreting visual data to perform some function, whether it's obstacle recognition for a self-driving car or foreground-background distinction on a virtual call.
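The foreground-background distinction mentioned above can be sketched in a few lines. This is a toy illustration using simple frame differencing against a static background, not any specific Qualcomm technique; production systems use far more robust statistical models.

```python
import numpy as np

def foreground_mask(background, frame, threshold=30):
    """Flag pixels that differ from a static background by more than a threshold.

    Toy frame-differencing sketch of foreground-background separation.
    Casting to int16 avoids uint8 wraparound when subtracting.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Tiny synthetic example: an 8x8 grayscale "background" and a frame
# where a bright 3x3 "object" has appeared in one corner.
background = np.full((8, 8), 100, dtype=np.uint8)
frame = background.copy()
frame[:3, :3] = 200  # the new foreground object

mask = foreground_mask(background, frame)
print(mask.sum())  # 9 pixels flagged as foreground
```

A real virtual-call blur would feed a mask like this (typically produced by a learned segmentation model rather than differencing) into a per-pixel blur of the background region.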
As a device gets smarter, it needs to be able to parse out these patterns automatically. The device needs to be able to perceive things about the user and their surroundings like a digital assistant. This type of automatic perception requires the device to have intelligent computer vision capabilities — almost like a third eye to deliver a seamless user experience. That’s why computer vision is fundamentally important to all these different applications.
What are the technical challenges that have arisen as you've worked with applied computer vision, and how have you worked to address them?
It's one thing to say, "Oh, it's just like having an extra pair of eyes." The eyes themselves are easy to replicate — an eye is just a light-receiving device. The hard part is replicating the brain behind the eyes that processes it all. It's a classic question in neuroscience and the study of consciousness; even something as basic as how we perceive reality is still not fully understood. But that's exactly what we're trying to replicate in machines. That's the first challenge: to achieve the same level of reliability that a human mind has, without the benefit of millions of years of evolution.
More importantly, the biggest thing Qualcomm Technologies, Inc. is working on is that when you put something on someone's head, like an XR device, it must be light and it can't get hot. The power consumption needs to be very, very low. That means you can't just throw a ton of compute power and memory at the hard problem of computer vision and perception. The device has to be light enough and cool enough to comfortably fit on a person's head, and that comes down to innovations in power efficiency.
Power consumption and computational complexity are always at odds with each other, but usability requires these two to be simultaneously optimized for these different consumer form factors.
How does computer vision differ among different applications like automotive driver assistance, drones, robotics, and XR?
Fundamentally, these different applications are similar in the basic task they pursue. For example, knowing where an XR device user's head or the device's camera is positioned is very similar to a car needing to understand its surroundings while driving autonomously, or a drone following an object on autopilot. The camera and vision systems on all of these need to figure out what is physically around the device. Therefore, the AI techniques for object detection, 3D reconstruction, mapping, object recognition, head tracking, and eye gaze tracking are all conceptually very similar across these different use cases. The drone case is a little different because the operator is physically removed from the machine itself; there is no human on board.
There are other aspects that make this problem fundamentally different for each case, however. That's why it's very difficult to come up with a one-size-fits-all solution. The analogy I use is that humans, cheetahs, and leopards all have four limbs and a mouth, and all are predators. They're similar, but they're designed with different optimizations for different environments and situations.
Automobiles have cameras that are rigidly mounted onto the car, with technologies focused on providing camera stability. Cars generally tend to stay on roads, but they're going at high speeds. And most importantly, the cost of errors in automotive computer vision is far more catastrophic. That makes it difficult, but the tradeoff is that the problem is much more predictable.
Contrast that with putting a similar set of cameras on a user's head. The user could be anywhere and constantly be moving their head in unpredictable patterns, making what is seen by the cameras much less predictable than on a car. In that sense, an XR headset and automotive computer vision system make slightly different assumptions. The basic techniques remain the same, but the way they’re engineered makes it a completely different problem to solve.
But in the end, although how we solve the problem can be completely different, there's a lot of synergy between these different fields when it comes to actual architectural changes in our chipsets. Oftentimes, we find that the decisions you make at the architectural level for one use case end up helping another.
How does Qualcomm support your efforts, and in what ways do they enable you to create these computer vision technologies?
I'm lucky to work at Qualcomm because we have an established business in connectivity, applications processors, and smartphone platforms. That puts us in a position where my team can focus on the more technical aspects of computer vision, like perception and rendering, while the company provides the vehicle to get those technological capabilities into the hands of our customers and end users.
Qualcomm also has excellent relationships with key industry leaders working in computer vision-relevant fields, and that helps us align on the fundamental problems we're all trying to solve. I can solve many problems, but many of those may not be real problems. It's important to collaborate and standardize around real problems, and having relationships with other companies helps us do that.
Overall, Qualcomm actively promotes any solution our teams come up with. They adopt it and find ways to map it into a business opportunity, which requires a roadmap of solutions and inevitably differentiates our products from other companies. In the same vein, the business teams challenge us by giving us time to define problems; they talk to the customers, and they give us the time to think big and develop these dreams. Particularly for an application like XR, it's interesting because the scale of business is still not as big as something like smartphones. But Qualcomm has a vision for the long term and encourages us to pursue it. This allows us to focus directly on the technologies instead of exclusively trying to figure out how they will be commercialized.
Finally, since Qualcomm has so many teams working on various aspects of designing industry-leading SoCs, we can work with teams in other parts of the company to share and leverage knowledge that is otherwise rare to find. There are processes in place that allow us to provide input to other teams and use their work for a different purpose in these different applications, and that collaboration is something they almost require us to do. It turns out to be very helpful.
What advice would you give to younger inventors — maybe still in school — who are looking to build a career in speech recognition or computer vision technologies?
My advice based on my own experience is that inventions are a side effect of solving real problems. Focus on just solving difficult problems, and trust that those difficult problems are the ones that are going to get you to innovative solutions. If you find a problem that others have not solved, I think you should spend time solving it. It's risky, and there might be a reason why somebody has not solved it, but it's very often worth it. There's always the risk that you might bite off more than you can chew — it's possible, but I wouldn't worry about it. Just keep tinkering and never underestimate the power of imagination and creativity in your mind. Just because the problem is not solved doesn't indicate anything — it might just be that the problem was waiting for you to take a look at it. Everything always needs a fresh look, especially for problems that fall under the domain of "not yet solved."
Ultimately, what's important is that a problem is solved correctly — not necessarily focusing on finding flashy or innovative solutions. In my experience, there's a very high chance that innovation stems from solving those difficult problems. And this also helps when we submit for a patent. The patent department looks for proof of innovation and impact, as well as the novelty and the likelihood of use. All of these metrics for assessing whether a patent is useful are based on the foundation of the problem you're solving.