Facetracking has always held a magical place for me. Jaded as I may have become, the instant gratification of processing video in realtime for facetracking has never gotten old. Doing something as seemingly simple as keeping a red box drawn around my face at 30fps (or better) is quite a feat, which I equate to seeing mathematics in action, and a reminder that it wasn’t too long ago that it took a lot of processing power (read: lots of time) just to work on a single still frame. There are different methods of achieving the same goal: some engines are expensive, some have a more reasonable license, and I happen to like one that is completely free, Intel’s gift to the world, OpenCV (the CV stands for computer vision), officially released in 1999. It is really easy to love a tech company, one of the giants, that chose to open source such an important toolkit for the dev community, as opposed to the patent thugs out there (boo, Apple bought the company that open sourced OpenNI and announced they will close the project. I’ve never been more willing to throw dirt on their pretty design). See below for my rant on open sourcing.
Rewind: 2012, Promax in Los Angeles. My partner at Supertouch was invited to speak, and he decided we’d make the presentation a bit more spectacular than most of what the Promax audience had probably seen throughout the event. Sam was in usual form: entertaining, clever, and educating through a perspective that is unique to his black-rimmed glasses. Hanging in the background was me at a podium with my laptop and a webcam. Sam talked through several technologies we’d been playing with in the lab as a way to show cutting-edge tools that would soon make their way into marketing experiences. Promax is a media crowd, and we were in LA, so we decided to pay a fun homage to Hollywood’s history. On the big projection screen behind me was the output of my laptop: a fullscreen of my face in view of my webcam. I connected different expressions to classic sound bites from film and television. Raised eyebrows played the beginning of the Twilight Zone theme, a surprised expression played a memorable scream from Psycho, an open mouth triggered Arnold Schwarzenegger saying “I’ll be back” as the Terminator, and so on. I had about 40 recognizable sound bites and got laughs for my exaggerated facial expressions at 12′ tall on the big screen. It went over great, and I found great fun in using Kyle McDonald‘s FaceOSC (based on Jason Saragih‘s original work) and vvvv as a performance tool this way.
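Under the hood, FaceOSC streams gesture measurements (mouth height, eyebrow position, and so on) as OSC messages, and a rig like this mostly just watches for a value crossing a threshold, firing a cue once per expression rather than continuously. Here’s a rough Python sketch of that rising-edge trigger idea; the threshold, the simulated gesture stream, and the sound filename are made up for illustration, not what we actually wired up in vvvv:

```python
class ExpressionTrigger:
    """Fire a cue once when a gesture value rises past a threshold."""

    def __init__(self, threshold, on_trigger):
        self.threshold = threshold
        self.on_trigger = on_trigger  # callback, e.g. play a sound bite
        self._armed = True            # only fire on the rising edge

    def update(self, value):
        if self._armed and value > self.threshold:
            self._armed = False       # don't re-fire while mouth stays open
            self.on_trigger()
            return True
        if value < self.threshold:
            self._armed = True        # re-arm once the expression relaxes
        return False

# Simulated mouth-height stream: two distinct mouth-open events.
fired = []
mouth_open = ExpressionTrigger(5.0, lambda: fired.append("ill_be_back.wav"))
for v in [1.0, 2.0, 6.5, 7.0, 3.0, 6.0]:
    mouth_open.update(v)
# fired now holds the cue twice: once per open-mouth event, not per frame
```

In practice each trigger’s `update` would be fed from an OSC listener subscribed to the relevant FaceOSC gesture address, with one trigger per expression-to-sound mapping.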
Another project, related to the test video above, was for a pharmaceutical company, to be used in a trade show installation. The goal was simple: use facetracking to attach short messages to the face, with messaging that relates to the brand and its products. We used high-quality equipment to ensure the execution did not look like a simple webcam set-up. I specced a very new camera at the time, the Canon EOS M, a very compact body with the guts of the 60D SLR. One reason I went with this camera, besides its high-quality innards, was the ability to attach professional lenses (in our case a very wide angle) and, more importantly, to run the hacked firmware called Magic Lantern on it. Magic Lantern is supremely important to me; it has enabled me in many ways, and frankly, I would have been crippled relying on any camera manufacturer’s stock abilities. To get the HD output from the Canon, I used a Blackmagic Intensity Pro capture card, which also offered exposed drivers for our programming. Basically, this system gave us a super high-end webcam platform from which to work. The reason for the wide-angle lens was that users would be standing in front of a large flatscreen TV, on which the Canon was mounted: too close for comfort. A 22mm lens enabled us to get much of the upper body without much distortion at the edges. Some of the usability rules we instituted moved the messages (basically PNG files) and the associated graphic flourishes (PNG-sequence animations) around depending on the location of the face. It was important to factor this in because we had to be able to serve short and tall alike, along with calm and caffeinated users. If the user moved off to one side, we didn’t want the message to fly offscreen. A ruleset in our programming ensured the messaging was always readable onscreen. You can see some of this behavior in the Vimeo video embedded above, whereby the floating squares represent the PNGs making up messages and flourishes.
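The keep-it-readable part of a ruleset like that boils down to placing the message relative to the tracked face, then clamping it against the screen bounds. A toy sketch of that idea in Python; the function name, default offset, and screen dimensions here are hypothetical, not our production code:

```python
def clamp_message(face_x, face_y, msg_w, msg_h, screen_w, screen_h,
                  offset=(40, -20)):
    """Place a message PNG near the face, but never let it leave the screen.

    face_x, face_y: tracked face position in screen pixels.
    msg_w, msg_h:   message PNG dimensions.
    offset:         where the message sits relative to the face
                    (illustrative values, not the production ruleset).
    """
    x = face_x + offset[0]
    y = face_y + offset[1]
    # Clamp so the full message stays within the visible screen.
    x = max(0, min(x, screen_w - msg_w))
    y = max(0, min(y, screen_h - msg_h))
    return x, y

# A face near the right edge of a 1920x1080 screen: the message
# gets pulled back onscreen instead of flying off the edge.
print(clamp_message(1900, 500, 300, 100, 1920, 1080))  # (1620, 480)
```

The real installation layered more rules on top (serving short and tall users, animating the flourishes), but this clamp is the core of why the messages never left the screen.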
Sidenote: we needed to test for bugs with a very VERY compact schedule, so we tested through the night and on weekends with the help of Jamie Farr’s printed headshot on a tripod. Thanks to Jamie, we found memory leaks and plugged ‘em. Thanks to Dimitri Diakopoulos for the software assist.
I threw this paragraph at the end of the post because of its unapologetically preachy nature: why is the open sourcing of such tools important? Well, OpenCV is not specifically about face tracking; rather, it is a collection of tools that enable face tracking, but also all kinds of realtime analysis of a video stream. The implications for artificial intelligence, computer learning, and autonomous robotics are all in context. And there is often more than one way to use OpenCV within a given context. An open sourced platform has the muscle of a community behind it, rather than a team at a company on a production schedule dealing with product-based specifics.