Description: Alon Baram and Laurie Bayet build upon a model of visual recognition that learns to identify digits and faces from novel viewpoints, using limited training examples of the sort that an infant may experience as it learns to recognize new faces.
Speaker: Alon Baram and Laurie Bayet
[MUSIC PLAYING]
LAURIE BAYET: My name is Laurie Bayet. I'm a postdoc at the University of Rochester and Boston Children's Hospital, and I'm working on developmental cognitive neuroscience.
ALON BARAM: My name is Alon, and I am studying currently at Oxford. I'm doing my PhD. I'm there with Professor Tim Behrens, and I'm currently working on computational cognitive neuroscience.
LAURIE BAYET: Alon and I are trying to use a paper by Tomaso Poggio and others on a specific way to achieve invariant recognition in computer vision or other algorithms. So we're basically trying to implement this in a simpler case and then moving on to face recognition under rotations.
ALON BARAM: The idea is that most of the variance in computer vision, when an algorithm tries to discover what is in the image, is held in very few manipulations, like translation, which is shifting an image across the field, or rotation, or scaling. So Poggio has a cool idea of how to create this signature that Laurie just talked about, which is invariant to these things and might reduce the sample complexity, that is, how many examples you need in order to learn.
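The kind of signature described here can be illustrated with a toy sketch. This is not the speakers' implementation: it assumes 1-D signals, cyclic shifts as the only transformation, random templates, and three pooling statistics (mean, max, mean square), all chosen purely for illustration. The names `cyclic_shift` and `signature` are hypothetical. The point is only that pooling the dot products between an image and every shifted copy of a template gives the same values no matter how the image itself is shifted.

```python
import math
import random

def cyclic_shift(vec, s):
    """Shift a 1-D signal by s positions, wrapping around at the edges."""
    s %= len(vec)
    return vec[-s:] + vec[:-s] if s else list(vec)

def signature(image, templates):
    """Pool dot products of `image` with every shifted copy of each template.

    Shifting the image only permutes the dot products, so order-independent
    pooling (mean, max, mean square) yields a shift-invariant signature.
    """
    feats = []
    for t in templates:
        dots = [sum(a * b for a, b in zip(cyclic_shift(t, s), image))
                for s in range(len(t))]
        feats += [sum(dots) / len(dots),
                  max(dots),
                  sum(d * d for d in dots) / len(dots)]
    return feats

rng = random.Random(0)
n = 16
# Hypothetical random templates and a random test "image" (assumptions).
templates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(3)]
image = [rng.gauss(0, 1) for _ in range(n)]

sig_original = signature(image, templates)
sig_shifted = signature(cyclic_shift(image, 5), templates)
is_invariant = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
                   for a, b in zip(sig_original, sig_shifted))
print(is_invariant)
```

A classifier trained on such signatures would not need to see every shifted version of each example, which is the sense in which the approach might reduce sample complexity.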
LAURIE BAYET: For the simple case, we just used an existing data set of digits. For the face data set, we tried to find a suitable data set online, but we ended up just taking videos of people using materials provided by the summer school. So taking videos of people rotating their heads like this slowly.
ALON BARAM: Yeah, it was fun.
LAURIE BAYET: Moving around a little bit.
ALON BARAM: We have now a complete data set of the heads of people from different angles.
LAURIE BAYET: We wanted to provide the algorithm with a hopefully limited number of raw frames from people rotating their heads like this, as templates, so to speak, which then act like a kernel, so to speak, to be able to recognize unseen people under various angles, so that whenever a person is showing this profile or this profile, you would still be able to recognize them with the same level of accuracy as if they were in front of you, presenting the frontal face.
ALON BARAM: The purpose of this project in the long run, or of what this i-theory, as Tommy Poggio calls it, will do in the long run, would be to reduce the number of examples that an algorithm, for example a deep neural net, needs to see in order to learn its weights, that is, in order to learn how to classify images or retrieve images.
LAURIE BAYET: We haven't started the face part. We only started the digits part, which worked. So we're--
ALON BARAM: It's working basically. We hope it will also work in the endlessly more complex domain of faces.
LAURIE BAYET: Now you know.
ALON BARAM: But we're hoping.
LAURIE BAYET: We're reasonably optimistic. I don't know. We'll see.
ALON BARAM: Fingers crossed.
LAURIE BAYET: We've approached the project from pretty much very different angles but still ended up having common interests, which I guess is kind of a hallmark of this summer school too. Alon is very interested in the engineering problems, so to speak. So how can we achieve this with machines?
And I approached the project from a developmental perspective. So given that the current algorithms manage to do invariant face recognition based on a fairly large number of exemplars, how come infants can achieve this in a few months based on a lot of experience, but not that much-- mostly looking at their parents, caregivers, and a few other exemplars, but not like 3,000 people from all possible angles? So this is why I was very interested in this theory, and trying to implement this manually has been pretty cool so far.
[MUSIC PLAYING]