Our Space Battle opens a new frontier in the search for the most exciting interactive toys
What’s the future of play? How are advances in engineering changing the world of game design and toys? In an age of touch screens, 3D graphics, and augmented reality, it goes without saying that our children are already at home in the connected world. I recently spoke to a curator for a children’s museum who reiterated the importance of designing interactivity into educational curricula to keep pace with entertainment and other media that make ever-multiplying claims on our children’s attention.
“If they can’t touch it and have the object respond,” she told me, “as far as they’re concerned, it’s a poster.”
Every day at FS Studio, we try to tackle the challenges of the next-generation toy. Every day we play Geppetto to our clients’ Pinocchio, trying to conjure something unique and alive out of a magical block of talking wood (in this case, plastic and silicon chips).
Introducing SPACE BATTLE, a smart toy we’re excited to debut on the Artik platform. When we prototyped SPACE BATTLE, we wanted to take what by all appearances is a regular spaceship toy and enhance it with the latest capabilities in AI. Equipped with an optical sensor, SPACE BATTLE uses computer vision to respond to its environment. We also wanted to introduce an engaging storyline that invites participation and wonder. In other words, the toy plays with you.
We wanted SPACE BATTLE to have all the autonomy and portability of a regular toy, one a child can carry around anywhere regardless of connectivity. The results of our experiment are a window into the ever-expanding world of “enchanted” or smart devices: the evolution from static “dumb” toys to interactive toys, fully embedded with sophisticated deep learning capabilities and our own proprietary mix of technology and fun, able to perform complicated tasks while running completely offline (no Internet required!).
How did we do it?
The challenge, of course, was optimization. For the toy to be feasible, it had to run a real-time image recognition process on a low-powered device. That’s where the Artik comes in. Its quad-core processor was more than capable. We were able to equip SPACE BATTLE with a convolutional neural net, or “deep net,” that allowed the rocket ship to come alive in your hands.
If tech talk makes your eyes glaze over, feel free to jump to the end. If you want a peek into the magic that made this happen, read on!
It Starts with the Dataset
So what’s happening under the proverbial hood? Let’s start with dataset creation. We undertook the task of “training” our neural network with tens of thousands of images, a blend of real photographs and a synthetically generated image dataset. Real images are absolutely the best source of training data: you get natural lighting, backlighting, shadows, and all the subtleties you can miss in a synthetically generated dataset. The synthetic dataset, however, allowed us to create a vast amount of data to augment those real images, training the neural net on offset images, rotated images, varied image sizes, and many, many more backgrounds. These synthetic images were created in a 2D image environment; we are looking at building tools that use OpenGL and 3D environments for synthetic image creation, which would get us closer to real images, especially when it comes to lighting, camera placement, and foreground occlusion.
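To make the 2D augmentation step concrete, here is a minimal sketch of how offsets, flips, rotations, and background swaps can multiply a small set of object images into a much larger synthetic dataset. The function name and the toy 8×8 “object” are illustrative assumptions, not our production pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, background):
    """Composite a randomly flipped/rotated/offset object image onto a background."""
    img = image.copy()
    if rng.random() < 0.5:                      # random horizontal flip
        img = img[:, ::-1]
    img = np.rot90(img, k=rng.integers(0, 4))   # 90-degree rotations (square object)
    h, w = img.shape[:2]
    H, W = background.shape[:2]
    y = rng.integers(0, H - h + 1)              # random offset within the background
    x = rng.integers(0, W - w + 1)
    out = background.copy()
    out[y:y + h, x:x + w] = img                 # paste object over background
    return out

# toy example: one 8x8 "object" composited onto several 32x32 backgrounds
obj = np.full((8, 8, 3), 255, dtype=np.uint8)
backgrounds = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(3)]
dataset = [augment(obj, bg) for bg in backgrounds for _ in range(10)]
```

A real pipeline would also vary scale, lighting, and occlusion, which is exactly where a 3D/OpenGL environment earns its keep.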
The key to modern deep learning techniques is the use of Stochastic Gradient Descent (SGD). Then we tweak, tweak, tweak the training hyperparameters: mini-batch size, number of training iterations, step size, alpha, on and on and on (we tweak the network architecture similarly).
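For readers who haven’t seen it spelled out, the SGD loop and the hyperparameters we keep tweaking (step size, mini-batch size, iteration count) look roughly like this. This is a generic sketch on a toy logistic-regression problem, not our actual training code:

```python
import numpy as np

rng = np.random.default_rng(42)

# hyperparameters we'd sweep in practice
alpha = 0.1        # step size / learning rate
batch_size = 16    # mini-batch size
n_iters = 500      # number of training iterations

# synthetic two-class problem: labels come from a known linear rule
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)

w = np.zeros(3)
for _ in range(n_iters):
    idx = rng.integers(0, len(X), batch_size)   # sample a random mini-batch
    xb, yb = X[idx], y[idx]
    p = 1.0 / (1.0 + np.exp(-xb @ w))           # sigmoid predictions
    grad = xb.T @ (p - yb) / batch_size         # logistic-loss gradient estimate
    w -= alpha * grad                           # the SGD update step

acc = np.mean(((X @ w) > 0) == (y > 0.5))       # training accuracy
```

Swapping a full neural network in for `w` changes the gradient computation but not the shape of the loop, which is why these few hyperparameters dominate so much of the tuning effort.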
How do we verify the results of all this tweaking? Two ways. First, we set aside a portion of our training data for verification. This gives us a measure of confidence in our accuracy, but to be sure, good results against held-out training data are no guarantee that we will get similar results in the real world. That’s our final litmus test: getting out there and brute-force manual testing in the real world. We get the toy into the hands of as many folks as we can and see what our results are. This is decidedly qualitative, but it’s the best overall measure of our success.
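The held-out-data half of that process is simple to sketch. Here is a minimal, hypothetical example of splitting off a verification set and scoring a stand-in model (a nearest-centroid classifier) on it; the function name and model are illustrative, not what ships in the toy:

```python
import numpy as np

rng = np.random.default_rng(7)

def train_val_split(X, y, val_frac=0.2):
    """Shuffle the labelled data and hold out a fraction for verification."""
    perm = rng.permutation(len(X))
    split = int(len(X) * (1 - val_frac))
    tr, va = perm[:split], perm[split:]
    return X[tr], y[tr], X[va], y[va]

# toy labelled dataset: class is determined by the sign of feature 0
X = rng.normal(size=(1000, 4))
y = (X[:, 0] > 0).astype(int)

Xtr, ytr, Xva, yva = train_val_split(X, y)

# stand-in "model": classify by nearest class centroid, fit on training split only
centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(Xva[:, None, :] - centroids[None, :, :], axis=2)
val_acc = np.mean(dists.argmin(axis=1) == yva)   # accuracy on held-out data
```

The crucial discipline is that the verification split never touches training; real-world testing then catches everything this number can’t.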
So the secret sauce to this whole endeavor is the DeepNet network design. Since we’re running real-time image recognition on a constrained system, the Artik 10 (Cortex A-series quad-core), we had to constrain the DeepNet’s size in both depth and width. What we’ve found through this process is that there’s an element of alchemy and heuristics in network design as well as in dataset creation. But in the end you can prune an awful lot of the network and still get great accuracy while hitting the performance requirements needed to run in real time on an embedded system.
We played with the number of convolutional kernels (the width of the network). Fewer kernels mean fewer “features” the net can discover, but we’ve found, counterintuitively, that you can actually get better results. The number of layers, the depth of the CNN, is also drastically reduced compared to what you’d find in a larger server-based or unconstrained solution. The size of the convolution kernels is surprisingly small as well.
On top of all of this, we have an Artificial Neural Net (ANN) for the final classification. For the activation functions, we use rectification (ReLU) in the CNN; for the ANN’s fully connected layers, we often experiment with various activation functions, since the best choice is highly dependent on the application itself.
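To illustrate just how small such a pruned network can be, here is a numpy-only forward pass through a deliberately tiny architecture: a handful of small convolutional kernels, ReLU, and a fully connected classifier on top. The kernel counts, sizes, and random weights are illustrative assumptions, not our shipped network:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, kernels):
    """Valid 2D convolution: x is (H, W), kernels is (n, kh, kw)."""
    n, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((n, H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = x[i:i + kh, j:j + kw]
            out[:, i, j] = np.sum(kernels * patch, axis=(1, 2))
    return out

relu = lambda z: np.maximum(z, 0)            # rectification, as in the CNN above

# deliberately tiny network: 4 kernels of 3x3, then one fully connected layer
kernels = rng.normal(size=(4, 3, 3)) * 0.1
image = rng.normal(size=(16, 16))            # small grayscale input

features = relu(conv2d(image, kernels))      # shape (4, 14, 14)
flat = features.reshape(-1)                  # 784 features
W_dense = rng.normal(size=(2, flat.size)) * 0.01
logits = W_dense @ flat                      # two-class scores
pred = int(np.argmax(logits))

n_params = kernels.size + W_dense.size       # total weights in this sketch
```

Even this sketch has only 1,604 weights, which is the point: with few, small kernels and shallow depth, the whole forward pass stays cheap enough for an embedded quad-core to run at frame rate.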
The result? A highly accurate real-time image recognition system that works offline on highly constrained devices. Whether you want to add hand-signal recognition to control a camera or event triggers to low-cost toys, you name it!