By Bobby Carlton

Robotics, AI, and a slew of other cutting-edge technologies will re-shape our world. But how soon will that happen?

Caio Viturino works here at FS Studio as a Simulations Developer and is incredibly focused and passionate about how robotics and artificial intelligence will change everything from warehouse automation to our everyday lives. His robotics journey started when he was an undergraduate in Mechatronics Engineering between 2010 and 2015.

Along with his amazing work here at FS Studio, Viturino is also a PhD student at the Electrical Engineering Graduate Program at the Federal University of Bahia (UFBA) in Salvador, Brazil, supervised by Prof. Dr. André Gustavo Scolari Conceição and a researcher at the Laboratory of Robotics at UFBA.

With industries looking to robots and AI to play a crucial role in how we work and socialize, we thought it would be important to learn more about what he does and dig into some questions about these technologies and where he thinks they are heading.

During our interview Viturino first explains how he ended up on this path with robotics saying, "Shortly after my bachelor's degree, I started a master's degree in Mechatronics Engineering in 2017 with an emphasis on path planning for robotic manipulators. I was able to learn about new robotic simulators during my master's, like V-REP and Gazebo, and I also got started using Linux and Robot Operating System."  

Caio Viturino and his Robot

In 2019 Viturino started a Ph.D. in Electrical Engineering with a focus on robotic grasping. He primarily used ROS (Robot Operating System) to work with the UR5 from Universal Robots and Isaac Sim to simulate the robotic environment. "In my work, I seek to study and develop robotic grasping techniques that are effective with objects with complex geometries in various industrial scenarios, such as bin picking."

The Tools and Why

At first, Viturino was hired as a consultant here at FS Studio in July of 2022 to work on a project for Universal Robots using Isaac Sim. After the conclusion of this work, he was hired to work on projects involving artificial intelligence and robotics that are related to scenario generation, quadruped robots, and robotic grasping. 

He tells me that he primarily uses the following for most of his research:

PyBullet - An easy-to-use Python module for physics simulation, robotics and deep reinforcement learning based on the Bullet Physics SDK. With PyBullet you can load articulated bodies from URDF, SDF and other file formats.

Isaac Sim - A scalable robotics simulation application and synthetic data generation tool that powers photorealistic, physically-accurate virtual environments to develop, test, and manage AI-based robots.

Isaac Gym - Provides a basic API for creating and populating a scene with robots and objects, supporting loading data from URDF and MJCF file formats.

I asked Viturino about his current work with PyBullet, Isaac Sim, and quadrupeds learning to walk. Why is this work important to him, and why is robotics important in general?

"Robots will not be a replacement for the human labor force but will aid in difficult or repetitive tasks," said Viturino. Just recently Amazon announced their new AI powered robot called Sparrow, designed to do exactly what Viturino is saying here.

He then tells me that for these robots to perform these tasks, it is necessary to develop their kinematic and dynamic models, and test route planning algorithms so that the robot can go from point A to point B while avoiding static and dynamic obstacles, among other difficult tasks.

These algorithms will require time and significant investment to implement in real-world scenarios. Robotic simulators will lower these costs and risks by enabling all of these algorithms to be tested in simulation before being implemented on actual hardware.  
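To make the "point A to point B" idea concrete, here is a minimal sketch of route planning on a toy occupancy grid, the kind of thing that is cheap to test in simulation first. This is not the planner Viturino uses: breadth-first search, the `plan_route` function, and the `WAREHOUSE` layout are illustrative assumptions, and the sketch only handles static obstacles.

```python
from collections import deque


def plan_route(grid, start, goal):
    """Breadth-first search on a 4-connected occupancy grid.

    grid[r][c] == 1 marks a static obstacle, 0 marks free space.
    Returns the list of (row, col) cells from start to goal, or None
    if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to rebuild the route.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical warehouse: a shelf (1s) blocks the direct way down.
WAREHOUSE = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

route = plan_route(WAREHOUSE, (0, 0), (2, 0))
print(route)
```

Because the search expands cells in order of distance, the route it returns is a shortest one on the grid. A real robot would add dynamic obstacles and its own kinematic limits on top of this.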

NeRF Drums

In a previous post on the FS Studio blog, Viturino and I talked about NeRFs. One question I had for him was how NeRFs and robotics combined will change the world of automation, and whether there is a way to speed up the creation of robotic simulations.

"Robotic simulations are being used more frequently as a means of training and testing mobile robots before deploying them in the real world. This is known as sim2real. For instance, we could create a 3D model of a warehouse and then train various robots in that environment to plan routes, recognize objects, and avoid collisions with dynamic obstacles."  

One thing to mention is that the process isn't that simple, because modeling an environment can take a lot of time and money. NeRFs can help a lot in this regard, since they let us obtain a 3D model of the surrounding area quickly and easily.

Robotics with Grasping, Trajectory Planning and Deep Learning

When asked about his passion for robotic grasping, trajectory planning, and deep learning, Viturino tells me that deep learning enables the practical and effective use of several algorithms that would otherwise only be effective in specific environments or situations. For instance, a classic robotic grasping algorithm needs the physical properties of objects, such as mass and dynamic and static attributes, to work. These properties are impossible to obtain when considering unknown objects.

Artificial Intelligence allows robots to perform grasping tasks without worrying about physical properties that are difficult to obtain. These algorithms are getting better at working with any object and in every environment.     

However, there is still a lot to be explored in order to find a complete solution for all robotic problems, or to put it another way, a single algorithm that plans routes, executes grasping, and identifies obstacles, among other things, in the style of DeepMind. In addition, the computational performance or reliability of these algorithms still limits their practical use. Viturino explains that the gap between industry and academia has been closing significantly over the past few years.

How Far Are We from Robotic Help and Companionship

When we think of modern-day robots that can be used in our normal everyday life, we think of things such as an iRobot Roomba vacuum to keep our floors clean or something like Piaggio's My Gita robot that follows you around and does things like carry groceries or your computer. But truthfully we would all love to see the day when we can have our own astromech droid like R2-D2 to be our on-the-fly problem solver and companion throughout the day. I asked Viturino about this. How far are we from this?

"I think we have a gap where the pieces still don't fit together completely. Imagine that each component of this puzzle is a unique algorithm, such as an algorithm for understanding emotions, another for identifying people, controlling the robot's movements, calculating each joint command, and determining how to respond appropriately in each circumstance, among others."

According to Viturino, the pieces still need to be very carefully built and developed so that they can be assembled and all fit together perfectly. "I think we won't be too far from seeing such in sci-fi movies given the exponential growth of AI in the last decade." Granted, we won't get something like R2-D2 anytime soon, but you could paint a My Gita robot to look like R2!

But it does take me to my next question. I personally own an Anki Vector AI robot. He's been in my family since the beginning and we've all come to really love Vector's presence in the house. I wanted to know Viturino's thoughts on robots like Vector, Roomba and My Gita becoming more popular as consumer products.

He explains that this greatly depends on how well this type of technology is received by the general public. The younger generation is more receptive to this technology. Price and necessity are also important considerations when purchasing these robots.

Viturino then says that the robotics community will need to demonstrate that these robots are necessary, much like our cellphones, and are not just a novelty item for robotics enthusiasts like us. This technology should be democratized and easily accessible to all. 

A company in Brazil by the name of Human Robotics is heavily focused on building robots for commercial use in hotels and events as well as domestic use, such as caring for and monitoring elderly people. However, he doesn't think the population is completely open to this technology.

He's right, there's still some hesitation about using robots for daily tasks, but there is some traction.

AI, SLAM, LiDAR, Facial Tracking, Body Tracking: What Else Will Be Part of the Robotic Evolution?

Viturino focuses on one part of this question, saying that he thinks that as artificial intelligence advances, we will use simpler sensors. Today, it is already possible to create a depth image with a stereo RGB camera, or to synthesize new views from sparse RGB images (NeRF). But he believes that the day will come when we will only need a single camera to get all data modalities.
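For readers wondering how a stereo camera yields depth, here is a small sketch of the standard relation for a rectified stereo pair: depth equals focal length times baseline divided by disparity. The function name and the camera numbers below are made-up assumptions for illustration, not values from Viturino's setup.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters of a point seen by a rectified stereo pair.

    disparity_px:    horizontal pixel shift of the point between the
                     left and right images
    focal_length_px: camera focal length expressed in pixels
    baseline_m:      distance between the two camera centers in meters
    """
    if disparity_px <= 0:
        # Zero disparity means the point is effectively at infinity.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 10 cm baseline.
near = depth_from_disparity(70.0, focal_length_px=700.0, baseline_m=0.1)
far = depth_from_disparity(7.0, focal_length_px=700.0, baseline_m=0.1)
print(near, far)
```

Nearby objects shift a lot between the two images and distant ones barely move, which is why depth accuracy from stereo drops off quickly with distance.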

"There are other technologies, particularly in autonomous vehicles, such as passive thermal cameras. Despite it, the technology is restricted by armies and governments, and the cost is high. However, it may be a promise for the future."

As we come to the end of our conversation, one thing Viturino brings up is that he believes simulation allows us to develop, test, and go beyond imagination without fear of damaging robots and equipment, which could cost a lot of money, or even lead to dismissal and an unpayable fine, depending on the damage. After we've tested our ideas in simulation, we're ready to deploy the software on the hardware.

As for his work in robotics and AI, and closing the gap of what's possible now and the future of what we hope for, he believes that NVIDIA is working to develop ever-more accurate simulations through the use of their PhysX library, which is now available as an open-source version 5.1. As a result, the gap between simulation and reality will close more and more, increasing the reliability of robotic applications.  

"We are in an era where we must be bold and creative to overcome the limits already reached, with agility and teamwork."  

You can learn more about Caio and his work by checking out his GitHub page.

By Bobby Carlton

With its network perfectly synchronized with the real world, Digital Schiene Deutschland (Digital Rail for Germany, DSD) can run optimization tests and “what if” scenarios to test and validate changes in the railway system, such as reactions to unforeseen situations.

The German railway company Deutsche Bahn is building a digital twin of its railway network that will allow it to monitor and improve the performance of its 20,500 miles of tracks and stations. Using an interconnected network of sensors and cameras, together with AI running in Nvidia Omniverse, the railway can analyze the collected data to identify the causes of its various operational issues and improve its performance.

Deutsche Bahn
Image from Nvidia

A digital twin can provide you with a quick overview of what's going wrong, but it can also help you prevent it. With the help of AI, you can learn how to fix issues and make the whole system work better. For instance, an AI can analyze a process, uncover design flaws, and identify their cause. It can also help you schedule regular inspections and maintenance on certain parts of the machinery through predictive maintenance.

“With NVIDIA technologies, we’re able to begin realizing the vision of a fully automated train network,” said Ruben Schilling, who leads the perception group at DB Netz, part of Deutsche Bahn, in an official Nvidia press release. “The envisioned future railway system improves the capacity, quality and efficiency of the network.”

That said, it’s important not to underestimate the real-time aspect of AI’s role with digital twinning in Industry 4.0. According to David Crawley, a professor at the University of Houston's College of Technology, the university collaborated with other organizations to develop a digital twin that can be used in its digital oilfield laboratory.

He noted that an oil rig worker in the South Pacific was able to use AR headgear to show an engineer how to fix a faulty part of the equipment without shutting down the operations.

According to Crawley, the use of AI in the metaverse allows people to engage in activities that are similar to what they're actually doing in the real world using AR, VR, or WebXR. For instance, a worker hundreds of miles away can use a device like a Magic Leap 2 headset to fix a pipe or identify a problem with a valve.

There's also a symbiotic relationship between AI and digital twins that exists in an industrial metaverse.

“AI is ultimately the analysis of data and the insights we draw from it,” said Lisa Seacat DeLuca, then a distinguished engineer and director of Emerging Solutions at IBM, during an interview with VentureBeat. “The digital twin provides a more effective way to monitor and share that data, which feeds into AI models. I boldly state that you can’t have AI without digital twins because they can bring users closer to their assets and enable them to draw better, more accurate and more useful insights.”

A digital twin can be built using the data collected by various sensors and IoT devices. Aside from providing more data points, the digital twin can also help improve the AI's performance by allowing it to perform more effective simulations.

Deutsche Bahn Chief Technology Innovation Officer Rolf Härd noted that the company can collect enough data to allow its AI to perform more impactful simulations and provide predictions that will help Deutsche Bahn be more efficient.

David Crawley explained how a digital twin can be used to perform predictive maintenance analyses on a train's components, and noted that because of his knowledge of how these components work, he can use the digital twin to model maintenance scenarios.
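As a rough illustration of what a predictive maintenance analysis can look like at its simplest, the sketch below fits a straight line to a component's wear readings and estimates how long until the trend crosses a service limit. This is not Crawley's or Deutsche Bahn's method; the linear-trend assumption, the function names, and the wear figures are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(days, readings, limit):
    """Days after the last reading until the fitted trend hits `limit`.

    Returns None when the trend is flat or improving, and 0.0 when the
    fitted trend has already reached the limit.
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical brake-pad wear in millimeters, sampled once a day.
days = [0, 1, 2, 3, 4]
wear_mm = [1.0, 1.5, 2.0, 2.5, 3.0]

remaining = days_until_limit(days, wear_mm, limit=5.0)
print(remaining)
```

A digital twin makes this kind of estimate more useful because the same readings can be replayed against simulated "what if" scenarios before anyone schedules an inspection.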

When creating a digital twin at such a large scale, the process can become a massive undertaking. You need a strategy and a roadmap to a custom-built 3D pipeline that connects the computer-aided design datasets built within your ecosystem with high-definition 3D maps and various simulation tools. In this case Deutsche Bahn used the Universal Scene Description (USD) 3D framework, which Nvidia Omniverse is built on, to connect and combine data sources into a single shared virtual model.

Through digital twinning and data collected by the IoT sensors, Crawley and his team were able to identify areas where his organization can improve its operations. For instance, by analyzing a train's speed together with weather data, he was able to identify where Deutsche Bahn could improve its service to its customers.

By Bobby Carlton and Dilmer Valecillos

Unity, MRTK, Needle Tools and 8th Wall are just some of the tools you'll need to develop!

The Meta Quest Pro is available now and we have already seen some very cool things being teased from developers leading up to its launch. Of course, in order to develop, you need an amazing set of tools. Our Head of R&D, Dilmer Valecillos, took a moment to take a deep dive into some of the top development tools you can use with your Quest Pro headset to develop and launch your own XR experiences.

Image from Meta

In the video, Dilmer gives us a brief comparison between the Quest Pro and the recently released Magic Leap 2 MR headset. He also gives us some perspective on color passthrough and what that will mean not only for the Meta Quest Pro, but for XR in general.

Some of the tools mentioned here are Unity, Unreal, MRTK 3, Needle Tools, 8th Wall, and Mozilla, along with a brief look at how to use them for the Quest and deploy your builds.

All of these tools, along with others, will be essential for you to develop incredible XR experiences on the Quest Pro headset that you can bring into your workforce as a training platform or use for social events and entertainment.

You can expect an even deeper dive into the Meta Quest Pro in upcoming posts here on our blog and on our YouTube page.

By Bobby Carlton

New technologies will allow people to interact with the world around them in various ways.

For some time now, people have been captivated by the notion of how new technologies will change the way we work, socialize, seek out entertainment, and approach education. This has led to the development of new ideas about how to create a better computing system for today's digitally connected world. Web 3.0 and spatial computing are innovations that will allow people to bring that vision to life.

Although some argue that Web 3.0 is here thanks to AR/VR technologies, others feel it's still in development but just around the corner. The fact is that the core components of Web 3.0 are here thanks to innovations such as AI, blockchain, VR/AR, IoT, and 5G.

Web 3.0 aims to drastically expand the utility of the internet, which has evolved from its text-based origins to a more interactive and socially consumed form of content creation. These technologies will allow people to experience a more intelligent and user-friendly digital world.

Web 3.0

Despite the technological advancements that have occurred over the past few years, the user experience on the web has always been a 2D experience. XR (VR/AR) and other similar technologies will allow people to experience a more accurate, interactive, and user-friendly digital world.

Spatial computing aims to digitize our 2D content and turn it into 3D worlds, transforming it into a digital twin that's more accurate and user-friendly. The idea is that it will allow people to interact with the virtual world around them through VR/AR and AI.

There are a lot of different names for this approach. The most popular has been the metaverse, but a number of other terms are also in use.

A report released in 2020 by Deloitte stated that the spatial web (the term used by Deloitte) is the next evolution in information technology and computing. It is similar to the development of Web 1.0 and 2.0, and it will eliminate the boundary between physical objects and digital content.

Image from Forbes

The term spatial refers to the idea that digital information will be integrated into your real-world space and become an inseparable part of the physical world.

In an article published on the Singularity Hub, Peter Diamandis, a prominent Silicon Valley investor, stated that the world will be filled with rich, interactive, and dynamic data over the next couple of years. This will allow people to interact with the world around them in various ways. The article also noted that the spatial web will transform various aspects of our lives, including education, retail, advertising, and social interaction.

The spatial web is built on the various technological advancements that have occurred over the past few years, such as Artificial Intelligence (AI), VR, blockchain, and IoT. These technologies are expected to have a significant impact on the development of the digital world and Web 3.0.

These four major trends are expected to combine into a single meta trend. This will move computing into the space between the physical and digital worlds and allow future systems to interact with the world around them in various ways.

The various sensors and robotic systems used in these virtual worlds will collect and store data to support the spatial web. At the same time, that data will be used to create reports and other applications that let individuals interact with the world around them and provide businesses with data-rich KPIs.

For instance, in the warehousing industry, traditional methods of picking and transporting orders have been used to successfully accomplish the task of navigating through millions of square feet of warehouse space. With the increasing number of websites that promise next day delivery, warehouses are constantly looking for new ways to improve their efficiency.

Through the use of robotics, automation, and various data points such as the location of cameras and sensors, the system can create 3D maps of warehouses. It can also suggest the ideal warehouse layout based on the data collected by its human workers, create "what if" scenarios, create improved employee training, uncover "hidden factories", and streamline workflow. This method can increase efficiency by up to 50%.

Another positive is that this technology can help companies reduce their turnover rate and improve employee satisfaction. It can also boost employees' sense of self-worth by allowing them to perform their jobs more efficiently.

Although the spatial web offers only a small glimpse of the future potential for business, it is important to note that the various technologies being developed are still in their early stages. According to a Baystreet article, the concept of the smart world of tomorrow relies on four lenses designed to create a seamless and harmonious interaction between man and machine.

The spatial web is a framework that aims to enable the interoperability of various subsectors and technologies. It can help create a network where all of these technologies can work together seamlessly. This will allow the ideologies of Human 2.0, Society 5.0, Industry 4.0, and Web 3.0 to transform into reality.

Smart factories will allow workers to collaborate and create a virtual environment where they can work together seamlessly. This can help them improve their efficiency and create a more harmonious environment for their customers. With the help of technology such as AR and VR, you can take a full-scale model of your company's product and visualize its various components in a room.

After the design has been created, it will be made available to the machines and robotic systems that will be used to create a digital twin. Combined with the use of AI and other advanced technologies, these systems will be able to track and automate the various parts of the product as it moves through your factory.

Image from FS Studio

From there, it becomes a cascade of efficiencies. Your products will be loaded and delivered to your customers on time thanks to automation, robotics, and a well trained workforce.

Retail outlets will use Web 3.0, XR tools and digital information to create an improved shopping experience in their stores. Through smart mapping and routing technology, they will be able to make shopping more efficient by mapping out the path most likely to lead customers to their desired items, and to make better product placement decisions.

We are not totally there just yet, but as technologies improve, we are seeing more adoption in many industries. Yes, there is an up-front investment in moving into Web 3.0 and spatial computing, but once you educate yourself in digital twinning, the metaverse, real-to-life virtual spaces, or whatever you'd like to call it, employee safety, improved workflow, and ROI are what stand out.

Let's Talk Simulation with NeRFs

By Caio Viturino and Bobby Carlton

As companies and industries uncover the potential of the metaverse and digital twinning, and leverage it to streamline their workforce, improve employee training, embrace the automation of warehouses and much more, they will need a process that would allow them to quickly and easily create 3D content. This is especially important since the creation of virtual worlds and complex content will become more prevalent for businesses moving forward.

One way of speeding up this process is through something called a Neural Radiance Field (NeRF), which can help us create and launch 3D digital solutions for a wide variety of enterprise use cases. However, there are some questions about the technology.

What is NeRF? 

NeRFs are neural representations of the geometry of complex 3D scenes. Unlike other methods, such as point clouds and voxel models, they are trained on dense photographic images. They can then produce photo-realistic renderings which can be used in various ways for digital transformation.

This method combines a sparse set of input views with an underlying continuous scene function to generate novel views of complex scenes. The input views can be taken from a static set of images or rendered from something like a Blender model.

In a Medium post, Varun Bhaseen describes a NeRF as a continuous 5D function that outputs the radiance emitted in each direction (θ, Φ) at each point (x, y, z) in space, together with a density at each point that acts like a differential opacity, controlling how much radiance is accumulated by a ray passing through (x, y, z).
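To see how density "acts like a differential opacity", here is a minimal sketch of the discrete volume rendering sum used to turn samples along one camera ray into a pixel color. The formula follows the original NeRF paper; the `render_ray` function and the three hand-picked samples are assumptions for illustration, where a real NeRF would get the densities and colors from its trained network.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: volume density (sigma) at each sample
    colors:    (r, g, b) radiance at each sample
    deltas:    distance between adjacent samples
    Returns the pixel color and the transmittance left after the ray.
    """
    transmittance = 1.0              # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance


# Hypothetical ray: empty space, then a dense red surface, then green behind it.
densities = [0.0, 50.0, 50.0]
colors = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
deltas = [0.1, 0.1, 0.1]

pixel, leftover = render_ray(densities, colors, deltas)
print(pixel, leftover)
```

The empty sample contributes nothing, the dense red sample absorbs almost all of the ray, and the green sample behind it is nearly invisible, which is how a density field ends up behaving like a surface without ever storing one.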

Bhaseen explains it further with the visual below, which shows the steps involved in optimizing a continuous 5D representation of a scene. It takes into account the view-dependent color and volume density of the scene. In this example, 100 images were taken as input.

NeRF Drums
Image from Medium/Varun Bhaseen

This optimization is performed on a deep multi-layer perceptron, without any convolutional layers. Gradient descent is used to minimize the error between the views rendered from the representation and the observed images.
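The idea of minimizing the error between rendered and observed views can be shown with a deliberately tiny stand-in. In the sketch below the whole "scene" is one ray segment with a single unknown density, and plain gradient descent adjusts it until the rendered pixel matches an observed one. A real NeRF optimizes the weights of a neural network over many rays and images; the single scalar parameter, the constants, and the function names here are assumptions made to keep the example readable.

```python
import math

OBSERVED = 0.6   # observed pixel intensity (made-up value)
DELTA = 1.0      # length of the single ray segment


def render(sigma):
    """Pixel intensity of a white segment with density sigma."""
    return 1.0 - math.exp(-sigma * DELTA)


def loss_gradient(sigma):
    """Derivative of (render(sigma) - OBSERVED)**2 with respect to sigma."""
    error = render(sigma) - OBSERVED
    return 2.0 * error * DELTA * math.exp(-sigma * DELTA)


def fit_density(sigma=0.1, learning_rate=0.5, steps=500):
    """Plain gradient descent on the squared rendering error."""
    for _ in range(steps):
        sigma -= learning_rate * loss_gradient(sigma)
    return sigma


fitted_sigma = fit_density()
print(fitted_sigma, render(fitted_sigma))
```

Because rendering is differentiable, the same loop scales up in principle: swap the single density for network weights and the single pixel for batches of rays from the input photos.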

Can We Reconstruct the Environment Using Some Equipment?

We can! In addition to being able to model the environment in 6 minutes, the equipment from Mosaic can also generate high-quality 3D models.

Unfortunately, this method is very expensive and requires a lot of training to achieve a high-quality mesh. AI-based methods, on the other hand, seem to do this using a cellphone camera. Another option that could be very useful is NeRFs.

Who First Developed the Well-Known NeRF? 

The first NeRF was published in 2020 by Ben Mildenhall and his co-authors. This method achieved state-of-the-art results in 2020 when synthesizing novel views of complex scenes from multiple RGB images. The main drawback then was the time it took to train the network, which was almost 2 days per scene, sometimes more, even though Mildenhall was using an NVIDIA V100 GPU.

Why Is NeRF Not Well Suited for Mesh Generation?

Unlike surface rendering, NeRF does not use an explicit surface representation. Instead, it represents objects as a density field. Whereas surface rendering determines the color from a single surface point, this method takes into account multiple locations in a volume in order to determine the color.

NeRF is capable of producing high-quality images, but the surfaces that are extracted as level sets are not ideal. This is because NeRF does not take into account the specific density levels that are required to represent the surface. 
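One way to see why level sets of a density field make awkward surfaces: the position of the extracted "surface" depends on which density level you pick, and nothing in NeRF training says which level is right. The sketch below walks along one ray through a made-up, gradually rising density profile and reports where two different levels are crossed. The profile, the levels, and the `first_crossing` helper are illustrative assumptions.

```python
def first_crossing(positions, densities, level):
    """Where the density first rises through `level` along a ray.

    Uses linear interpolation between neighboring samples and returns
    None if the level is never reached.
    """
    samples = list(zip(positions, densities))
    for (x0, d0), (x1, d1) in zip(samples, samples[1:]):
        if d0 < level <= d1:
            return x0 + (level - d0) / (d1 - d0) * (x1 - x0)
    return None


# Hypothetical density samples along a ray entering a "soft" object.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
densities = [0.0, 1.0, 4.0, 9.0, 16.0]

low = first_crossing(positions, densities, level=2.0)
high = first_crossing(positions, densities, level=8.0)
print(low, high)
```

The two levels place the surface more than a full sample spacing apart on the same ray. Across a whole scene, that ambiguity shows up as the bumpy, noisy meshes described in this section.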

In a paper released by Nvidia, the authors introduced a new method called Instant NeRF, which can quickly generate a high-quality volumetric radiance-and-density field. Unfortunately, this method was not able to produce good meshes either. The approach did produce a decent volumetric radiance-and-density field, but the meshes extracted from it seemed "noisy".

What If We Use Photogrammetry Instead?

Unlike photogrammetry, NeRF does not require the creation of point clouds, nor does it need to convert them to objects. Its output is faster, but unfortunately the mesh quality is not as good. In the example here, Caio Viturino, Simulations Developer for FS Studio, tested the idea of generating meshes of an acoustic guitar from the NeRF volume rendering by using Nvidia's Instant NeRF. The results are pretty bad, with lots of "noise".

Image by Caio Viturino

Viturino also tried to apply photogrammetry (using a simple cell phone camera) through existing software to compare with the NeRF mesh output using the same set of images. We can see that the photogrammetry output looks better, but NeRF can capture more details of the object.

Image by Caio Viturino

Can NeRF Be Improved to Represent Indoor Environments?

In a paper released by Apple, the team led by Terrance DeVries explained how they were able to improve the NeRF model by learning to decompose large scenes into smaller pieces. Although they did not talk about surface or mesh generation, they did create a global generator that can perform this task.

Unfortunately, the algorithm's approach to generating a mesh is not ideal. The problem with NeRF is that the algorithm generates a volumetric radiance-and-density field instead of a surface representation. Different approaches have tried to generate a mesh from the volumetric field, but only for single objects (360-degree scans).

Can NeRF Be Improved to Generate Meshes?

It is well known that NeRF does not admit accurate surface reconstruction. Therefore, some suggest that the algorithm should be merged with implicit surface reconstruction.

Michael Oechsle (2021) published a paper that unifies volume rendering and implicit surface reconstruction and can reconstruct meshes of objects more precisely than NeRF. However, the method applies to single objects rather than to whole-scene reconstruction.

Do We Really Need a Mesh of the Scene or Can We Use the Radiance Field Instead?

NeRF is more accurate than point clouds or voxel models when it comes to surface reconstruction. It does not need to perform precise feature extraction and alignment.

Michal Adamkiewicz used the NeRF to perform a trajectory optimization for a quadrotor robot in the radiance field produced by NeRF instead of using the 3D scene mesh. The NeRF environment used to test the trajectory planning algorithms was generated from a synthetic 3D scene.
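The appeal of planning directly in the radiance field is that density can be queried anywhere and treated as a rough measure of how likely a collision is. The sketch below scores two candidate paths by integrating density along them. It is only an illustration of that idea, not Adamkiewicz's algorithm: the analytic `density` function is a hypothetical stand-in for querying a trained NeRF, and the 2D waypoints are made up.

```python
import math


def density(x, y):
    """Stand-in for a trained NeRF's density query (hypothetical).

    Models a single soft obstacle centered at (1, 0).
    """
    return 10.0 * math.exp(-((x - 1.0) ** 2 + y ** 2) / 0.05)


def collision_cost(waypoints, samples_per_segment=20):
    """Approximate the integral of density along a piecewise-linear path."""
    cost = 0.0
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        step = math.hypot(x1 - x0, y1 - y0) / samples_per_segment
        for i in range(samples_per_segment):
            t = (i + 0.5) / samples_per_segment   # midpoint of each slice
            cost += density(x0 + t * (x1 - x0), y0 + t * (y1 - y0)) * step
    return cost


straight = [(0.0, 0.0), (2.0, 0.0)]              # flies through the obstacle
detour = [(0.0, 0.0), (1.0, 0.8), (2.0, 0.0)]    # goes around it

print(collision_cost(straight), collision_cost(detour))
```

A trajectory optimizer would nudge the waypoints to lower this cost while also respecting the quadrotor's dynamics, all without ever extracting a mesh.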

Unfortunately, it is not easy to create a mesh from the NeRF environment. To load the scene into Isaac Sim, we need a mesh representation of the NeRF.

Can We Map an Indoor Environment Using NeRF?

According to a report written by Xiaoshuai Zhang (2022), not yet. “While NeRF has shown great success for neural reconstruction and rendering, its limited MLP capacity and long per-scene optimization times make it challenging to model large-scale indoor scenes.”

The goal of Zhang’s paper is to incrementally reconstruct a large sparse radiance field from a long RGB image sequence (monocular RGB video). Although impressive and promising, 3D reconstruction from RGB images does not seem to be satisfactory yet. We can observe noise in the mesh produced by this method.

What If We Use RGB-D Images Instead of RGB Images?

Dejan Azinović (2022) proposed a new approach to generating 3D reconstruction of scenes that is much better than NeRF.

The image below shows how noisy the 3D mesh generated by the first proposed NeRF is compared to the Neural RGB-D surface reconstruction.

Enter the SNeRF!

A recent study conducted by Cornell University revealed that a variety of dynamic virtual scenes can be created using neural radiance fields at a speed that is more than enough to handle complex content. The method is called stylized neural radiance fields (SNeRF).

Led by researchers Lei Xiao, Feng Liu, and Thu Nguyen-Phuoc, the team was able to create 3D scenes that can be used in various virtual environments simply by using SNeRF to adapt to the real-world environment and then use points to create the virtual scene. Imagine looking at a painting and then seeing the world through the lens of the painting.

What Can SNeRFs Do?

Through their work, they were able to create 3D scenes that can be used in various virtual environments. They were also able to use their real-world environment as a part of the creation process.

The researchers were able to achieve this by using cross-view consistency, which is a type of visual feedback that allows them to observe the same object at different viewing angles, creating an immersive 3D effect.

The Cornell team was also able to use an image as a reference style in the creation process by alternating the NeRF and stylization optimization steps. This method allowed them to quickly capture a real-world environment and customize how it looks.

“We introduce a new training method to address this problem by alternating the NeRF and stylization optimization steps,” said the research team in their published paper. “Such a method enables us to make full use of our hardware memory capacity to both generate images at higher resolution and adopt more expressive image style transfer methods. Our experiments show that our method produces stylized NeRFs for a wide range of content, including indoor, outdoor and dynamic scenes, and synthesizes high-quality novel views with cross-view consistency.”
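The alternating schedule described in the quote can be illustrated with a toy sketch. This is not the research team's implementation: the "scene" is just a list of numbers, and `render`, `stylize`, `fit_scene`, and `alternate` are hypothetical stand-ins invented for illustration. The only idea it tries to show is the loop itself, in which one step stylizes the current renders and the next step refits the scene to those stylized targets across all views.

```python
def render(scene, view_gain):
    # Toy "renderer" (assumption): each view sees the scene scaled by a gain.
    return [view_gain * value for value in scene]


def stylize(image, style_mean, strength=0.5):
    # Toy "style transfer" (assumption): shift the image so that its mean
    # moves part of the way toward the mean of a reference style.
    mean = sum(image) / len(image)
    shift = strength * (style_mean - mean)
    return [pixel + shift for pixel in image]


def fit_scene(scene, view_gains, targets, lr=0.1, steps=50):
    # Stand-in for the NeRF optimization step: gradient descent on the
    # squared error between renders and targets, summed over all views.
    scene = list(scene)
    for _ in range(steps):
        grads = [0.0] * len(scene)
        for gain, target in zip(view_gains, targets):
            rendered = render(scene, gain)
            for i, (r, t) in enumerate(zip(rendered, target)):
                grads[i] += 2.0 * (r - t) * gain
        scene = [s - lr * g / len(view_gains) for s, g in zip(scene, grads)]
    return scene


def alternate(scene, view_gains, style_mean, rounds=10):
    # Alternate the two steps instead of optimizing both at once.
    for _ in range(rounds):
        # Stylization step: build stylized targets from the current renders.
        targets = [stylize(render(scene, g), style_mean) for g in view_gains]
        # Scene step: refit one shared scene to all stylized targets.
        scene = fit_scene(scene, view_gains, targets)
    return scene


if __name__ == "__main__":
    stylized = alternate([1.0, 2.0, 3.0], [0.8, 1.0, 1.2], style_mean=5.0)
    print(stylized)
```

Because every view is refit against a single shared scene, the views cannot drift apart, which is a loose analogue of the cross-view consistency the paper describes. Running the two steps in turn also means that only one of them needs to be active at a time, which is the memory argument made in the quote.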

The researchers also had to work around NeRF's memory limitations in order to render higher-quality 3D images at a speed that felt like real-time. Their method involved creating a loop of views that would allow them to target the appropriate points in the image and then rebuild it with more detail.

Can SNeRF Help Avatars?

Through this approach, Lei Xiao, Feng Liu, and Thu Nguyen-Phuoc were able to create expressive 4D avatars that can be used in conversations. They were also able to create these avatars by using a distinct style of NeRF that allows them to convey emotions such as anger, confusion, and fear.

The Cornell research team's work on 3D scene stylization is still ongoing. They were able to create a method that uses implicit neural representations to affect the avatars' environment. They were also able to take advantage of their hardware memory's capabilities to create high-resolution images and adopt more expressive methods in virtual reality.

However, this is just the beginning and there is a lot more work and exploration ahead.

If you’re interested in diving deeper into the Cornell research team’s work, you can access their report here.

Jensen Huang talks about the future of AI, robotics, and how NVIDIA will lead the charge.

By Bobby Carlton

A lot was announced and I did my best to keep up! So let's just jump right in!

NVIDIA CEO Jensen Huang unveiled new cloud services that will allow users to run AI workflows during his NVIDIA GTC Keynote. He also introduced the company's new generation of GeForce RTX GPUs.

During his presentation, Jensen Huang noted that the rapid advancements in computing are being fueled by AI. He said that accelerated computing is becoming the fuel for this innovation.

He also talked about the company's new initiatives to help companies develop new technologies and create new experiences for their customers. These include the development of AI-based solutions and the establishment of virtual laboratories where the world's leading companies can test their products.

The company's vision is to help companies develop new technologies and create new applications that will benefit their customers. Through accelerated computing, Jensen Huang noted that AI will be able to unlock the potential of the world's industries.


The New NVIDIA Ada Lovelace Architecture Will Be a Gamer's and Creator's Dream

Enterprises will be able to benefit from new tools based on the Grace CPU and the Grace Hopper Superchip. Those developing the 3D internet will also be able to get new OVX servers powered by Ada Lovelace L40 data center GPUs. Researchers and scientists will be able to get new capabilities with the help of the NVIDIA NeMo LLM Service and Thor, a new brain with up to 2,000 teraflops of performance.

Jensen Huang noted that the company's innovations are being put to work by a wide range of partners and customers. To speed up the adoption of AI, he announced that Deloitte, the world's leading professional services firm, is working with the company to deliver new services based on NVIDIA Omniverse and AI.

He also talked about the company's customer stories, such as the work of Charter, General Motors, and The Broad Institute. These organizations are using AI to improve their operations and deliver new services.

The NVIDIA GTC event, which started this week, has become one of the most prominent AI conferences in the world. Over 200,000 people have registered to attend the event, which features over 200 speakers from various companies.

A ‘Quantum Leap’: GeForce RTX 40 Series GPUs


NVIDIA's first major event of the week was the unveiling of the new generation of GPUs, which are based on the Ada architecture. According to Huang, the new generation of GPUs will allow creators to create fully simulated worlds.

During his presentation, Huang showed the audience a demo called "Racer RTX." It is a fully interactive simulation that is rendered entirely with ray tracing.

The company also unveiled various innovations that are based on the Ada architecture, such as a Streaming Multiprocessor and a new RT Core. These features are designed to allow developers to create new applications.

Also introduced was DLSS 3, the latest version of the company's DLSS technology, which uses AI to create new frames by analyzing the previous ones. This feature can boost game performance by up to 4x. Over 30 games and applications have already adopted DLSS 3. According to Huang, the technology is one of the most significant innovations in the gaming industry.

Huang noted that the company's new generation of GPUs, which are based on the Ada architecture, can deliver up to 4x more processing throughput than its predecessor, the 3090 Ti. The new GeForce RTX 4090 will be available in October. Additionally, the new GeForce RTX 4080 is launching in November with two configurations.

  1. The 16GB version of the new GeForce RTX 4080 is priced at $1,199. It features 9,728 CUDA cores and 16 GB of high-speed GDDR6X memory. Compared to the 3090 Ti, the new 4080 is twice as fast in games.
  2. The 12GB version of the new GeForce RTX 4080 is priced at $899. It features 7,680 CUDA cores and 12 GB of high-speed GDDR6X memory. With DLSS 3, it is faster than the 3090 Ti.

Huang noted that the company's Lightspeed Studios used the Omniverse technology to create a new version of Portal, one of the most popular games in history. With the help of the company's AI-assisted toolset, users can easily up-res their favorite games and give them a physically accurate depiction.

According to Huang, large language models and recommender systems are the most important AI models that are currently being used in the gaming industry.

He noted that recommenders are the engines that power the digital economy, as they are responsible for powering various aspects of the gaming industry.

The Transformer deep learning model, which was introduced in 2017, has led to the development of large language models that are capable of learning human language without supervision.

Image from NVIDIA

“A single pre-trained model can perform multiple tasks, like question answering, document summarization, text generation, translation and even software programming,” said Huang.

The company's H100 Tensor Core GPU, which features its next-generation Transformer Engine, is in full production, and systems powered by it are shipping soon.

“Hopper is in full production and coming soon to power the world’s AI factories.”

Several of the company's partners, such as Atos, Cisco, Fujitsu, GIGABYTE, Lenovo, and Supermicro, are currently working on implementing the H100 technology in their systems. Some of the major cloud providers, such as Amazon Web Services, Google Cloud, and Oracle, are also expected to start supporting the H100 platform next year.

According to Huang, the company's Grace Hopper, which combines the company's Arm-based CPU with Hopper GPUs, will deliver a 7x increase in fast-memory capacity and a massive leap in recommender systems.

Weaving Together the Metaverse, L40 Data Center GPUs in Full Production

During his keynote at the company's annual event, Huang noted that the future of the internet will be further enhanced with the use of 3D. The company's Omniverse platform is used to develop and run metaverse applications.

He also explained how powerful new computers will be needed to connect and simulate the worlds that are currently being created. The company's OVX servers are designed to support the scaling of metaverse applications.

The company's 2nd-generation OVX servers will be powered by the Ada Lovelace L40 data center GPUs.

Thor for Autonomous Vehicles, Robotics, Medical Instruments and More

Today's cars are equipped with various computers for cameras, sensors, and infotainment systems. In the future, these functions will be delivered by software that can improve over time. In order to power these systems, Huang introduced the company's new product, called Drive Thor, which combines the company's Grace Hopper and the Ada GPU.

The company's new Thor superchip, which is capable of delivering up to 2,000 teraflops of performance, will replace the company's previous product, the Drive Orin. It will be used in various applications, such as medical instruments and industrial automation.

3.5 Million Developers, 3,000 Accelerated Applications

According to Huang, over 3.5 million developers have created over 3,000 accelerated applications using the company's software development kits and AI models. The company's ecosystem is also designed to help companies bring their innovations to the world's industries.

Over the past year, the company has updated over a hundred software development kits (SDKs) and introduced 25 new ones. These new tools allow developers to create new applications that can improve the performance and capabilities of their existing systems.

New Services for AI, Virtual Worlds

Image from FS Studio

Huang also talked about how the company's large language models are the most important AI models currently being developed. They can learn to understand various languages and meanings without requiring supervision.

The company introduced the NeMo LLM Service, a cloud service that allows researchers to train their AI models on specific tasks. To help scientists accelerate their work, the company also introduced the BioNeMo LLM Service, which allows them to create AI models that can understand various types of proteins, DNA, and RNA sequences.

Huang announced that the company is working with The Broad Institute to create libraries that are designed to help scientists use the company's AI models. These libraries, such as BioNeMo and Parabricks, can be accessed through the Terra Cloud Platform.

The partnership between the two organizations will allow scientists to access the libraries through the Terra Cloud Platform, which is the world's largest repository of human genomic information.

During the event, Huang also introduced the NVIDIA Omniverse Cloud, a service that allows developers to connect their applications to the company's AI models.

The company also introduced several new containers that are designed to help developers build and use AI models. These include Omniverse Replicator and Omniverse Farm, which is used for scaling render farms.

Omniverse is seeing wide adoption, and Huang shared several customer stories and demos:

  1. Lowe's is using Omniverse to create and operate digital twins of its stores.
  2. Charter, a $50 billion telecommunications company, is using the company's AI models to create digital twins of its networks.
  3. General Motors is also working with its partners to create a digital twin of its design studio in Omniverse. This will allow engineers, designers, and marketers to collaborate on projects.

Image from Lowe's

The company also introduced a new Nano for Robotics that can be used to build and use AI models.

Huang noted that the company's second-generation processor, known as Orin, is a home run for robotics computers. He also noted that the company is working on developing new platforms that will allow engineers to create artificial intelligence models.

To expand the reach of Orin, Huang introduced the new Nano for Robotics, which is a tiny robotic computer that is 80x faster than its predecessor.

The Nano for Robotics runs the company's Isaac platform and features the NVIDIA ROS 2 GPU-accelerated framework. It also comes with a cloud-based robotics simulation platform called Isaac Sim.

For developers who are using Amazon Web Services' (AWS) robotic software platform, AWS RoboMaker, Huang noted that the company's containers for the Isaac platform are now available in the marketplace.

New Tools for Video, Image Services

According to Huang, the increasing number of video streams on the internet will be augmented by computer graphics and special effects in the future. “Avatars will do computer vision, speech AI, language understanding and computer graphics in real time and at cloud scale.”

To enable new innovations in communications, real-time graphics, and AI, Huang noted that the company is developing various acceleration libraries, one of which is CV-CUDA. The company is also working on a cloud runtime engine called UCF (Unified Computing Framework) and a sample application called Tokkio that can be used to provide customer service avatars.

Deloitte to Bring AI, Omniverse Services to Enterprises

In order to accelerate the adoption of AI and other advanced technologies in the world's enterprises, Deloitte is working with NVIDIA to bring new services built on its Omniverse and AI platforms to the market.

According to Huang, Deloitte's professionals will help organizations use the company's application frameworks to build new multi-cloud applications that can be used for various areas such as cybersecurity, retail automation, and customer service.

NVIDIA Is Just Getting Started

During his keynote speech, Huang talked about the company's various innovations and products that were introduced during the course of the event. He then went on to describe the many parts of the company's vision.

“Today, we announced new chips, new advances to our platforms, and, for the very first time, new cloud services,” Huang said as he wrapped up. “These platforms propel new breakthroughs in AI, new applications of AI, and the next wave of AI for science and industry.”