Building an AR/AI vehicle manual using Amazon Sumerian and Amazon Lex

Auto manufacturers are continuously adding new controls, interfaces, and intelligence into their vehicles. They publish manuals detailing how to use these functions, but these handbooks are cumbersome. Because they consist of hundreds of pages in several languages, it can be difficult to search for relevant information about specific features. Attempts to replace paper-based manuals with video or mobile apps have not improved the experience. As a result, not all owners know about and take advantage of all the innovations offered by the auto manufacturers.

This post describes how you can use Amazon Sumerian and other AWS services to create an interactive auto manual. This solution uses augmented reality, an AI chatbot, and connected car data provided through AWS IoT. This is not a comprehensive step-by-step tutorial, but it does provide an overview of the logical components.

AWS services

This blog post uses the following six services:

  1. Amazon Sumerian lets you create and run virtual reality (VR), augmented reality (AR), and 3D applications quickly and easily without requiring any specialized programming or 3D graphics expertise. Created 3D scenes can be published with one click and then distributed on the web, in VR headsets, and in mobile applications. In this post, Sumerian is used to render and animate a 3D model of the vehicle's interior and, optionally, its exterior.
  2. Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex is powered by the same technology that powers Amazon Alexa. Amazon Lex democratizes deep learning technologies by putting the power of Alexa within reach of all developers. In this post, Amazon Lex is used to recognize voice commands and determine which function or feature the owner is asking about.
  3. Amazon Polly is a text-to-speech service that uses advanced deep learning technologies to synthesize speech that sounds like a human voice. Amazon Polly allows you to create applications that talk and build entirely new categories of speech-enabled products. Amazon Polly supports dozens of voices, across a variety of languages, to enable applications working in different countries. In this post, Amazon Polly is used to vocalize Amazon Lex answers into lifelike speech.
  4. Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. DynamoDB is fully managed, has built-in security, backup and restore, and in-memory caching for internet-scale applications. In this post, you see the use of DynamoDB as a document store of steps for interacting within the interior of the vehicle.
  5. AWS Lambda lets you run code without provisioning or managing servers. In this demo, a Lambda function is used to populate an AWS IoT Core shadow document to contain the required
  6. AWS IoT Core is a managed cloud service that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT Core lets billions of devices and trillions of messages connect reliably and securely to AWS endpoints and to other devices. AWS IoT Core supports the concept of device shadows, which store the latest state of connected devices whether they are online or not. In this post, a device shadow document is used to exchange information between Amazon Lex, DynamoDB, Sumerian, and a virtual representation of the car.

The following diagram illustrates the architectural relationships between these services.

The diagram shows how the AWS services relate to each other, to the end user, and to the vehicle. The owner’s journey starts with the mobile application, which embeds the Sumerian scene containing the model of the car. The user then taps the button to activate Amazon Lex and Amazon Polly. Once these are activated, the user can ask the application about a function and is shown the series of steps to perform.

The content of the manual is stored in DynamoDB. Amazon Lex pulls this information by placing a Lambda call. The Lambda function queries the DynamoDB table and retrieves a JSON structure describing:

  1. the steps, ordered by time and marked with start and end, to signal when the control should be highlighted. For example, …{"LeftTemperatureDial": {"start": 0, "end": 2}}…
  2. the prompt that needs to be announced while steps are shown in the Sumerian model. For example, “Press down left temperature dial for 2 seconds.”

This JSON document is then passed to the AWS IoT Core device shadow document. Sumerian periodically polls the shadow for state changes and updates the model to reflect the steps, highlighting the interface controls accordingly.
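
As a rough illustration of that hand-off, the sketch below wraps a set of timed steps in a device shadow update document. The `state`/`desired` envelope is the standard shadow update format and the `steps` key follows the Lambda snippet later in this post; the function name and the second control name are made up for the example.

```python
import json


def build_shadow_update(steps):
    """Wrap timed highlight steps in an AWS IoT device shadow update document."""
    # Sort the controls by start time so the scene can replay them in order.
    ordered = sorted(steps.items(), key=lambda item: item[1]["start"])
    action_list = [{name: timing} for name, timing in ordered]
    # "state" -> "desired" is the envelope expected by a shadow update.
    return json.dumps({"state": {"desired": {"steps": action_list}}})


steps = {
    "FanSpeedDial": {"start": 2, "end": 4},         # hypothetical control name
    "LeftTemperatureDial": {"start": 0, "end": 2},  # control name from the post
}
shadow_document = build_shadow_update(steps)
print(shadow_document)
```

Keeping the steps as an ordered list rather than a single object means the scene does not have to rely on JSON key order when it replays them.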

For a better visual and aural representation, see the AWS Auto Demo video.

How to build this demo

Follow these steps to build the demo:

  1. Create a basic scene.
  2. Label the control elements.
  3. Create the DynamoDB table.
  4. Create the Amazon Lex bot.
  5. Use the Lambda function.
  6. Create a state machine in Sumerian.
  7. Position the AR camera in the scene.
  8. Publish the scene.
  9. Link to the Amazon Lex bot.
  10. Deploy the application.

Step 1: Create a basic scene

Create a basic scene, with entities and AWS configuration.

  1. Using the Augmented Reality template, create a scene and import the 3D asset of the commercially available car. This model is sourced from the 3D model marketplace but can be imported from free 3D galleries or from 3D design software in any of the supported formats.
  2. Create an Amazon Cognito identity pool, allowing Sumerian to use both Amazon Lex and AWS IoT Core. This identity pool should have the appropriate policies to access AWS IoT, Amazon Lex, and Amazon Polly. For more information, see Amazon Cognito Setup Using AWS CloudFormation.
  3. Provide the created identity pool ID to the AWS Configuration component in the Sumerian scene and enable the check box on the AWS IoT Data Client.

Step 2: Label the control elements

Create 3D labels or entities covering most of the control elements (dial, button, flap, display, sign, etc.) that are present in the interior. I colored these markers red and made them semitransparent, so that they still allow the view of the actual control underneath. I named these entities to more easily identify them in my scripts. I also hid them, to mimic the initial state, where only the actual interior is visible, as seen in the following screenshot.

Step 3: Create the DynamoDB table

Create a table in DynamoDB and populate it with several vehicle functions and appropriate steps for enabling, disabling, setting, or unsetting that function. These instructions contain start/end times and durations for each child model entity that must appear, honoring the order in which you want to show them, as shown in the following screenshot.
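
A minimal sketch of what one such table item could look like in DynamoDB's typed attribute-value format follows. Only the `action_name` key comes from the Lambda snippet in this post; the `prompt`, `steps`, and `control` attribute names, the function name, and the table name in the comment are assumptions made for illustration.

```python
def build_manual_item(action_name, prompt, steps):
    """Return one manual entry in DynamoDB's typed attribute-value format."""
    return {
        "action_name": {"S": action_name},  # key name used by the Lambda snippet
        "prompt": {"S": prompt},            # hypothetical attribute name
        "steps": {                          # hypothetical attribute name
            "L": [
                {
                    "M": {
                        "control": {"S": step["control"]},
                        # DynamoDB transports numbers as strings under "N".
                        "start": {"N": str(step["start"])},
                        "end": {"N": str(step["end"])},
                    }
                }
                # Store the steps in the order they should be shown.
                for step in sorted(steps, key=lambda s: s["start"])
            ]
        },
    }


item = build_manual_item(
    "increase temperature",
    "Press down left temperature dial for 2 seconds.",
    [{"control": "LeftTemperatureDial", "start": 0, "end": 2}],
)
# With boto3 installed and credentials configured, the item could be stored with:
#   boto3.client("dynamodb").put_item(TableName="VehicleManual", Item=item)
# ("VehicleManual" is a placeholder table name.)
print(item)
```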

Step 4: Create the Amazon Lex bot

Create the Amazon Lex bot and populate it with intents and utterances. You are enabling Amazon Lex to understand owners’ questions. Amazon Lex determines which function the owner is asking about and sends this information into the Lambda function.

As seen in the two screenshots above, you are creating an intent called airconditioningManual. This intent then contains several sample utterances containing three custom slots:

  • {option} to describe the activity to perform; examples include “turn on”, “increase”, “remove”, and others
  • {action} to describe the function, such as “temperature”, “fan speed” and others
  • {conjunction} to allow for optional conjunctions, like “with”, “on”, “of”, etc.

You can add more intents for other interactions or other parts of the vehicle.
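
To show how the slots might feed the lookup, here is a small sketch that derives a manual key from the event Amazon Lex passes to Lambda. It assumes the Lex V1 event layout (`currentIntent` with a `slots` map), which matches the response format used later in this post; the key format (option and action joined by a space) and the function name are hypothetical.

```python
def manual_key_from_event(event):
    """Derive the manual lookup key from a Lex (V1-style) Lambda event."""
    slots = event["currentIntent"]["slots"]
    # Slot values are None when Lex could not fill them.
    option = (slots.get("option") or "").strip().lower()
    action = (slots.get("action") or "").strip().lower()
    if not option or not action:
        return None  # a required slot is missing
    # {conjunction} only helps utterance matching, so it is ignored here.
    return f"{option} {action}"


event = {
    "currentIntent": {
        "name": "airconditioningManual",
        "slots": {"option": "Increase", "conjunction": None, "action": "temperature"},
    }
}
print(manual_key_from_event(event))  # increase temperature
```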

Step 5: Use the Lambda function

The Lambda function contains code that performs the following steps.

  1. It queries the DynamoDB table to obtain a document of ordered instructions including start times, end times, and durations of the control elements (dial, button, flap, display, sign, etc.) being visible or highlighted.
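    response = dynamo_client.get_item(
        TableName=table_name,  # placeholder: the table created in Step 3
        Key={'action_name': {'S': toget}},  # toget holds the requested function
    )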

  2. It converts and stores this set of instructions into AWS IoT Core, via a device shadow document.
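     action = iot_client.update_thing_shadow(
         thingName=thing_name,  # placeholder: the thing that represents the car
         payload=json.dumps({"state": {"desired": {"steps": actionList}}}),  # needs `import json`
     )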

  3. It returns a response object to Amazon Lex, fulfilling the request from the owner of the manual. This response object contains instructions to be performed, wrapped in the sentence, which is played back.
    rtrn = {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": "Fulfilled",
            "message": {
                "contentType": "PlainText",
                "content": rtrnmessage
            }
        }
    }

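Putting the three steps together, a handler could look roughly like the sketch below. It is not the author's function: the item layout (`prompt`, `steps`, `control`), the function name, and the table and thing names are assumptions, and the two small fake clients stand in for the boto3 `dynamodb` and `iot-data` clients so the flow can be run without an AWS account.

```python
import json


def handle_manual_request(key, dynamo_client, iot_client, table_name, thing_name):
    """Look up a manual entry, push its steps to the shadow, and answer Lex."""
    response = dynamo_client.get_item(
        TableName=table_name, Key={"action_name": {"S": key}}
    )
    item = response.get("Item")
    if item is None:
        state = "Failed"
        message = "Sorry, I could not find that function in the manual."
    else:
        # Convert DynamoDB's typed values into plain JSON for the scene.
        action_list = [
            {
                step["M"]["control"]["S"]: {
                    "start": float(step["M"]["start"]["N"]),
                    "end": float(step["M"]["end"]["N"]),
                }
            }
            for step in item["steps"]["L"]
        ]
        iot_client.update_thing_shadow(
            thingName=thing_name,
            payload=json.dumps({"state": {"desired": {"steps": action_list}}}),
        )
        state = "Fulfilled"
        message = item["prompt"]["S"]
    return {
        "dialogAction": {
            "type": "Close",
            "fulfillmentState": state,
            "message": {"contentType": "PlainText", "content": message},
        }
    }


class FakeDynamo:
    """In-memory stand-in for the DynamoDB client (illustration only)."""

    def __init__(self, items):
        self.items = items

    def get_item(self, TableName, Key):
        item = self.items.get(Key["action_name"]["S"])
        return {"Item": item} if item else {}


class FakeIot:
    """Records shadow updates instead of sending them (illustration only)."""

    def __init__(self):
        self.payloads = []

    def update_thing_shadow(self, thingName, payload):
        self.payloads.append(json.loads(payload))


db = FakeDynamo({
    "increase temperature": {
        "prompt": {"S": "Press down left temperature dial for 2 seconds."},
        "steps": {"L": [{"M": {
            "control": {"S": "LeftTemperatureDial"},
            "start": {"N": "0"},
            "end": {"N": "2"},
        }}]},
    }
})
iot = FakeIot()
reply = handle_manual_request("increase temperature", db, iot, "VehicleManual", "DemoCar")
print(reply["dialogAction"]["message"]["content"])
```

Passing the clients in as arguments keeps the lookup and conversion logic testable on its own; in a real Lambda function they would be created once with boto3 outside the handler.
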
Step 6: Create a state machine in Sumerian

Create a state machine in Sumerian using these steps.

  1. This state machine continuously listens for changes to the device shadow document. There are three states in the state machine, as shown in the following diagram:
    1. loadSDK, which loads the AWS SDK
    2. getShadow (see the following step)
    3. A waiting state that calls the getShadow state in a looping routine.

    To learn more about state machines in Sumerian, see State Machine Basics. These changes are executed on the model, according to instructions provided by the IoT shadow, showing marking elements according to start/end time and the duration specified. The device shadow then gets reset.

  2. The getShadow state from the preceding step runs the script that retrieves the IoT device shadow and performs the actual animation of the individual layers. To learn more about scripting and retrieving IoT device shadows, see IoT Thing, Shadow, and Script Actions. The following example snippet shows the core of that script (show the highlight entity → wait → hide the highlight entity):
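    function showControl(control, ctx, controlName) {
        var myWorld = …  // elided in the original snippet
        var controlEnt = myWorld.getEntityByName(controlName);
        setTimeout(function () {            // wait until the step starts
            controlEnt.show();              // show the highlight entity
            setTimeout(function () {        // wait for the step's duration
                controlEnt.hide();          // hide the highlight entity
            }, (control.end - control.start) * 1000);
        }, control.start * 1000);
    }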
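
The polling and timing logic of this step can also be sketched outside Sumerian. The Python below is only an illustration of the idea, not the scene script: `poll_shadow` mimics one pass of the getShadow state (read the pending steps, then reset the shadow), and `highlight_timeline` expands the steps into the show and hide events that the nested timers produce. Both function names and the second control name are hypothetical.

```python
def highlight_timeline(steps):
    """Expand shadow steps into time-ordered (seconds, action, control) events."""
    events = []
    for step in steps:
        for control, timing in step.items():
            events.append((timing["start"], "show", control))
            events.append((timing["end"], "hide", control))
    # At equal times, hide before show so the previous highlight clears first.
    return sorted(events, key=lambda event: (event[0], event[1] != "hide"))


def poll_shadow(get_shadow, reset_shadow):
    """One pass of the getShadow state: read pending steps, then reset them."""
    shadow = get_shadow()
    steps = shadow.get("state", {}).get("desired", {}).get("steps") or []
    if steps:
        reset_shadow()  # clear the shadow so the steps are not replayed
    return highlight_timeline(steps)


# In-memory stand-in for the device shadow (illustration only).
shadow = {"state": {"desired": {"steps": [
    {"LeftTemperatureDial": {"start": 0, "end": 2}},
    {"FanSpeedDial": {"start": 2, "end": 3}},  # hypothetical control name
]}}}

timeline = poll_shadow(
    lambda: shadow,
    lambda: shadow["state"]["desired"].update(steps=[]),
)
for seconds, action, control in timeline:
    print(seconds, action, control)
```

A second call to `poll_shadow` returns an empty timeline, which corresponds to the waiting state looping until new steps arrive.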

Step 7: Position the AR camera in the scene

Position the AR camera entity into the scene facing the dashboard of the vehicle. I also scale the car accordingly, so the user of the mobile application and vehicle owner can see the relative size of control elements (dial, button, flap, display, sign, etc.) compared to the reality of the physical vehicle.

Step 8: Publish the scene

Publish the scene and embed its URL into an example placeholder application available on GitHub. These applications are open source and available for both iOS and Android.

private let sceneURL = URL(string: "")!  // put the published scene URL between the quotes; an empty string crashes here

Step 9: Link to the Amazon Lex bot

Last but not least, I add an Amazon Lex button from another example project on GitHub and link it with the published Amazon Lex bot from Step 4.

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        let credentialProvider = AWSCognitoCredentialsProvider(regionType: AWSRegionType.USEast1, identityPoolId: "us-east-1:STUVWXYZ-0000-1111-2222-LKJIHGFEDCBA")
        let configuration = AWSServiceConfiguration(region: AWSRegionType.USEast1, credentialsProvider: credentialProvider)
        AWSServiceManager.default().defaultServiceConfiguration = configuration
        let chatConfig = AWSLexInteractionKitConfig.defaultInteractionKitConfig(withBotName: "XXXAWSYYY", botAlias: "$LATEST")
        chatConfig.autoPlayback = true
        AWSLexInteractionKit.register(with: configuration!, interactionKitConfiguration: chatConfig, forKey: "AWSLexVoiceButton")
        AWSLexInteractionKit.register(with: configuration!, interactionKitConfiguration: chatConfig, forKey: "chatConfig")
        return true
}

Step 10: Deploy the application

The final step is to deploy the application onto the iOS-enabled device and test the functionality. The demo video can be seen in the AWS services section of this post.


This is not meant to be a comprehensive guide to every single component plugged in to the manual, but it describes all logical components. Based on this post, you should feel confident enabling and deploying 3D models of any assets that need an interactive manual with both visual and aural feedback into the cloud.

Your solution can use Sumerian and other AI, compute, or storage services. You now understand how these services integrate, what role they play in the experience and how they can be extended beyond the scope of this use case.

Start by reviewing the steps above, subscribe to the Amazon Sumerian video channel, read more about integrations with Amazon Lex and Amazon Polly and IoT Shadow, and get building!

About the Author

Miro Masat is a Solutions Architect at Amazon Web Services, based in London, UK. He focuses on engineering accounts, mainly in the automotive industry. Miro is a massive fan of virtual, augmented, and mixed reality and always seeks ways to bring engineering to VR/AR/MR and vice versa. Outside of work, he enjoys traveling, learning languages, and building DIY projects.




[P] Reinforcement Learning / Game Theory on Urban Planning Problems


I’ve been working on the use of machine learning models for urban planning problems for my PhD. My earlier work focused on the use of regression-based models (ANN, GPR, etc.) but due to changes in funding, I’m having to switch to reinforcement learning / game theoretic models for my current work. However, I haven’t been able to find collaborators from the RL domain in my university, and my advisor is not an expert in it either.

Our project currently involves path planning or resource allocation in stochastic environments (e.g., snow plowing, police placement [not predictive policing], trash pickup). If anyone in this subreddit has experience in these domains or in RL in general and is interested in collaborating, please reach out.

I could try and do a lot of literature surveys to make sure I’m not trying to reinvent a wheel or going in the wrong direction, but I strongly believe that subject experts would be able to provide much better insights.

submitted by /u/AdmiralLunatic

How American Express Uses Deep Learning for Better Decision Making

Financial fraud is on the rise. As the number of global transactions increases and digital technology advances, the complexity and frequency of fraudulent schemes are keeping pace.

Security company McAfee estimated in a 2018 report that cybercrime annually costs the global economy some $600 billion, or 0.8 percent of global gross domestic product.

One of the most prevalent — and preventable — types of cybercrime is credit card fraud, which is exacerbated by the growth in online transactions.

That’s why American Express, a global financial services company, is developing deep learning generative and sequential models to prevent fraudulent transactions.

“The most strategically important use case for us is transactional fraud detection,” said Dmitry Efimov, vice president of machine learning research at American Express. “Developing techniques that more accurately identify and decline fraudulent purchase attempts helps us protect our customers and our merchants.”

Cashing into Big Data

The company’s effort spanned several teams that conducted research on using generative adversarial networks, or GANs, to create synthetic data based on sparsely populated segments.

In most financial fraud use cases, machine learning systems are built on historical transactional data. The systems use deep learning models to scan incoming payments in real time, identify patterns associated with fraudulent transactions and then flag anomalies.

In some instances, like new product launches, GANs can produce additional data to help train and develop more accurate deep learning models.

Given its global integrated network with tens of millions of customers and merchants, American Express deals with massive volumes of structured and unstructured data sets.

Using several hundred data features, including the time stamps for transactional data, the American Express teams found that sequential deep learning techniques, such as long short-term memory and temporal convolutional networks, can be adapted for transaction data to produce superior results compared to classical machine learning approaches.

The results have paid dividends.

“These techniques have a substantial impact on the customer experience, allowing American Express to improve speed of detection and prevent losses by automating the decision-making process,” Efimov said.

Closing the Deal with NVIDIA GPUs 

Because of the huge amount of customer and merchant data American Express works with, the company selected NVIDIA DGX-1 systems, which contain eight NVIDIA V100 Tensor Core GPUs, to build models with both TensorFlow and PyTorch software.

Its NVIDIA GPU-powered machine learning techniques are also used to forecast customer default rates and to assign credit limits.

“For our production environment, speed is extremely important with decisions made in a matter of milliseconds, so the best solution to use are NVIDIA GPUs,” said Efimov.

As the systems go into production in the next year, the teams plan on using the NVIDIA TensorRT platform for high-performance deep learning inference to deploy the models in real time, which will help improve American Express’ fraud and credit loss rates.

Efimov will be presenting his team’s work at the GPU Technology Conference in San Jose in March. To learn more about credit risk management use cases from American Express, register for GTC, the premier AI conference for insights, training and direct access to experts on the key topics in computing across industries.

The post How American Express Uses Deep Learning for Better Decision Making appeared first on The Official NVIDIA Blog.

Machine Learning Infrastructure [Research]

I just started a new position at a small AI startup as a systems engineer. Historically, I have worked in more traditional IT roles on the support side in Windows environments.

We have several data science and machine learning teams for different products and projects. They all seem to use different technologies at the moment. We also have a lot of bare metal hardware lying around that is not inventoried or monitored and seems to be under-utilized in some places while other hardware has a long waitlist.

I had a meeting with the managers and leads of each team to figure out what they were doing, using, etc. All of them have decided to transition to Airflow and Dask. Some teams require heavy CPU and storage while others require heavy GPU for their jobs.

This is my first venture into machine learning so I’m trying to educate myself. We have been discussing gathering up unused hardware and building one or more clusters to provide organized, consistent, and scheduled resources to the teams for their workflows. I am thinking something like containers as a service where they can pick their CPU/GPU requirements and generate instances for processing on-demand, without having to go through Ops. Ops just maintains the infrastructure to make sure there is enough available to the teams.

For those of you working in machine learning and data science, does this sound like a good solution? Are there products out there y’all use that function in this way? I’ve been reading about some of VMware’s vCloud solutions and found an article about containers/Kubernetes as a service that also allowed for traditional VMs to reside in the cluster but now I can’t find it.

I would appreciate any info, suggestions, articles, or products that may help me empower our teams. I would love to really provide some solid infrastructure that is productive and easy for them to use.


submitted by /u/gennyact

[D] Can GANs generate new animals?

I googled but couldn’t find anything.

We’ve seen GANs trained on imagenet that were conditioned on the labels, so they can generate dogs or ants for example. But what if you just conditioned it on animal/not animal?

Could you get a GAN that can think up new animal species that we’ve never seen?

Or you could even play around with the specificity, so you can train it on reptiles for example, instead of specifically snakes or turtles.

submitted by /u/Kavillab

[R] Depth Maps Inpainting with GANs


We proposed a GAN network earlier to deal with disparity data inpainting and object removal in a paper published at the Intelligent Vehicles Symposium 2019 (IV’19). We have now published on arXiv a more in-depth analysis of our latest network version, which is also open-sourced.

Third column: the objects removed using the Contextual Attention network. Last column: our results removing the same objects.

submitted by /u/matiaslucas

[Research] Question Regarding “Deep Convolutional Spiking Neural Networks for Image Classification”

Paper can be found here:

I am currently investigating the research into Spiking Neural Networks using Spike-timing-dependent plasticity learning, initially regarding image processing.

I have now read several papers that discuss “Spiking Convolutional Neural Networks”, and cannot understand how these particular networks can be convolutional in nature, at least in the same way as backprop trained CNNs are.

The kernels in the standard CNN are trained by backprop against every possible patch or feature in the preceding layer. So a kernel can detect a feature at any point in the preceding layer.

While you can definitely use “kernels” with these spiking networks, they would only apply to a set patch of the image, right? If you wanted something that ran over all the patches, your “kernel neuron” would just end up with a link from each neuron in the previous layer and it would be a mess. Or you would need to duplicate this kernel neuron x number of times depending on the size of the input, and find some way to keep these neurons with the same input weights.

What am I misunderstanding here? Do you simply end up with a whole pile of duplicate kernels across all the patches? This would definitely work but is it optimal?
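
For reference, the weight sharing I am asking about can be written out in plain Python. In a standard convolution, every output position is effectively a separate neuron wired to one patch, and all of those neurons read the same kernel weights. This is only a sketch of the ordinary (non-spiking) operation; it says nothing about how the paper keeps the weights tied under STDP.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over every patch of a 2D image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Each (r, c) output is one "duplicate neuron" looking at its own patch,
    # but every one of them uses the same kernel weights (tied weights).
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```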

submitted by /u/bitcoin_analysis_app

Plug yourself into AI and don't miss a beat


Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.