Author: torontoai
Manager, Software Development – Chisel AI – Toronto, ON
From Chisel AI – Tue, 26 Nov 2019 23:37:48 GMT – View all Toronto, ON jobs
Save on inference costs by using Amazon SageMaker multi-model endpoints
Businesses are increasingly developing per-user machine learning (ML) models instead of cohort or segment-based models. They train anywhere from hundreds to hundreds of thousands of custom models based on individual user data. For example, a music streaming service trains custom models based on each listener’s music history to personalize music recommendations. A taxi service trains custom models based on each city’s traffic patterns to predict rider wait times.
While the benefit of building custom ML models for each use case is higher inference accuracy, the downside is that the cost of deploying models increases significantly, and it becomes difficult to manage so many models in production. These challenges become more pronounced when you don’t access all models at the same time but still need them to be available at all times. Amazon SageMaker multi-model endpoints addresses these pain points and gives businesses a scalable yet cost-effective solution to deploy multiple ML models.
Amazon SageMaker is a modular, end-to-end service that makes it easier to build, train, and deploy ML models at scale. After you train an ML model, you can deploy it on Amazon SageMaker endpoints that are fully managed and can serve inferences in real time with low latency. You can now deploy multiple models on a single endpoint and serve them using a single serving container using multi-model endpoints. This makes it easy to manage ML deployments at scale and lowers your model deployment costs through increased usage of the endpoint and its underlying compute instances.
This post introduces Amazon SageMaker multi-model endpoints and shows how to apply this new capability to predict housing prices for individual market segments using XGBoost. The post demonstrates running 10 models on a multi-model endpoint versus using 10 separate endpoints. This results in savings of $3,000 per month, as shown in the following figure:

Multi-model endpoints can easily scale to hundreds or thousands of models. The post also discusses considerations for endpoint configuration and monitoring, and highlights cost savings of over 90% for a 1,000-model example.
Overview of Amazon SageMaker multi-model endpoints
Amazon SageMaker enables you to one-click deploy your model onto autoscaling Amazon ML instances across multiple Availability Zones for high redundancy. Specify the type of instance and the maximum and minimum number desired, and Amazon SageMaker takes care of the rest. It launches the instances, deploys your model, and sets up a secure HTTPS endpoint. Your application needs to include an API call to this endpoint to achieve low latency and high throughput inference. This architecture allows you to integrate new models into your application in minutes because model changes no longer require application code changes. Amazon SageMaker is fully managed and manages your production compute infrastructure on your behalf to perform health checks, apply security patches, and conduct other routine maintenance, all with built-in Amazon CloudWatch monitoring and logging.
Amazon SageMaker multi-model endpoints enable you to deploy multiple trained models to an endpoint and serve them using a single serving container. Multi-model endpoints are fully managed and highly available to serve traffic in real time. You can easily access a specific model by specifying the target model name when you invoke the endpoint. This feature is ideal when you have a large number of similar models that you can serve through a shared serving container and don’t need to access all the models at the same time. For example, a legal application may need complete coverage of a broad set of regulatory jurisdictions, but that may include a long tail of models that are rarely used. One multi-model endpoint can serve these lightly used models to optimize cost and manage the large number of models efficiently.
To create a multi-model endpoint in Amazon SageMaker, choose the multi-model option, provide the inference serving container image path, and provide the Amazon S3 prefix in which the trained model artifacts are stored. You can define any hierarchy of your models in S3. When you invoke the multi-model endpoint, you provide the relative path of a specific model in the new TargetModel header. To add models to the multi-model endpoint, add a new trained model artifact to S3 and invoke it. To update a model already in use, add the model to S3 with a new name and begin invoking the endpoint with the new model name. To stop using a model deployed on a multi-model endpoint, stop invoking the model and delete it from S3.
Instead of downloading all the models into the container from S3 when the endpoint is created, Amazon SageMaker multi-model endpoints dynamically load models from S3 when invoked. As a result, an initial invocation to a model might see higher inference latency than the subsequent inferences, which are completed with low latency. If the model is already loaded on the container and invoked, then you skip the download step and the model returns the inferences in low latency. For example, assume you have a model that is only used a few times a day. It is automatically loaded on demand, while frequently accessed models are retained in memory with the lowest latency. The following diagram shows models dynamically loaded from S3 into a multi-model endpoint.

Using Amazon SageMaker multi-model endpoints to predict housing prices
This post takes you through an example use case of multi-model endpoints, based on the domain of house pricing. For more information, see the fully working notebook on GitHub. It uses generated synthetic data to let you experiment with an arbitrary number of models. Each city has a model trained on a number of houses with randomly generated characteristics.
The walkthrough includes the following steps:
- Making your trained models available for a multi-model endpoint
- Preparing your container
- Creating and deploying a multi-model endpoint
- Invoking a multi-model endpoint
- Dynamically loading a new model
Making your trained models available for a multi-model endpoint
You can take advantage of multi-model deployment without any changes to your models or model training process and continue to produce model artifacts (for example, model.tar.gz files) that get saved in S3.
In the example notebook, a set of models is trained in parallel, and the model artifacts from each training job are copied to a specific location in S3. After training and copying a set of models, the folder has the following contents:
Each file is renamed from its original model.tar.gz name so that each model has a unique name. You refer to the target model by name when invoking a request for a prediction.
Preparing your container
To use Amazon SageMaker multi-model endpoints, you can build a docker container using the general-purpose multi-model server capability on GitHub. It is a flexible and easy-to-use framework for serving ML models with any framework. The XGBoost sample notebook demonstrates how to build a container using the open-source Amazon SageMaker XGBoost container as a base.
Creating a multi-model endpoint
The next step is to create a multi-model endpoint that knows where in S3 to find target models. This post uses boto3, the AWS SDK for Python, to create the model metadata. Instead of describing a specific model, set its mode to MultiModel and tell Amazon SageMaker the location of the S3 folder containing all the model artifacts.
Additionally, indicate the framework image that models use for inference. This post uses an XGBoost container that was built in the previous step. You can host models built with the same framework in a multi-model endpoint configured for that framework. See the following code for creating the model entity:
With the model definition in place, you need an endpoint configuration that refers back to the name of the model entity you created. See the following code:
Lastly, create the endpoint itself with the following code:
Invoking a multi-model endpoint
To invoke a multi-model endpoint, you only need to pass one new parameter, which indicates the target model to invoke. The following example code is a prediction request using boto3:
The sample notebook iterates through a set of random invocations against multiple target models hosted behind a single endpoint. This shows how the endpoint dynamically loads target models as needed. See the following output:
The time to complete the first request against a given model experiences additional latency (called a cold start) to download the model from S3 and load it into memory. Subsequent calls finish with no additional overhead because the model is already loaded.
Dynamically adding a new model to an existing endpoint
It’s easy to deploy a new model to an existing multi-model endpoint. With the endpoint already running, copy a new set of model artifacts to the same S3 location you set up earlier. Client applications are then free to request predictions from that target model, and Amazon SageMaker handles the rest. The following example code makes a new model for New York that is ready to use immediately:
With multi-model endpoints, you don’t need to bring down the endpoint when adding a new model, and you avoid the cost of a separate endpoint for this new model. An S3 copy gives you access.
Scaling multi-model endpoints for large numbers of models
The benefits of Amazon SageMaker multi-model endpoints increase based on the scale of model consolidation. You can see cost savings when hosting two models with one endpoint, and for use cases with hundreds or thousands of models, the savings are much greater.
For example, consider 1,000 small XGBoost models. Each of the models on its own could be served by an ml.c5.large endpoint (4 GiB memory), costing $0.119 per instance hour in us-east-1. To provide all one thousand models using their own endpoint would cost $171,360 per month. With an Amazon SageMaker multi-model endpoint, a single endpoint using ml.r5.2xlarge instances (64 GiB memory) can host all 1,000 models. This reduces production inference costs by 99% to only $1,017 per month. The following table summarizes the differences between single and multi-model endpoints. Note that 90th percentile latency remains at 7 milliseconds in this example.
| Single model endpoint |
Multi-model endpoint |
|
| Total endpoint price per month | $171,360 | $1,017 |
| Endpoint instance type | ml.c5.large | ml.r5.2xlarge |
| Memory capacity (GiB) | 4 | 64 |
| Endpoint price per hour | $0.119 | $0.706 |
| Number of instances per endpoint | 2 | 2 |
| Endpoints needed for 1,000 models | 1,000 | 1 |
| Endpoint p90 latency (ms) | 7 | 7 |
Monitoring multi-model endpoints using Amazon CloudWatch metrics
To make price and performance tradeoffs, you will want to test multi-model endpoints. Amazon SageMaker provides additional metrics in CloudWatch for multi-model endpoints so you can determine the endpoint usage and the cache hit rate and optimize your endpoint. The metrics are as follows:
- ModelLoadingWaitTime – The interval of time that an invocation request waits for the target model to be downloaded or loaded to perform the inference.
- ModelUnloadingTime – The interval of time that it takes to unload the model through the container’s
UnloadModelAPI call. - ModelDownloadingTime – The interval of time that it takes to download the model from S3.
- ModelLoadingTime – The interval of time that it takes to load the model through the container’s
LoadModelAPI call. - ModelCacheHit – The number of
InvokeEndpointrequests sent to the endpoint where the model was already loaded. Taking the Average statistic shows the ratio of requests in which the model was already loaded. - LoadedModelCount – The number of models loaded in the containers in the endpoint. This metric is emitted per instance. The
Averagestatistic with a period of 1 minute tells you the average number of models loaded per instance, and theSumstatistic tells you the total number of models loaded across all instances in the endpoint. The models that this metric tracks are not necessarily unique because you can load a model in multiple containers in the endpoint.
You can use CloudWatch charts to help make ongoing decisions on the optimal choice of instance type, instance count, and number of models that a given endpoint should host. For example, the following chart shows the increasing number of models loaded and a corresponding increase to the cache hit rate.

In this case, the cache hit rate started at 0 when no models had been loaded. As the number of models loaded increases, the cache hit rate eventually hits 100%.
Matching your endpoint configuration to your use case
Choosing the right endpoint configuration for an Amazon SageMaker endpoint, particularly the instance type and number of instances, depends heavily on the requirements of your specific use case. This is also true for multi-model endpoints. The number of models that you can hold in memory depends on the configuration of your endpoint (such as instance type and count), the profile of your models (such as model size and model latency), and the traffic patterns. You should configure your multi-model endpoint and right-size your instances by considering all these factors and also set up automatic scaling for your endpoint.
Amazon SageMaker multi-model endpoints fully support automatic scaling. The invocation rates used to trigger an autoscale event are based on the aggregate set of predictions across the full set of models an endpoint serves.
In some cases, you may opt to reduce costs by choosing an instance type that cannot hold all the targeted models in memory at the same time. Amazon SageMaker unloads models dynamically when it runs out of memory to make room for a newly-targeted model. For infrequently requested models, the dynamic load latency can still meet the requirements. In cases with more stringent latency needs, you may opt for larger instance types or more instances. Investing time up front to do proper performance testing and analysis pays excellent dividends in successful production deployments.
Conclusion
Amazon SageMaker multi-model endpoints help you deliver high-performance machine learning solutions at the lowest possible cost. You can significantly lower your inference costs by bundling sets of similar models behind a single endpoint that you can serve using a single shared serving container. Similarly, Amazon SageMaker gives you managed spot training to help with training costs, and integrated support for Amazon Elastic Inference for deep learning workloads. You can boost the bottom-line impact of your ML teams by adding these to the significant productivity improvements Amazon SageMaker delivers.
Give multi-model endpoints a try, and share your feedback and questions in the comments.
About the authors
Mark Roy is a Machine Learning Specialist Solution Architect, helping customers on their journey to well-architected machine learning solutions at scale. In his spare time, Mark loves to play, coach, and follow basketball.
Urvashi Chowdhary is a Principal Product Manager for Amazon SageMaker. She is passionate about working with customers and making machine learning more accessible. In her spare time, she loves sailing, paddle boarding, and kayaking.
[P] Handwritten Text Recognition using Convolution Sequence to Sequence
Instead of using RNN with Seq-to-Seq modeling, CNN with Seq-to-Seq has been used which reduces the training and inference time. The work is novel when implemented around Dec 2018. The training and testing pipeline has been created for IAM handwitten dataset. Please provide some feedback on this project and its continuity since I would like to make further advancement and formally document the work into a article.
submitted by /u/spacevstab
[link] [comments]
[D] How can i elaborate texture and statistic features in CNN?
I have a dataset 2200×34 where 1-33 column are features (texture and statistic) and 34th column is the class (0 or 1). I know my dataset is quite poor, but I splitted in 80% training set and 20% validation test.
I’d like to use CNN for classification using these features, my steps are:
– Splitting in training set and validation test;
– Mean normalisation of features;
– Reshaping training set and validation set in order to have 1760x34x1 and 440x34x1 as dimensions;
– Create my model:
opt = SGD(lr=0.0001) model = Sequential() model.add(Conv1D(16, 3, activation="relu", input_shape =(34,1))) model.add(BatchNormalization()) model.add(MaxPooling1D(2)) model.add(Conv1D(32, 3, activation="relu")) model.add(MaxPooling1D(2)) model.add(Flatten()) model.add(Dense(512, activation="relu")) model.add(Dropout(0.5)) model.add(Dense(1, activation="sigmoid")) model.summary() # compile the model model.compile(loss='binary_crossentropy', optimizer= opt, metrics=['accuracy'])
Sadly my model has bad performance (acc = 55% more or less and loss = 0.69). Do you have any suggestion to increase my performance? Is there something wrong?
Here the model.summary()
Layer (type) Output Shape Param # ================================================================= conv1d_3 (Conv1D) (None, 32, 16) 64 _________________________________________________________________ batch_normalization_1 (Batch (None, 32, 16) 64 _________________________________________________________________ max_pooling1d_3 (MaxPooling1 (None, 16, 16) 0 _________________________________________________________________ conv1d_4 (Conv1D) (None, 14, 32) 1568 _________________________________________________________________ max_pooling1d_4 (MaxPooling1 (None, 7, 32) 0 _________________________________________________________________ flatten_1 (Flatten) (None, 224) 0 _________________________________________________________________ dense_2 (Dense) (None, 512) 115200 _________________________________________________________________ dropout_1 (Dropout) (None, 512) 0 _________________________________________________________________ dense_3 (Dense) (None, 1) 513 =================================================================
submitted by /u/Samatarou
[link] [comments]
Automating financial decision making with deep reinforcement learning
Machine learning (ML) is routinely used in every sector to make predictions. But beyond simple predictions, making decisions is more complicated because non-optimal short-term decisions are sometimes preferred or even necessary to enable long-term, strategic goals. Optimizing policies to make sequential decisions toward a long-term objective can be learned using a family of ML models called Reinforcement Learning (RL).
Amazon SageMaker is a modular, fully managed service with which developers and data scientists can build, train, and deploy ML models at any scale. In addition to building supervised and unsupervised ML models, you can also build RL models in Amazon SageMaker. Amazon SageMaker RL builds on top of Amazon SageMaker, adding pre-packaged RL toolkits and making it easy to integrate custom simulation environments. Training and prediction infrastructure is fully managed so you can focus on your RL problem and not on managing servers.
In this post I will show you how to train an agent with RL that can make smart decisions at each time step in a simple bidding environment. The RL agent will choose whether to buy or sell an asset at a given price to achieve maximum long-term profit. I will first introduce the mathematical concepts behind deep reinforcement learning and describe a simple custom implementation of a trading agent in Amazon SageMaker RL. I will also present some benchmark results between two different types of implementation. The first, most straightforward approach consists of an agent looking back at a 10-day window to predict the best decision to make between buying, selling, or doing nothing. In the second approach, I design a time series forecasting deep learning recurrent neural network (RNN) and the agent acts based on forecasts produced by the RNN. The RNN encoder plays the role of an advisor as the agent makes decisions and learns optimal policies to maximize long-term profit.

The data for this post is an arbitrary bidding system made of financial time series in dollars that represent the prices of an arbitrary asset. You can think of this data as the price of an EC2 Spot Instance or the market value of a publicly traded stock.
Introduction to deep reinforcement learning
Predicting events is straightforward, but making decisions is more complicated. The framework of reinforcement learning defines a system that learns to act and make decisions to reach a specified long-term objective. This section describes the key motivations, concepts, and equations behind deep reinforcement learning. If you are familiar with RL concepts, you can skip this section.
Notations and terminology
Supervised learning relies on labels of target outcomes for training models to predict these outcomes on unseen data. Unsupervised learning does not need labels; it learns inherent patterns on training data to group unseen data according to these learned patterns. But there are cases in which labeled data is only partially available, cases in which you can only characterize final targeted outcomes and you’d like to learn how to reach these outcomes, or even cases in which tradeoffs and compromises need be learned given an overall objective (for example, how to balance work and personal life every day to remain happy in the long term). Those are cases where RL can help. Motivations behind RL are indeed cases in which you cannot define a complete supervision and can only define feedback signals based on actions taken, and cases in which you’d like to learn the optimal decisions to make as time progresses.
To define an RL framework, you need to define a goal for a learning system (the agent) that makes decisions in an environment. Then you need to translate this goal into a mathematical formula called a reward function, aimed at rewarding or penalizing the agent when it takes an action and acting as a feedback loop to help the agent reach the predefined goal. There are three key RL components: state, action, and reward; at each time step, the agent receives a representation of the environment’s state st, takes an action at based on st, and receives a numerical reward rt+1based on st+1 = (st, at).
The following diagram presents key components of Reinforcement Learning.

Defining a reward function can be easy or, in contrast, highly empirical. For example, in the AWS DeepRacer League, participants are asked to invent their own RL reward function to compete in virtual and in-person races worldwide, because as of this writing, no one knows how to define the best policy for a car to win a race. In contrast, for a trading agent, the reward function can be simply defined as the overall profit accumulated, because this is always a key performance that traders are driving toward.
In most cases, situations in which an agent is confronted may change over time depending on which actions it takes, and thus the RL problem requires to map situations to actions that are best in these particular situations. The general RL problem consists of learning entire sequences of actions, called policies, and noted 𝜋(a|s), to maximize reward.
Before looking at methods to find optimal policies, you can formalize the objective of learning when the goal is to maximize the cumulative reward after a time step t by defining the concept of cumulative return Gt:
![]()
When there is a notion of a final step, each subsequence of actions is called an episode, and the MDP is said to have a finite horizon. If the agent-environment interaction goes on forever, the MDP is said to have an infinite horizon; in which case you can introduce a concept of discounting to define a mathematically unified notation for cumulative return:

If γ = 1 and T is finite, this notation is the finite horizon formula for Gt, while if γ < 1, episodes naturally emerge even if T is infinite because terms with a large value of k become exponentially insignificant compared to terms with a smaller value of k. Thus the sum converges for infinite horizon MDP, and this notation applies for both finite and infinite horizon MDP.
Optimizing sequential decision-making with RL
There exists a large number of optimization methods that have been developed to solve the general RL problem. They are generally grouped into either policy-based methods or value-based methods, or combination thereof. We’ll focus mostly on the value-based methods as explained below.
Policy-based RL
In a policy-based RL, the policy is defined as a parametric function of some parameters θ: 𝜋θ = 𝜋(a|s,θ) and estimated directly by standard gradient methods. This is done by defining a performance measure, typically the expected value of Gt for a given policy, and applying gradient ascent to find θ that maximizes this performance measure. 𝜋θ can be any parameterized functional form of 𝜋 as long as 𝜋 is differentiable with respect to θ; thus a neural network can be used to estimate it, in which case the parameters θ are the weights of the neural network and the neural network predicts entire policies based on input states.
Direct policy search is relatively simple and also very limited because it attempts to estimate an entire policy directly from a set of previously experienced policies. Combining policy-based RL with value-based RL to refine the estimate of Gt almost always improves accuracy and convergence of RL, so this post focuses exclusively on value-based RL.
Value-based RL
An RL process is a Markov Decision Process (MDP), which is a mathematical formalization of sequential decision-making for which you can make precise statements. An MDP consists in a sequence (s0, a0, r1, s1, a1, r2, …, sn), or more generally (st, at, rt+1, st+1)n, and the dynamics of an MDP as illustrated in the previous diagram is fully characterized by a transition probability T(s,a,s’) = p(s’|s,a) that defines the probability for any state s to going to any other state s’, and a reward function r(s,a) = 𝔼(r|s,a) that defines the expected reward for any given state s.
Given the goal of RL is to learn the best policies to maximize Gt, this post first describes how to evaluate a policy in an MDP, and then how to go about finding an optimal policy.
To evaluate a policy, define a measure for any state s called a state value function V(s) that estimates how good it is to be in s for a particular way of acting, that is, for a particular policy 𝜋:
![]()
Recall from previous section that Gt is defined by the cumulative sum of rewards starting from current time step t up to final time step T. Replacing Gt by the right side of the equation defined in the previous section and moving rt+1 out of the summation, you get:
![]()
Replacing the expectation formula by a sum over all probabilities yields:

The latter is called the Bellman equation; it enables you to compute V𝜋(s) for an arbitrary 𝜋 based on T and r, computing the new value of V at time step t based on the previous value of V at time step t+1, recursively. You can estimate T and r by counting all the occurrences of observed transitions and rewards in a set of observed interactions between the agent and the environment. T and r define the model of the MDP.
Now you know how to evaluate a policy, you can extend the notion of state value function to the notion of state-action value function Q(s,a) to define an optimal policy:


The latter is called the Bellman Optimality equation; it estimates the value of taking a particular action a in a particular state s assuming you’ll take the best actions thereafter. If you transform the Bellman Optimality equation into an assignment function you get an iterative algorithm that, assuming you know T and r, is guaranteed to converge to the optimal policy with random sampling of all possible actions in all states. Because you can apply this equation to states in any order, this iterative algorithm is referred to as asynchronous dynamic programming.
For systems in which you can easily compute or sample T and r, this approach is sufficient and is referred to as model-based RL because you know (previously computed) the transition dynamics (as characterized by T and r). But for most real-case problems, it is more convenient to produce stochastic estimates of the Q-values based on experience accumulated so far, so the RL agent learns in real time at every step and ultimately converges toward true estimates of the Q-values. To this end, combine asynchronous dynamic programming with the concept of moving average:
![]()
and replace Gt(s) by a sampled value as defined in the Bellman Optimality equation:

The latter is called temporal difference learning; it enables you to update the moving average Q*k+1(s,a) based on the difference between Q*k(s,a) and Q*k(s’,a’). This post shows a one-step temporal difference, as used in the most straightforward version of the Q-learning algorithm, but you could extend this equation to include more than one step (n-step temporal difference). Again, some theorems exist that prove Q-learning converges to the optimal policy 𝜋* assuming infinite random action selection.
RL action-selection strategy
You can alleviate the infinite random action selection condition by using a more efficient random action selection strategy such as ε-Greedy to increase sampling of states frequently encountered in good policies and decrease sampling of less valuable states. In ε-Greedy, the agent selects a random action with probability ε, and the rest of the time (that is, with probability 1 – ε), the agent selects the best action according to the latest Q-values, which is defined as
.
This strategy enables you to balance exploiting the latest Q-values with exploring new states and actions. Generally, ε is chosen to be large at the beginning to favor exploration of state-action space, and progressively reduced to a smaller value. For example, ε = 1% to exploit Q-values most of the time yet keep exploring and potentially discovering better policies.
Combining deep learning and reinforcement learning
In many cases, for example when playing chess or Go, the number of states, actions, and combinations thereof is so large that the memory and time needed to store the array of Q-values is enormous. There are more combinations of states and actions in the game Go than known stars in the universe. Instead of storing Q-values for all states and actions in an array which is impossible for Go and many other cases, deep RL attempts to generalize experience from a subset of states and actions to new states and actions. The following diagram is a comparison of Q-learning with look-up tables vs. function approximation.

Deep Q-learning uses a supervised learning approximation to Q(s,a) by using
as the label, because the Q-learning assignment function is equivalent to a gradient update of the general form x = x – ∝∇J (for any arbitrary x) where:

The latter is called the Square Bellman Loss; it allows you to estimate Q-values based on generalization from mapping states and actions to Q-values using a deep learning network, hence the name deep Q-learning.
Deep Q-learning is not guaranteed to converge anymore because the label used for supervised learning depends on current values of the network weights, which are themselves updated based on learning from the labels, hence a problem of broken ergodicity. But this approach is often good enough in practice and can generalize to the most complex RL problems (such as autonomous driving, robotics, or playing Go).
In deep RL, the training set changes at every step, so you need to define a buffer to enable batch training, and schedule regular refreshes as experience accumulates for the agent to learn in real time. This is called experience replay and can be compared to the process of learning while dreaming in biological systems: experience replay allows you to keep optimizing policies over a subset of experiences (st, at, rt+1, st+1)n kept in memory.
This closes our theoretical journey on reinforcement learning and in particular the popular Deep Q-learning method, which implied several flavors of statistical learning, namely asynchronous dynamic programming, moving average, ε-Greedy and deep learning. Several RL variants exist in addition to the standard Deep Q-learning method, and Amazon SageMaker RL provides most of them out of the box, so after understanding one you will be able to test most of them using Amazon SageMaker RL.
Implementing a custom RL model in Amazon SageMaker
Application of reinforcement learning in financial services has received attention because better financial decisions can lead to high dollar return. In this post I implement a financial trading bot to demonstrate how to develop an agent that looks at the price of an asset over the past few days and decide whether to buy, sell, or do nothing, given the goal to maximize long-term profit.
I will benchmark this approach with another approach in which I combine reinforcement learning with recurrent deep neural networks for time series forecasting. In this second approach, the agent uses insights from multi-step horizon forecasts to make trading decisions and learn policies based on forecasted sequences of events rather than past sequences of events.
The data is a randomized version of a publicly available dataset on Yahoo Finance and represents price series for an arbitrary asset. For example, you can think of this data as the price of an EC2 Spot Instance or the market value of a publicly traded stock.
Setting up a built-in preset for deep Q-learning
In Amazon SageMaker RL, a preset file configures the RL training jobs and defines the hyperparameters for the RL algorithms. The following preset, for example, implements and parameterizes a deep Q-learning agent, as described in the previous section. It sets the number of steps to 2M. For more information about Amazon SageMaker RL hyperparameters and training jobs configuration, see Reinforcement Learning with Amazon SageMaker RL.
Setting up a custom OpenAI Gym RL environment
In Amazon SageMaker RL, most of the components of an RL Markov Decision Process as described in the previous section are defined in an environment file. You can connect open-source and custom environments developed using OpenAI Gym, which is a popular set of interfaces to help define RL environments and is fully integrated into Amazon SageMaker. The RL environment can be the real world in which the RL agent interacts or a simulation of the real world. You can simulate the real world by a daily series of past prices. To formulate the bidding problem into an RL problem, you need to define each of the following components of the MDP:
- Environment – Custom environment that generates a simulated price with daily and weekly variations and occasional spikes
- State – Daily price of the asset over the past 10 days
- Action – Sell or buy the asset, or do nothing, on a daily basis
- Goal – Maximize accumulated profit
- Reward – Positive reward equivalent to daily profit, if any, and a penalty equivalent to the price paid for any asset purchased
A custom file called TradingEnv.py specifies all these components. It is located in the src/ folder and called by the preset file defined earlier. In particular, the size of the state space (n = 10 days) and the action space (m = 3) are defined using OpenAI Gym through the following lines of code:
The goal is translated into a quantitative reward function, as explained in the previous section, and this reward function is also formulated in the environment file. See the following code:
Launching an Amazon SageMaker RL estimator
After defining a preset and RL environment in Amazon SageMaker RL, you can train the agent by creating an estimator following a similar approach to when using any other ML estimators in Amazon SageMaker. The following code shows how to define an estimator that calls the preset (which itself calls the custom environment):
You would generally use a config.py file in Amazon SageMaker RL to define global variables, such as the location of a custom data file in the Amazon SageMaker container. See the following code:
Training and evaluating the RL agent’s performance
This section demonstrates how to train the RL bidding agent in a simulation environment constructed using seven years of daily historical transactions. You can evaluate learning during the training phase and evaluate the performance of the trained RL agent at generalizing on a test simulation, here constructed using three additional years of daily transactions. You can assess the agent’s ability to learn and generalize by visualizing the accumulated reward and accumulated profit per episode during the training and test phases.
Evaluating learning during training of the RL agent
In reinforcement learning, you can train an agent by simulating interactions (st, at, rt+1, st+1) with the environment and computing the cumulative return Gt, as described previously. After an initial exploration phase, as the value of ε in ε-Greedy progressively decreases to favor exploitation of learned policies, the total accumulated reward in a given episode accounts for how good the policy followed was. Thus, the total accumulated reward compared between episodes enables you to evaluate the relative quality of the policies learned. The following graphs show the reward and profit accumulated by the RL agent during training simulations. The x-axis is the episode index and the y-axis is the reward or profit accumulated per episode by the RL agent.

The graph on accumulated reward shows a trend with essentially three phases: relatively large fluctuations during the first 300 episodes (exploration phase), steady growth between 300 and 500 episodes, and a plateau (500–800 episodes) in which the total accumulated reward has reached convergence and revolves around a relatively stable value of $1,800.
Consistently, the accumulated profit is approximately zero during the first 300 episodes and revolves around a relatively stable value of $800 after episode 500.
Together, these results indicate that the RL agent has learned to trade the asset profitably, and has converged toward a stable behavior with reproducible policies to generate profit over the training dataset.
Evaluating generalization on a test set of the RL agent
To evaluate the agent’s ability to generalize to new interactions with the environment, you can constrain the agent to exploit the learned policies by setting the value of ε to zero and simulate new interactions (st, at, rt+1, st+1) with the environment.
The following graph shows the profit generated by the RL agent during test simulations. The x-axis is the episode index and the y-axis is profit accumulated per episode by the RL agent.

The graph shows that after three years of new daily prices for the same asset, the trained agent tends to generate profits with a mean of $2,200 and standard deviation of $600 across 65 test episodes. In contrast to the training simulations, you can observe a steady mean across all 65 test episodes simulated, indicating a strong convergence of the policies learned.
The graph confirms that the RL agent has converged and learned reproducible policies to trade the asset profitably and can generalize to new prices and trends.
Benchmarking a deep recurrent RL agent
Finally, I benchmarked the results from the previous deep RL approach with a deep recurrent RL approach. Instead of defining a state as a window of the past 10 days, an RNN for time series forecasting encodes the state as a 10-day horizon forecast, so the agent benefits from an advisor on future expected prices when learning optimal trading policies for maximizing long-term return.
Implementing a forecast-based RL state space
The following code implements a simple RNN for time series forecasting, following the post Forecasting financial time series with dynamic deep learning on AWS:
And the following function redirects the agent to an observation based on lag or forecasted horizon (the two types of RL approaches discussed in this post), depending on the value of a custom global variable called mode:
Evaluating learning during training of the RNN-based RL agent
As in the previous section for the RL agent in which a state is defined as the lag of the past 10 observations (referred to as lag-based RL), you can evaluate the relative quality of the policies learned by the RNN-based agent by comparing the total accumulated reward between episodes.
The following graphs show the reward and profit accumulated by the RNN-based RL agent during training simulations. The x-axis is the episode index and the y-axis is the reward or profit accumulated per episode by the RL agent.

The graph on accumulated reward shows a steady growth after episode 300, and most episodes after episode 500 result in an accumulated reward of approximately $1,000. But several episodes result in a much higher reward of up to $8,000.
The accumulated dollar profit is approximately zero during the first 300 episodes, revolves around a positive value after episode 500, with again several episodes where the generated profit reaches up to $7,500.
Together, these results indicate that the RL agent has learned to trade the asset profitably, and has converged toward a behavior with policies that generate profit either around $500–800, or some much higher values up to $7,500, over the training dataset.
Evaluating generalization on a test set of the RNN-based RL agent
The following graph shows the profit in dollars generated by the RNN-based RL agent during test simulations. The x-axis is the episode index and the y-axis is the profit accumulated per episode by the RL agent.

The graph shows that the RNN-based RL agent can generalize to new prices and trends. Consistent with the observations made during training, it shows that after three years of new daily prices for the same asset, the trained agent tends to generate profits around $2,000, or some much higher profit up to $7,500. This dual behavior is characterized by an overall mean of $4,900 and standard deviation of $3,000 across the 65 test episodes.
The minimum profit across all 65 test episodes simulated is $1,300, while it was $1,500 for the lag-based RL approach. The maximum profit is $7,500, with frequent accumulated profit over $5,000, while it was only $2,800 for the lag-based RL approach. Thus, the RNN-based RL approach results in higher volatility compared to the lag-based RL approach, but exclusively toward higher profits which is a better outcome given the goal of the RL problem is to generate profit.
The results between the lag-based RL agent and the RNN-based RL agent confirm that the RNN-based RL agent is more beneficial. It has learned reproducible policies to trade the asset that are either similar to, or significantly outperform, the policies learned by the lag-based agent for generating profit.
Conclusion
In this post, I introduced the motivations and underlying theoretical concepts behind deep reinforcement learning, an ML solution framework that enables you to learn and apply optimal policies given a long-term objective. You have seen how to use Amazon SageMaker RL to develop and train a custom deep RL agent to make smart decisions in real time in an arbitrary bidding environment.
You have also seen how to evaluate the performance of the deep RL agent during training and test simulations, and how to benchmark its performance against a more advanced, deep recurrent RL based on RNN for time series forecasting.
This post is for educational purposes only. Past trading performance does not guarantee future performance. The loss in trading can be substantial; investors should use all trading strategies at their own risk.
For more information about popular RL algorithms, sample codes, and documentation, you can visit Reinforcement Learning with Amazon SageMaker RL.
About the Author
Jeremy David Curuksu is a data scientist at AWS and the global lead for financial services at Amazon Machine Learning Solutions Lab. He holds a MSc and a PhD in applied mathematics, and was a research scientist at EPFL (Switzerland) and MIT (US). He is the author of multiple scientific peer-reviewed articles and the book Data Driven, which introduces management consulting in the new age of data science.
Building an NLP-powered search index with Amazon Textract and Amazon Comprehend
Organizations in all industries have a large number of physical documents. It can be difficult to extract text from a scanned document when it contains formats such as tables, forms, paragraphs, and check boxes. Organizations have been addressing these problems with Optical Character Recognition (OCR) technology, but it requires templates for form extraction and custom workflows.
Extracting and analyzing text from images or PDFs is a classic machine learning (ML) and natural language processing (NLP) problem. When extracting the content from a document, you want to maintain the overall context and store the information in a readable and searchable format. Creating a sophisticated algorithm requires a large amount of training data and compute resources. Building and training a perfect machine learning model could be expensive and time-consuming.
This blog post walks you through creating an NLP-powered search index with Amazon Textract and Amazon Comprehend as an automated content-processing pipeline for storing and analyzing scanned image documents. For pdf document processing, please refer AWS Sample github repository to use Textractor.
This solution uses serverless technologies and managed services to be scalable and cost-effective. The services used in this solution include:
- Amazon Textract – Extracts text and data from scanned documents automatically.
- Amazon Comprehend – Uses ML to find insights and relationships in text.
- Amazon ES with Kibana – Searches and visualizes the information.
- Amazon Cognito – Integrates with Amazon ES and authenticates user access to Kibana. For more information, see Get started with Amazon Elasticsearch Service: Use Amazon Cognito for Kibana access control.
- Amazon S3 – Stores your documents and allows for central management with fine-tuned access controls.
- AWS Lambda – Executes code in response to triggers such as changes in data, shifts in system state, or user actions. Because S3 can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.
Architecture
- Users upload OCR image for analysis to Amazon S3.
- Amazon S3 upload triggers AWS Lambda.
- AWS Lambda invokes Amazon Textract to extract text from image.
- AWS Lambda sends the extracted text from image to Amazon Comprehend for entity and key phrase extraction.
- This data is indexed and loaded into Amazon Elasticsearch.
- Kibana gets indexed data.
- Users log into Amazon Cognito.
- Amazon Cognito authenticates to Kibana to search documents.

Deploying the architecture with AWS CloudFormation
The first step is to use an AWS CloudFormation template to provision the necessary IAM role and AWS Lambda function to interact with the Amazon S3, AWS Lambda, Amazon Textract, and Amazon Comprehend APIs.
- Launch the AWS CloudFormation template in the US-East-1 (Northern Virginia) Region:

- You will see the below information on the Create stack screen:
Stack name: document-search
CognitoAdminEmail: abc@amazon.com
DOMAINNAME: documentsearchapp.Edit the CognitoAdminEmail with your email address. You will receive your temporary Kibana credentials in an email.
- Scroll down to Capabilities and check both the boxes to provide acknowledgement that AWS CloudFormation will create IAM resources. For more information, see AWS IAM resources.

- Scroll down to Transforms and choose Create Change Set.
The AWS CloudFormation template uses AWS SAM, which simplifies how to define functions and APIs for serverless applications, as well as features for these services like environment variables. When deploying AWS SAM templates in an AWS CloudFormation template, you need to perform a transform step to convert the AWS SAM template. - Wait a few seconds for the change set to finish computing changes. Your screen should look as follows with Action, Logical Id, Physical Id, Resource Type, and Replacement. Finally, click on the Execute button, which will let AWS CloudFormation launch resources in the background.

- The following screenshot of the Stack Detail page shows the Status of the CloudFormation stack as
CREATE_IN_PROGRESS. Wait up to 20 minutes for the Status to change toCREATE_COMPLETE. In Outputs, copy the value of S3KeyPhraseBucket and KibanaLoginURL.
Uploading documents to the S3 bucket
To upload your documents to your newly created S3 bucket in the above step, complete the following:
- Click on the Amazon S3 Bucket URL you copied from the CloudFormation Output.
- Download the example dataset
demo-data.zipfrom the GitHub repo. This dataset contains a variety of images that include forms, a scanned page with paragraphs, and a two-column document. - Unzip the data.
- Upload the files in the demo-data folder to the Amazon S3 bucket starting with
document-search-blog-s3-<Random string>.
For more information, see How Do I Upload Files and Folders to an S3 Bucket?
After the upload completes, you can see four image files: Employment_application.JPG, expense.png, simple-document-image.jpg, and two-column-image.jpg in the S3 bucket.

Uploading the data to S3 triggers a Lambda S3 event notification to invoke a Lambda function. You can find event triggers configured in your Amazon S3 bucket properties under Advanced settings-> Events. You will see a Lambda function starting with document-search-blog-ComprehendKeyPhraseAnalysis-<Random string>. This Lambda function performs the following:
- Extracts text from images using Amazon Textract.
- Performs key phrase extraction using Amazon Comprehend.
- Searches text using Amazon ES.
The following code example extracts text from images using Amazon Textract:
The following code example extracts key phrases using Amazon Comprehend:
You can index the response received from Amazon Textract and Amazon Comprehend and load it into Amazon ES to create an NLP-powered search index. Refer to the below code:
For more information, see the GitHub repo.
Visualizing and searching documents with Kibana
To visualize and search documents using Kibana, perform the following steps.
- Find the email in your inbox with the subject line “Your temporary password.” Check your junk folder if you don’t see an email in your inbox.

- Go to the Kibana login URL copied from the AWS CloudFormation output.
- Login using your email address as the Username and the temporary password from the confirmation email as the Password. Click on Sign In.

Note: If you didn’t receive the email or missed it while deploying AWS CloudFormation, choose Sign up. - On the next page, enter a new password.
- On the Kibana landing page, from the menu, choose Discover.
- On the Management/Kibana page, you will see Step 1 of 2: Define index pattern, for Index pattern, enter
document*. The message “Success! Your index pattern matches 1 index.” appears. Choose Next step. Click Create Index Pattern.
After a few seconds, you can see the document index page. - Choose Discover from the menu again. In the below screenshot you can see the document attributes.

- To view specific fields for each entry, hover over the field in the left sidebar and click Add.
This moves the fields to the Selected Fields menu. Your Kibana dashboard formulates data in an easy-to-read format. The following screenshot shows the view after adding Entity.DATE, Entity.Location, Entity.PERSON, S3link and text in the Selected Fields menu:

To look at your original document, choose s3link.
Note: You can also add forms and table to view and search tables and forms.
Conclusion
This post demonstrated how to extract and process data from an image document and visualize it to create actionable insights.
Processing scanned image documents helps you uncover large amounts of data, which can provide new business prospects. With managed ML services like Amazon Textract and Amazon Comprehend, you can gain insights into your previously undiscovered data. For example, you could build a custom application to get a text from a scanned legal document, purchase receipts, and purchase orders.
Data at scale is relevant in every industry. Whether you process data from images or PDFs, AWS can simplify your data ingestion and analysis while keeping your overall IT costs manageable.
If this blog post helps you or inspires you to solve a problem, we would love to hear about it! The code for this solution is available on the GitHub repo for you to use and extend. Contributions are always welcome!
About the authors
Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture in hybrid and AWS environments. He enjoys spending time with his family, outdoor activity, and traveling to new destinations to discover new cultures.
Mona is an AI/ML specialist solutions architect working with the AWS Public Sector Team. She works with AWS customers to help them with the adoption of Machine Learning on a large scale. She enjoys painting and cooking in her free time.
[P] Using StyleGAN to make a music visualizer
I’m excited to share this generative video project I worked on with Japanese electronic music artist Qrion for the release of her Sine Wave Party EP.
Here’s the first generated video – two more coming out soon.
This was created using StyleGAN and doing a transfer learning with a custom dataset of images curated by the artist. Qrion picked images that matched the mood of each song (things like clouds, lava hitting the ocean, forest interiors, and snowy mountains) and I generated interpolation videos for each track.
The tempo of the GAN evolution is controlled by a few different things: – the beat of the song – a system I built that incorporates live input (you can tap the keyboard to add a jump to the playback in After Effects) – a keyframeable overall playback speed fader system
I’ve also posted some more images I created with Qrion’s custom models here on my site. There are some further StyleGAN experiments there too if you’re interested.
It’s been fascinating to learn how to use StyleGAN like this. As a visual effects artist, I’m over the moon with the sorts of things that are possible. Indeed, I also wanted to shout out /u/C0D32 who shared an art-centered StyleGAN model that was really influential to me! Thanks for that. Also, of course, /u/gwern who posted an incredible guide to using StyleGAN.
submitted by /u/AtreveteTeTe
[link] [comments]
How AWS is putting machine learning in the hands of every developer and BI analyst
Today AWS announced new ways for you to easily add machine learning (ML) predictions to applications and business intelligence (BI) dashboards using relational data in your Amazon Aurora database and unstructured data in Amazon S3, by simply adding a few statements to your SQL (structured query language) queries and making a few clicks in Amazon QuickSight. Aurora, Amazon Athena, and Amazon QuickSight make direct calls to AWS ML services like Amazon SageMaker and Amazon Comprehend so you don’t need to call them from your application. This makes it more straightforward to add ML predictions to your applications without the need to build custom integrations, move data around, learn separate tools, write complex lines of code, or even have ML experience.
These new changes help make ML more usable and accessible to database developers and business analysts by making sophisticated ML predictions more readily available through SQL queries and dashboards. Formerly, you could spend days writing custom application-level code that must scale and be managed and supported in production. Now anyone who can write SQL can make and use predictions in their applications without any custom “glue code.”
Making sense of a world awash in data
AWS firmly believes that in the not-too-distant future, virtually every application will be infused with ML and artificial intelligence (AI). Tens of thousands of customers benefit from ML through Amazon SageMaker, a fully managed service that allows data scientists and developers the ability to quickly and easily build, train, and deploy ML models at scale.
While there are a variety of ways to build models, and add intelligence to applications through easy-to-use APIs like Amazon Comprehend for example, it can still be challenging to incorporate these models into your databases, analytics, and business intelligence reports. Consider a relatively simple customer service example. Amazon Comprehend can quickly evaluate the sentiment of a piece of text (is it positive or negative?). Suppose that I leave feedback on a store’s customer service page: “Your product stinks and I’ll never buy from you again!” It would be trivial for the store to run sentiment analysis on user feedback and contact me immediately to make things right. The data is available in their database and ML services are widely available.
The problem, however, lies in the difficulty of building prediction pipelines to move data between models and applications.
Developers have historically had to perform a large amount of complicated manual work to take these predictions and make them part of a broader application, process, or analytics dashboard. This can include undifferentiated, tedious application-level code development to copy data between different data stores and locations and transform data between formats, before submitting data to the ML models and transforming the results to use inside your application. Such work tends to be cumbersome and a poor way to use the valuable time of your developers. Moreover, moving data in and out of data stores complicates security and governance.
Putting machine learning in the hands of every developer
At AWS, our mission is clear: we aim to put machine learning in the hands of every developer. We do this by making it easier for you to become productive with sophisticated ML services. Customers of all sizes, including NFL, Intuit, AstraZeneca, and Celgene, rely on AWS ML services such as Amazon SageMaker and Amazon Comprehend. Celgene, for example, uses AWS ML services for toxicology prediction to virtually analyze the biological impacts of potential drugs without putting patients at risk. A model that previously took two months to train can now be trained in four hours.
While AWS offers the broadest and deepest set of AI and ML services, and though we introduced more than 200 machine learning features and capabilities in 2018 alone, we’ve felt that more is needed. Among these other innovations, one of the best things we can do is to enable your existing talent to become productive with ML.
And, specifically, developer talent and business analyst talent.
Though we offer services that improve the productivity of data scientists, we want to give the much broader population of application developers access to fully cloud-native, sophisticated ML services. Tens of thousands of customers use Aurora and are adept at programming with SQL. We believe it is crucial to enable you to run ML learning predictions on this data so you can access innovative data science without slowing down transaction processing. As before, you can train ML models against your business data using Amazon SageMaker, with the added ability to run predictions against those same models with one line of SQL using Aurora or Athena. This makes the results of ML models more accessible for a broad population of application developers.
Lead scoring is a good example of how this works. For example, if you build a CRM system on top of Aurora, you’ll store all of your customer relationship data, marketing outreach, leads, etc. in the database. As leads come from the website they’re moved to Aurora, and your sales team follows up on the leads convert them to customers.
But what if you wanted to help make that process more effective for your sales team? Lead scoring is a predictive model which helps qualify and rank incoming leads so that the sales team can prioritize which leads are most likely to convert to a customer sale, making them more productive. You can take a lead scoring model built by your data science team, or one that you have purchased on the AWS ML Marketplace, deploy it to Amazon SageMaker, and then order all your sales queues by priority based on the prediction from the model. Unlike in the past, you needn’t write any glue code.
Or you might want to use these services together to deliver on a next best offer use case. For example, your customer might phone into your call center complaining about an issue. The customer service representative successfully addresses the issue and proceeds to offer new products or services. How? Well, the representative can pull up a product recommendation on the Amazon QuickSight dashboard that shows multiple views and suggestions.
The first view shows product recommendations based on an Aurora query. The query pulls the customer profile, shopping history, and product catalog, and calls a model in Amazon SageMaker to make product recommendations. The second view is an Athena query that pulls customer browsing history or clickstream data from an S3 data lake, and calls an Amazon SageMaker model to make product recommendations. The third view is an Amazon QuickSight query that takes results from first and second view, calls an ensemble model in Amazon SageMaker, and makes the recommendations. You now have several offers to make based on different views of the customer, and all within one dashboard.
On the BI analyst side, we regularly hear from customers that it’s frustrating to have to build and manage prediction pipelines before getting predictions from a model. Developers currently spend days writing application-level code to move data back and forth between models and applications. You may now choose to deprecate your prediction pipelines and instead use Amazon QuickSight to visualize and report on all your ML predictions.
For application developers and business analysts, these changes make it more straightforward to add ML predictions to your applications without the need to build custom integrations, move data around, learn separate tools, write complex lines of code, or even have ML experience. Instead of days of developer labor, you can now add a few statements to your SQL queries and make a few clicks in Amazon QuickSight.
In these ways, we’re enabling a broader population of developers and data analysts to tap into the power of ML, with no Ph.D. required.
About the Author
Matt Asay (pronounced “Ay-see”) is a principal at AWS, and has spent nearly two decades working for a variety of open source and big data companies. You can follow him on Twitter (@mjasay).
[Discussion] Won’t Max Pooling mess up a lot of information in this Network?
As far as I understood it, Max Pooling takes the maximum positive value and not absolute values.
In this paper, optimized mechanical structures get generated with a U-Net by feeding it mechanical information like nodal displacement, element strains, and volume fractions.
https://arxiv.org/ftp/arxiv/papers/1901/1901.07761.pdf
My question now is: won’t Max Pooling mess up a lot of information, if it just takes the positive value and not the absolute value? Since positive elements in the strain matrix correspond to tensile strains and negative to compressive ones.
submitted by /u/avdalim
[link] [comments]