# Blog

## 5000+ Members

### MEETUPS

LEARN, CONNECT, SHARE

### JOB POSTINGS

INDEED POSTINGS

Browse through the latest deep learning, AI, and machine learning postings from Indeed for the GTA.

### CONTACT

CONNECT WITH US

Are you looking to sponsor space, be a speaker, or volunteer? Feel free to give us a shout.

# [D] Understanding proof of MaxEnt theorem

I’m reading Brian Ziebart’s work on maximum causal entropy optimization for inverse reinforcement learning. I’m working through a few of his thesis chapters to get a deeper understanding, but I’ve gotten stuck on one particular proof: the first line of the proof of Theorem 6.10. The theorem follows easily after the first line, but I can’t make sense of the logic behind that first line.

In a nutshell, the theorem shows that under a maximum causal entropy distribution, the likelihood of any policy π increases in proportion to the expected reward (linear in [state, action] features) under that policy. However, to prove this, he starts off by writing

P(π) = ∏ over all trajectories (A, S) of P_MaxEnt(A, S)^π(A, S).

I don’t understand where this equation comes from. It seems strange to me that it raises maximum entropy distribution probabilities to the power of the policy probabilities. I would greatly appreciate it if anyone could help me understand this.

The theorem is from his thesis (pg. 210), available here: http://www.cs.cmu.edu/~bziebart/publications/thesis-bziebart.pdf

Full theorem and proof included below:

https://i.redd.it/61807bbw17o31.png

https://i.redd.it/b6ps5amx17o31.png

submitted by /u/celestialquestrial
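One way to read that first line (a standard identity, offered as an interpretation of the notation rather than anything stated in the thesis excerpt): if the exponent π(A, S) is the probability the policy assigns to trajectory (A, S), then the product of powers is just the exponential of an expected log-likelihood, so taking logs turns it into a cross-entropy-style expectation:

```latex
% Interpreting P(\pi) as a geometric mixture of MaxEnt trajectory
% probabilities, weighted by the policy's trajectory probabilities:
\log P(\pi)
  = \log \prod_{(A,S)} P_{\mathrm{MaxEnt}}(A,S)^{\pi(A,S)}
  = \sum_{(A,S)} \pi(A,S)\,\log P_{\mathrm{MaxEnt}}(A,S)
  = \mathbb{E}_{(A,S)\sim\pi}\!\bigl[\log P_{\mathrm{MaxEnt}}(A,S)\bigr].
```

Under this reading, since log P_MaxEnt is (up to normalization) linear in the reward features, the log-likelihood of π grows linearly with the expected feature counts, i.e., the expected reward under π, which would match what the theorem asserts. Whether this is exactly Ziebart's intended construction is an assumption here.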


# Plug yourself into AI and don't miss a beat

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics, and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.