
# [D] Understanding proof of MaxEnt theorem

I’m reading Brian Ziebart’s work on maximum causal entropy optimization for inverse reinforcement learning. I’ve been working through a few of his thesis chapters to get a deeper understanding, but I’m stuck on one particular proof: the first line of the proof of Theorem 6.10. The theorem follows easily after the first line, but I can’t make sense of the logic behind that first line.

In a nutshell, the theorem shows that under a maximum causal entropy distribution, the likelihood of any policy π increases in proportion to the expected reward (linear in [state, action] features) under that policy. However, to prove this, he starts off by writing

P(π) = ∏_{(A, S)} P_MaxEnt(A, S)^{π(A, S)},

where the product runs over all trajectories (A, S). I do not understand where this equation comes from. It seems strange to me that it raises maximum entropy distribution probabilities to the power of the policy probabilities. I would greatly appreciate it if anyone could help me understand this.

The theorem is from his thesis (pg. 210), available here: http://www.cs.cmu.edu/~bziebart/publications/thesis-bziebart.pdf

Full theorem and proof included below:

https://i.redd.it/61807bbw17o31.png

https://i.redd.it/b6ps5amx17o31.png

submitted by /u/celestialquestrial
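For concreteness, the product-of-powers form is algebraically just the exponential of an expected log-likelihood under π: log P(π) = Σ_{(A,S)} π(A, S) · log P_MaxEnt(A, S). A minimal numerical sketch of that identity (all probabilities below are made up for illustration and are not from the thesis):

```python
import math

# Toy check of the identity behind the first line (notation from the post):
#   P(pi) = prod over trajectories (A, S) of P_MaxEnt(A, S) ** pi(A, S)
#         = exp( sum over (A, S) of pi(A, S) * log P_MaxEnt(A, S) ).
# Hypothetical numbers: four trajectories, MaxEnt probabilities p,
# policy-induced trajectory probabilities w (each list sums to 1).
p = [0.4, 0.3, 0.2, 0.1]   # P_MaxEnt(A, S) per trajectory (made up)
w = [0.1, 0.2, 0.3, 0.4]   # pi(A, S) per trajectory (made up)

# The product-of-powers form from the proof's first line ...
likelihood = math.prod(pm ** pw for pm, pw in zip(p, w))

# ... equals the exponentiated expected log-likelihood under the policy.
expected_loglik = sum(pw * math.log(pm) for pm, pw in zip(p, w))

assert abs(likelihood - math.exp(expected_loglik)) < 1e-12
print(likelihood)
```

This only shows the algebraic equivalence of the two forms, not why Ziebart takes that expression as the definition of a policy's likelihood in the first place.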