[D] Directed acyclic graph and the definition of causality

I listened to a very interesting talk at MAIS 2019 last Friday about a novel approach to learning DAGs using neural networks (all the details are in this paper: arXiv:1906.02226). It’s far from my actual discipline of sensor design and data processing, but I still went to speak with the author at the poster session afterwards.

We didn’t go into the details of the technique. Instead, we discussed how there doesn’t seem to be a usable definition of causality in terms of graph analysis. He said that causality is something we all kind of agree on, but that we can’t define. For example, the direction of the causality arrow between the average temperature and the altitude of a city is clear. If we magically changed the altitude, the average temperature would change, while if we magically changed the average temperature, the altitude wouldn’t change. Therefore, the direction of causality is from “altitude” to “average temperature”.
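The "magic change" argument can be written as a toy simulation. This is only a minimal sketch: the linear model, the constants `SEA_LEVEL_TEMP_C` and `LAPSE_RATE_C_PER_KM`, and the `simulate` function are all assumptions made up for illustration, not something from the talk or the paper.

```python
# Toy structural model of the altitude -> temperature example.
# All names and numbers here are hypothetical, chosen for illustration.
SEA_LEVEL_TEMP_C = 15.0     # assumed average temperature at sea level
LAPSE_RATE_C_PER_KM = 6.5   # assumed cooling per km of altitude


def simulate(altitude_km, do_temperature=None):
    """Return (altitude_km, temperature_c) for one city.

    Temperature is computed from altitude unless an intervention
    overrides it. Altitude never depends on temperature.
    """
    if do_temperature is None:
        temperature_c = SEA_LEVEL_TEMP_C - LAPSE_RATE_C_PER_KM * altitude_km
    else:
        temperature_c = do_temperature
    return altitude_km, temperature_c


baseline = simulate(altitude_km=1.0)

# "Magically" change the altitude: the temperature follows.
after_do_altitude = simulate(altitude_km=2.0)

# "Magically" change the temperature: the altitude stays where it was.
after_do_temperature = simulate(altitude_km=1.0, do_temperature=30.0)

print(baseline, after_do_altitude, after_do_temperature)
```

Forcing the altitude moves the temperature, while forcing the temperature leaves the altitude untouched. That asymmetry is what the arrow from “altitude” to “average temperature” encodes.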

From my readings in cosmology and thermodynamics, I realized there seems to be a very similar concept that would benefit from being shared here. At least I hope so; it’s sometimes hard to know the exact boundaries 😉

Here is a proposed definition for causality: a causal relationship R from set A to set B is a function transforming A into B such that information that was available in A is lost when working with the set B.

It means that a system that can be uniquely described in A cannot be uniquely described in B and it is impossible to know exactly which element from A was mapped to an element from B. In that sense, the set A has a greater information content than the set B and the function R reduces the amount of information available in the set.
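Here is a small sketch of what the definition implies, under stated assumptions: the set A is taken to be eight equally likely states, and `R` is a hypothetical many-to-one map (it keeps only the parity) picked purely for illustration. The helper `shannon_entropy` is my own, not from any library.

```python
from collections import Counter
from math import log2


def shannon_entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of outcomes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def R(a):
    # Hypothetical many-to-one relationship: only the parity survives.
    return a % 2


# Assumption: every element of A is equally likely.
A = list(range(8))
B = [R(a) for a in A]

entropy_A = shannon_entropy(A)
entropy_B = shannon_entropy(B)

# Every element of A that R could have mapped to the value 0 in B.
preimages_of_0 = [a for a in A if R(a) == 0]

print(entropy_A, entropy_B, preimages_of_0)
```

With these assumptions, the entropy drops when going from A to B, and a value observed in B has several possible preimages in A, so the original element cannot be recovered exactly.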

In the case of large-scale phenomena, classical physics tells us that each process is deterministic (i.e., it maps one unique state to one other unique state), but we must also take into account the passage of time. The chronological order of events dictates the direction of causality. This is where it gets interesting, in my opinion: the arrow of time as defined by physicist Sean Carroll (book, multiple articles) is deeply linked to the evolution of the entropy of the universe. Entropy itself is closely related to information content, through the definition of Shannon entropy.

It all comes back to the fact that causality points from a set containing more information to a set containing less information, and not the other way around.

I hope it makes sense. There’s probably a better way to write it all and make the explanation clearer, but I feel like there’s something useful in it.

For example, if we find a causal link between two variables that seems to go against the above definition, it probably means that we are missing some information about the first set, or that the second set is not described in a very “compact” way and has redundant information.

Thanks for your comments!

submitted by /u/i_love_FFT