[D] The Rise of DataOps (from the ashes of Data Governance): Legacy Data Governance is broken in the ML era

Once a consistent version control system was applied across all code, coding moved from craft to engineering. The same thing will happen to data governance: https://towardsdatascience.com/the-rise-of-dataops-from-the-ashes-of-data-governance-da3e0c3ac2c4 (full article)

Currently, data governance teams apply manual controls at various points to enforce the consistency and quality of the data. Introducing version tracking for data, as in Data Version Control (DVC), would allow data governance and engineering teams to engineer the data together: filing bugs against specific data versions, applying quality control checks to the data compilers, etc.
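As a rough illustration of filing bugs against data versions, here is a minimal sketch that derives a version ID from a dataset's content and runs a quality check against each version. Everything in it is an assumption made for illustration: the function names, the `amount` field, and the rule that amounts must be non-negative are hypothetical, and this is not how DVC itself is implemented.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short content hash that identifies this exact dataset."""
    # Canonical JSON (sorted keys) so identical rows always hash the same way.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def quality_issues(rows):
    """Hypothetical quality gate: flag rows with a missing or negative amount."""
    issues = []
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if amount is None or amount < 0:
            issues.append(f"row {i}: bad amount {amount!r}")
    return issues


# Two versions of the same (made-up) dataset: v2 fixes the bad row in v1.
rows_v1 = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -5.0}]
rows_v2 = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}]

v1, v2 = dataset_version(rows_v1), dataset_version(rows_v2)

# Issues are keyed by version ID, so a bug report can cite the exact data.
report = {v1: quality_issues(rows_v1), v2: quality_issues(rows_v2)}
```

Because the ID is derived from the content, any change to the data yields a new ID, and a bug filed against `v1` stays attached to exactly the rows that caused it.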

Platforms like Palantir Foundry already treat the management of data in much the same way as the versioning of code. Within data versioning platforms, datasets can be versioned, branched, and acted upon by versioned code to create new datasets. This enables data-driven testing, where the data itself is tested in much the same way as the code that modifies it.
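The version, branch, transform, and test cycle described above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `DataStore`, `drop_negative_amounts`, and `check_no_negative_amounts` are hypothetical names, and nothing here reflects the actual API of Palantir Foundry or any other platform.

```python
import hashlib
import json


class DataStore:
    """Toy in-memory store: immutable snapshots plus named branches."""

    def __init__(self):
        self.snapshots = {}  # version id -> rows
        self.branches = {}   # branch name -> version id

    def commit(self, branch, rows):
        # The version id is a hash of the content, as with code commits.
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(payload).hexdigest()[:12]
        self.snapshots[version] = rows
        self.branches[branch] = version
        return version

    def branch(self, new_branch, from_branch):
        # A new branch starts out pointing at the same snapshot.
        self.branches[new_branch] = self.branches[from_branch]

    def read(self, branch):
        return self.snapshots[self.branches[branch]]


def drop_negative_amounts(rows):
    """Hypothetical transform: the 'versioned code' acting on the data."""
    return [r for r in rows if r["amount"] >= 0]


def check_no_negative_amounts(rows):
    """Data test: passes only if every amount is non-negative."""
    return all(r["amount"] >= 0 for r in rows)


store = DataStore()
store.commit("main", [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -5.0}])

# Branch the data, apply the transform on the branch, and commit the result.
store.branch("cleanup", "main")
store.commit("cleanup", drop_negative_amounts(store.read("cleanup")))
```

Running `check_no_negative_amounts` against each branch plays the role of a unit test for data: it fails on `main` and passes on `cleanup`, while `main` itself is left untouched until the branch is accepted.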

There are also some open source options:

  • The Data Version Control (DVC) project is focused on data scientist users.

  • The Delta Lake project is Databricks’ version control system for data lakes with big data workloads.

submitted by /u/thumbsdrivesmecrazy