Data Reliability for Data Lakes | Databricks

September 10, 2020

ABOUT THE KEYNOTE (https://www.datacouncil.ai/talks/data-reliability-for-data-lakes)

Building a modern data lake requires dealing with a lot of complexity: querying historical and streaming data simultaneously (the lambda architecture), validating data so it isn’t too messy for data science and machine learning, reprocessing to handle failures, and ensuring ACID-compliant data updates. We created the Delta Lake project, open sourced under the Linux Foundation, to relieve data scientists and data engineers of these complex systems problems so they can focus on extracting value from data. In this talk, we’ll dive into these challenges and how ACID transactions solve them. We’ll discuss patterns that emerge when you can focus on data quality, and the nitty-gritty internals of ACID on Spark that enable this focus.

ABOUT THE KEYNOTE SPEAKER

Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Databricks Delta. He received his PhD from UC Berkeley in 2013, where he was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.
