This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
…
continue reading
Welcome to The Data Flowcast: Mastering Airflow for Data Engineering & AI — the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week, as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast Webpage: https://www.astronomer.io/podcast/
…
continue reading
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting. SEASON 1 DATA BROS Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading hig ...
…
continue reading
Databases and data engineering episodes of Software Engineering Daily
…
continue reading
Discussions around Data Engineering
…
continue reading
Unlocking the Power of Data: A Guide for Leaders and Executives" As a leader or executive, you know the importance of data in driving business decisions and staying ahead of the competition. But, with the increasing amount of data generated daily, it can be overwhelming to know where to start and how to utilize this valuable asset effectively. This blog, with multiple topics, addresses the technical terminology in data engineering and analytics on the cloud.
…
continue reading
Summary In this episode of the Data Engineering Podcast Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision boar…
…
continue reading
What does it take to go from fixing a broken link to becoming a committer for one of the world’s leading open-source projects? Amogh Desai, Senior Software Engineer at Astronomer, takes us through his journey with Apache Airflow. From small contributions to building meaningful connections in the open-source community, Amogh’s story provides actiona…
…
continue reading
Summary The core task of data engineering is managing the flows of data through an organization. In order to ensure those flows are executing on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose impacts the ways that you architect the rest of your data platform. In this episode Hugo Lu shares his…
…
continue reading
Data orchestration and machine learning are shaping how organizations handle massive datasets and drive customer-focused strategies. Tools like Apache Airflow are central to this transformation. In this episode, Vasyl Vasyuta, R&D Team Leader at Optimove, joins us to discuss how his team leverages Airflow to optimize data processing, orchestrate ma…
…
continue reading
Summary In this episode of the Data Engineering Podcast the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explores the "write everything twice" problem, where similar pipelines are built without code reuse, and discusses the challenges of managing different SQL dialects and relational databases. Max also touc…
…
continue reading
Bridging the gap between data teams and business priorities is essential for maximizing impact and building value-driven workflows. Katie Bauer, Senior Director of Data at GlossGenius, joins us to share her principles for creating effective, aligned data teams. In this episode, Katie draws from her experience at GlossGenius, Reddit and Twitter to h…
…
continue reading
Scaling deployments for a billion users demands innovation, precision and resilience. In this episode, we dive into how LinkedIn optimizes its continuous deployment process using Apache Airflow. Rahul Gade, Staff Software Engineer at LinkedIn, shares his insights on building scalable systems and democratizing deployments for over 10,000 engineers. …
…
continue reading
Summary In this episode of the Data Engineering Podcast Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his journey into the world of data and discusses the complexities of database selection, highlighting the trade-offs between different database architectures and how these choices affect system design, q…
…
continue reading
Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enabl…
…
continue reading
Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. Much more than just a software solution, for Wouter,…
…
continue reading
In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads th…
…
continue reading
1
How Uber Manages 1 Million Daily Tasks Using Airflow, with Shobhit Shah and Sumit Maheshwari
28:44
When data orchestration reaches Uber’s scale, innovation becomes a necessity, not a luxury. In this episode, we discuss the innovations behind Uber’s unique Airflow setup. With our guests Shobhit Shah and Sumit Maheshwari, both Staff Software Engineers at Uber, we explore how their team manages one of the largest data workflow systems in the world.…
…
continue reading
Summary The challenges of integrating all of the tools in the modern data stack has led to a new generation of tools that focus on a fully integrated workflow. At the same time, there have been many approaches to how much of the workflow is driven by code vs. not. Burak Karakan is of the opinion that a fully integrated workflow that is driven entir…
…
continue reading
Efficient data orchestration is the backbone of modern analytics and AI-driven workflows. Without the right tools, even the best data can fall short of its potential. In this episode, Andrea Bombino, Co-Founder and Head of Analytics Engineering at Astrafy, shares insights into his team’s approach to optimizing data transformation and orchestration …
…
continue reading
Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous computation of data, machine learning, and AI workloads. The discussion covers the concept of incremental computation, the origins of Feldera, and its unique ability to handle both streaming and batch …
…
continue reading
1
Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan
28:02
In this special roundup episode of The Data Engineering Show, the Bros revisits some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia…
…
continue reading
Data orchestration is evolving faster than ever and Apache Airflow 3 is set to revolutionize how enterprises handle complex workflows. In this episode, we dive into the exciting advancements with Vikram Koka, Chief Strategy Officer at Astronomer and PMC Member at The Apache Software Foundation. Vikram shares his insights on the evolution of Airflow…
…
continue reading
Summary Gleb Mezhanskiy, CEO and co-founder of DataFold, joins Tobias Macey to discuss the challenges and innovations in data migrations. Gleb shares his experiences building and scaling data platforms at companies like Autodesk and Lyft, and how these experiences inspired the creation of DataFold to address data quality issues across teams. He out…
…
continue reading
Data and AI are revolutionizing HR, empowering leaders to measure performance and drive strategic decisions like never before. In this episode, we explore the transformation of HR technology with Guy Dassa, Chief Technology Officer at 15Five, as he shares insights into their evolving data platform. Guy discusses how 15Five equips HR leaders with to…
…
continue reading
Summary The rapid growth of generative AI applications has prompted a surge of investment in vector databases. While there are numerous engines available now, Lance is designed to integrate with data lake and lakehouse architectures. In this episode Weston Pace explains the inner workings of the Lance format for table definitions and file storage, …
…
continue reading
Summary In this episode of the Data Engineering Podcast, Adrian Broderieux and Marcin Rudolph, co-founders of DLT Hub, delve into the principles guiding DLT's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the P…
…
continue reading
Summary In this episode of the Data Engineering Podcast Lukas Schulte, co-founder and CEO of SDF, explores the development and capabilities of this fast and expressive SQL transformation tool. From its origins as a solution for addressing data privacy, governance, and quality concerns in modern data management, to its unique features like static an…
…
continue reading
Unlocking engineering productivity goes beyond coding — it’s about managing knowledge efficiently. In this episode, we explore the innovative ways in which Dosu leverages Airflow for data orchestration and supports the Airflow project. Devin Stein, Founder of Dosu, shares his insights on how engineering teams can focus on value-added work by automa…
…
continue reading
In this episode of The Data Engineering Show, the bros, Eldad and Benjamin are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows. Together they cover: Automated Data Pipelines: Ryan…
…
continue reading
Summary Airbyte is one of the most prominent platforms for data movement. Over the past 4 years they have invested heavily in solutions for scaling the self-hosted and cloud operations, as well as the quality and stability of their connectors. As a result of that hard work, they have declared their commitment to the future of the platform with a 1.…
…
continue reading
Harnessing data at scale is the key to driving innovation in autonomous vehicle technology. In this episode, we uncover how advanced orchestration tools are transforming machine learning operations in the automotive industry. Serjesh Sharma, Supervisor ADAS Machine Learning Operations (MLOps) at Ford Motor Company, joins us to discuss the challenge…
…
continue reading
Data failures are inevitable but how you manage them can define the success of your operations. In this episode, we dive deep into the challenges of data engineering and AI with Brendan Frick, Senior Engineering Manager, Data at GumGum. Brendan shares his unique approach to managing task failures and DAG issues in a high-stakes ad-tech environment.…
…
continue reading
Summary As data architectures become more elaborate and the number of applications of data increases, it becomes increasingly challenging to locate and access the underlying data. Gravitino was created to provide a single interface to locate and query your data. In this episode Junping Du explains how Gravitino works, the capabilities that it unloc…
…
continue reading
1
From Sensors to Datasets: Enhancing Airflow at Astronomer with Maggie Stark and Marion Azoulai
22:25
A 13% reduction in failure rates — this is how two data scientists at Astronomer revolutionized their data pipelines using Apache Airflow.In this episode, we enter the world of data orchestration and AI with Maggie Stark and Marion Azoulai, both Senior Data Scientists at Astronomer. Maggie and Marion discuss how their team re-architected their use …
…
continue reading
Mastering the flow of data is essential for driving innovation and efficiency in today’s competitive landscape. In this episode, we explore the evolution of data orchestration and the pivotal role of Apache Airflow in modern data workflows.Ben Tallman, Chief Technology Officer at M Science, joins us and shares his extensive experience with Airflow,…
…
continue reading
Welcome to The Data Flowcast: Mastering Airflow for Data Engineering & AI — the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward.Join us each week, as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workf…
…
continue reading
Data orchestration is revolutionizing the way companies manage and process data. In this episode, we explore the critical role of data orchestration in modern data workflows and how Apache Airflow is used to enhance data processing and AI model deployment.Hannan Kravitz, Data Engineering Team Leader at Artlist, joins us to share his insights on lev…
…
continue reading
Data engineering is constantly evolving and staying ahead means mastering tools like Apache Airflow. In this episode, we explore the world of data engineering with Alexandre Magno Lima Martins, Senior Data Engineer at Teya. Alexandre talks about optimizing data workflows and the smart solutions they've created at Teya to make data processing easier…
…
continue reading
Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Berg, CEO of DataKitchen, to discuss his ongoing mission to simplify the lives of data engineers. Chris explains the challenges faced by data engineers, such as constant system failures, the need for rapid changes, and high customer demands. Chris delves …
…
continue reading