Site Reliability Engineering (public)
SRE Prodcast brings Google's experience with Site Reliability Engineering together with special guests and exciting topics to discuss the present and future of reliable production engineering!
 
Welcome to Crashcasts, the podcast for tech enthusiasts! Whether you're a seasoned engineer or just starting out, this podcast will teach you something about Site Reliability Engineering. Join hosts Sheila and Victor as they dive deep into essential topics. Each episode is presented with gradually increasing complexity to cover everything from basic concepts to advanced edge cases. Whether you're preparing for a phone screen or brushing up on your skills, this podcast offers invaluable ...
 
From Google to Amazon, 5G to blockchain, selfies to self-driving, the internet is expected to become an even bigger part of our lives - and it's all supported by a growing legion of mammoth facilities known as data centers. Our digital footprint grows phenomenally every year, and that data passes through, and is stored and processed by, the contents of these, the largest energy-intensive buildings on the planet. This podcast focuses on the facilities, technologies, vendors, owners, developers, ...
 
The O'Reilly Media Podcast spreads the knowledge of innovators. At O’Reilly, a big part of our business is paying attention to what’s new and interesting in the world of technology. The O'Reilly Media Podcast features interviews with the people working on the forefront of technology.
 
Welcome to Building Reddit. In this podcast, host Ryan H. Lewis will take you behind the scenes into how Reddit is built. From some of the coolest projects like Reddit Recap and Collectible Avatars, to the daily work lives of Reddit's employees. You’ll hear from software engineers, product managers, data scientists, community managers, marketers, and more!
 
Building better software, one incident at a time. Host Kevin Riggle talks with software engineers about that time they broke production. Whether you're an industry professional, or just curious about what makes the modern Internet run and what happens when it breaks, we bring you stories you haven't heard elsewhere. This is the audio version of the podcast. Watch on YouTube: https://youtube.com/@critical-point Produced by Complex Systems Group (https://complexsystems.group). Part of Critical ...
 
Developer Tharun

Tharun Shiv

Monthly
 
A one-stop podcast destination to learn about programming and how to excel at it! I will be sharing about programming, web development, freelancing, and mainly my experience with them. Make sure to subscribe to the podcast on Spotify/Google Podcasts or on whichever platform you're listening on. Led by Tharun Shiv. Visit me at https://www.tharunshiv.com
 
The DevOps Dojo

Johan Abildskov

Monthly
 
The DevOps Dojo is an educational podcast focused on DevOps and making the world of building software a little better. Each episode covers a principle, practice or common DevOps fable. Join the Dojo to expand your software development horizons!
 
Exascale Computing Project Podcast

Exascale Computing Project

Monthly
 
The Exascale Computing Project (ECP) is accelerating delivery of a capable exascale computing ecosystem to provide breakthrough solutions that will address America's most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. Let’s Talk Exascale explores Application Development, Software Technology, and Hardware and Integration—focus areas of the ECP.
 
The Soda Podcast

Soda Data | soda.io

Monthly
 
The Soda Podcast brings forward different voices and perspectives to help find a common ground to solve the problems that many are facing when it comes to good data. Explore our two series - 'In Conversation With' and 'Data Dream Team' - to listen to an outstanding collection of discussions, insights, and good chat focused on data, and the tools, technologies, methodologies, and people.
 
The Pipeline: All Things CD & DevOps Podcast by The CD Foundation

Jacqueline Salinas, Director of Ecosystem & Community Development

Monthly
 
The Pipeline: All Things CD & DevOps is created and hosted by the CD Foundation's Director of Ecosystem & Community Development, Jacqueline Salinas. This is a series of interviews with industry experts, leaders, and innovators. The Pipeline will cover a range of topics centered around CD & DevOps. The CDF’s goal is to educate, entertain, and provide tips and insights that make community members better software engineers. The intent is to supply up-to-date industry news and innovations, as we ...
 
 
Join us on Site Reliability Engineering Crashcasts as we delve into the critical art of decision-making under uncertainty with expert Victor. In this episode, we explore: The unique challenges of decision-making in SRE roles How the OODA loop framework can enhance quick and effective decisions The "fail fast, fail safe" approach to managing limited…
 
In this episode, Adam reads book three in The Zen of Programming (1988) by Geoffrey James. This book is unlike any programming book you've encountered. So, let's try something new for the podcast to showcase this poignant, accurate, and funny book. This episode features analects from the fabled zen Master Rinzai. Want more? 🚀 New listener? Start wi…
 
Sarah Butt (Principal Engineer, Centralized Incident Response, Salesforce) and Vrai Stacey (Staff Software Engineer, Google) join hosts Steve McGhee and Jordan Greenberg to dive into incident response—particularly tooling and software for reliability incidents. Tune in for an in-depth discussion on topics such as the importance of communication and…
 
Silvia Botros (SRE Architect, Twilio | Author of "High Performance MySQL, 4th edition") and Niall Murphy (Co-founder & CEO, Stanza) join hosts Steve McGhee and Jordan Greenberg to discuss cultural shifts in database engineering, rate limiting, load shedding, holistic approaches to reliability, proactive measures to build customer trust, and much m…
 
In this episode, Adam reads book two in The Zen of Programming (1988) by Geoffrey James. This book is unlike any programming book you've encountered. So, let's try something new for the podcast to showcase this poignant, accurate, and funny book. This episode features folktales from the fabled zen Master Noa-Op. Want more? 🚀 New listener? Start wit…
 
Ready to supercharge your Site Reliability Engineering skills? In this episode, Sheila and Victor delve into the best strategies and resources for continuous learning in SRE. In this episode, we explore: The importance of continuous learning in SRE — Discover why staying updated is crucial in this rapidly evolving field. Effective learning strategi…
 
Curious about how containerization has revolutionized application deployment and management? Welcome to Site Reliability Engineering Crashcasts! In this episode, we explore: The basics of containerization and how it differs from traditional virtualization. The crucial role Docker played in popularizing container technology. Kubernetes' functionalit…
 
Ever wondered how leading tech companies achieve near-perfect uptime? Tune in to this episode of Site Reliability Engineering Crashcasts as Sheila and Victor break down the marvels of designing highly available systems. In this episode, we explore: The critical importance of highly available systems and their impact on businesses. Fundamental strat…
 
Dive into the essentials of monitoring and logging in this episode of Site Reliability Engineering Crashcasts with Sheila and Victor! In this episode, we explore: The difference between monitoring and logging, explained through a clever medical analogy. A detailed comparison of Prometheus, Grafana, and the ELK stack, including their strengths and w…
 
Ready to unravel the mysteries of performance troubleshooting and latency diagnosis in SRE? Join host Sheila and expert Victor as they dive deep into essential techniques and best practices. In this episode, we explore: Profiling, Tracing, Logging, and Monitoring: Discover how these key tools can help you understand and improve system performance. …
 
Unlock the potential of automation in Site Reliability Engineering in this episode of Site Reliability Engineering Crashcasts! In this episode, we explore: What automation means for SRE and how it can transform your workflows. Common tasks that can be automated, freeing up engineers to focus on strategic initiatives. The concept of self-healing sys…
 
Dive deep into the world of DevOps and Site Reliability Engineering (SRE) with us in this enlightening episode of Site Reliability Engineering Crashcasts! In this episode, we explore: Definitions and foundational principles of DevOps and SRE. The historical origins of both practices, including a surprising fact about Google’s pioneering role in SRE…
 
Join us on Site Reliability Engineering Crashcasts as we delve into the nuanced world of reliability metrics that go beyond the typical uptime percentages. Hosted by Sheila and featuring SRE expert Victor, this episode is packed with insights you won't want to miss. In this episode, we explore: Understanding reliability beyond the "five nines" (99.…
 
Get ready for an action-packed episode of Site Reliability Engineering Crashcasts! Join Sheila and SRE expert Victor as they unravel the thrilling world of war stories and effective strategies for troubleshooting complex production issues. In this episode, we explore: The concept of "war stories" in SRE and their significance Common complex product…
 
Unlock the full potential of cloud management with Terraform in our latest episode of Site Reliability Engineering Crashcasts. Join Sheila and Victor as they delve into how Terraform can transform your infrastructure management practices. In this episode, we explore: An introduction to Terraform and Infrastructure as Code (IaC) The key differences …
 
We're diving deep into how Puppet can revolutionize your SRE practices. In this episode, we explore: Discover how Puppet streamlines infrastructure management and enforces desired states automatically. Learn the impact of Puppet in continuous delivery through automating deployments and ensuring consistency. Explore the strengths and limitations of …
 
Get ready to untangle the complexities of configuration management with Chef in this engaging episode of Site Reliability Engineering Crashcasts! In this episode, we explore: Configuration Management 101: Understand why maintaining a consistent and reliable IT infrastructure is crucial for SREs. Chef's Role and Components: Discover how Chef uses In…
 
Discover how Ansible revolutionizes infrastructure management and powers automation in SRE practices in this exciting episode. In this episode, we explore: Learn what makes Ansible an essential tool for infrastructure as code. Explore the features that make Ansible a favorite in SRE, from idempotency to modularity. Hear a real-world success story o…
 
Liz Fong-Jones (former Google SRE and current Field CTO at honeycomb.io) joins hosts Steve McGhee and Jordan Greenberg for a lively discussion centered around observability, its evolution from monitoring, and its role in modern software development. Tune in for more on the importance of observability as a spectrum, the evolving role of SREs, and ad…
 
Ben Treynor Sloss (VP of Engineering, Google) joins hosts Steve McGhee and Dr. Jennifer Petoff (Director of Technical Infrastructure Education, Google) to share the evolution of SRE and its impact on software development, how AI and ML significantly impact SRE practices, and the future of SRE. Ben coined the term "Site Reliability Engineering" for…
 
In this episode, Healfdene Goguen (Principal Engineer, Google) joins hosts Steve McGhee and Jordan Greenberg to discuss the vast amount of work to be done by SREs, and the fascinating challenges to tackle with clear real-world implications. It's a truly exciting time to be an SRE at Google! By the Google Prodcast Team
 
In this episode, Adam reads book two in The Zen of Programming (1988) by Geoffrey James. This book is unlike any programming book you've encountered. So, let's try something new for the podcast to showcase this poignant, accurate, and funny book. This episode features chronicles from the fabled zen Master Ninjei. Want more? 🚀 New listener? Start wi…
 
In this season of Google Prodcast, current and former SREs, both within and outside of Google, chat with hosts Steve McGhee and Jordan Greenberg to discuss software systems designed and built by SREs. For "episode zero", guests Amy Tobey (Live Services SRE, Netflix) and Dr. Vladyslav Ukis (Head of R&D, Siemens Healthineers, Author of "Establishing …
 
Dive into the world of Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with our expert guest, Victor, as we unravel these crucial concepts in Software Reliability Engineering. In this episode, we explore: The definitions and importance of SLIs and SLOs in measuring service reliability Real-world examples of common SLIs and strat…
 
In this episode, Adam reads the preface, foreword, and introduction to The Zen of Programming (1988) by Geoffrey James. This book is unlike any programming book you've encountered. So, let's try something new for the podcast to showcase this poignant, accurate, and funny book. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE …
 
In this episode, Adam welcomes Dan Slimmon, an experienced Site Reliability Engineer (SRE), to discuss aspects of incident response and troubleshooting in software engineering. Dan explains his methodology for clinical troubleshooting, the importance of maintaining a common mental model, and techniques for leading effective incident response efforts…
 
In this episode of Small Batches, host Adam Hawkins welcomes Alex Nesbitt, a strategy expert and member of the Flow Collective, to delve into the nuances of strategic thinking. The discussion covers different types of strategies, pro-tips on strategic thinking, and how strategy relates to the concept of flight levels. Nesbitt shares insights from h…
 
Host Kevin Riggle interviews Melanie Ensign, former Global Head of Security, Privacy, & Engineering Communications for Uber, about what the role of a CISO (Chief Information Security Officer) is. (Hint: It's not being the smartest security person in the room.) Bonus episode from https://warstories.criticalpoint.tv/episodes/the-reporter-called-her-c…
 
Tesla xAI data center in Memphis; reviewing Google's year-over-year sustainability efforts & report; iMasons Climate Accord committee requests all vendors provide Scope 3 verified data; AI benefits for sustainability, climate, energy and more. By GDCG
 
Adam discusses strategy in preparation for the next episode. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE giveaway for a copy of "Playing to Win" 🧭 Get the Small Batches Way guide to software delivery excellence 🥋 Software Kaizen: My One-on-One System for Engineering Leadership 📘 Playing to Win by Lafley & Martin Chapter…
 
Host Kevin Riggle interviews Andrey Petrov about three different incidents he was involved with: A programming mistake in high school that filled its alumni's email inboxes, a Twitter analytics site he built that got co-opted as part of a phishing scam, and how he won a bug bounty on the Ethereum blockchain rollup prototype developed by superstar h…
 
Adam discusses three (new-ish) ideas from time on a new gemba. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE giveaway for a copy of "Modern Software Engineering" 🧭 Get the Small Batches Way guide to software delivery excellence 🥋 Software Kaizen: My One-on-One System for Engineering Leadership 📘 Flight Levels by Klaus Leo…
 
Kevin Riggle interviews Melanie Ensign (Discernible Inc.), former Global Head of Security, Privacy, & Engineering Communications for Uber, about building good bug bounty programs, incident management processes, and one especially memorable Christmas morning. Melanie's company: https://discernibleinc.com/ LinkedIn: https://www.linkedin.com/in/melani…
 
Data center with liquid cooling... in a house; revisiting fuel cell viability as big, green hydrogen centers come online; why liquid cooling is still ramping up across the industry; achieving success in the data center industry.
 
Adam describes using Hexagonal Architecture, also known as Ports and Adapters, for software delivery excellence. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE giveaway for a copy of "Modern Software Engineering" 🧭 Get the Small Batches Way guide to software delivery excellence 🥋 Software Kaizen: My One-on-One System for E…
 
Kevin Riggle interviews Deb Chachra, professor of engineering at Olin College and author of HOW INFRASTRUCTURE WORKS, about climate change, the housing crisis, the green/sustainable energy transition, and how we can build a better world for ourselves and everyone around us. BUY DEB'S BOOK! https://criticalpoint.tv/infrastructure Deb's Twitter: https…
 
Reddit is a big place and the safety of our users is one of our highest priorities. Scaling that safety is a constant focus, and we’ve built and evolved many different tools to enable that, used by Reddit employees and by community moderators. In this episode, you’ll hear from Phil Aquilina, a Staff Engineer on the Community Safety team. His team r…
 
Adam welcomes Steve Pereira and Andrew Davis to discuss their new book, Flow Engineering. They discuss the book's origin story and the use of cybernetics to drive effective action. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE giveaway for a copy of "Modern Software Engineering" 🧭 Get the Small Batches Way guide to softwa…
 
A book recommendation: Data Center Handbook [Geng, et al.]; interviews & resume preparation; AI & power demand: paths ahead and lessons from NoVA; AI in the data center. By GDCM
 
In perhaps a very fitting way to wrap up the Career Pathways Series, we talk to Ian Douglas about Developer Advocacy, a role that really demonstrates how software development skills can be applied in a variety of meaningful ways. Ian dives into how different this role can look from company to company, the travel, the conferences, the content, the g…
 
Bailey and Jeannie sit down with Foster Taylor to talk all about his experience as a QA Engineer. We discuss the role QA plays in the Software Development Lifecycle and the nuanced technical and non-technical skills he acquired and fine-tuned. Come for the QA, stay for the laughs! If you or someone you know is code curious, we encourage you to attend …
 
Adam presents the mental model behind T1 and T2 signals, a necessary lexicon for understanding production operations. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE giveaway for a copy of "Flow Engineering" 🧭 Get your FREE Guide to software delivery excellence ☕️ Small Batches #103 - Understanding Production Operations (Th…
 
In this interview, Kate Tester really breaks down the role of Data Engineering into its component parts from pipeline building to stakeholder tooling. She dives deep into important concepts, tooling, and her day-to-day work. We loved talking with Kate and learning about her journey from Turing Backend Engineering student and SQL enthusiast to downs…
 
In this episode with Leiya Kenney we have an amazing and honest conversation on her time as a Support Engineer. We talk about what her experience was and the ways support engineering can differ from company to company. And, let’s be honest, there can be a huge stigma tied to this role. Leiya breaks that down too. Tune in and learn all about this am…
 
Adam answers a listener's request for advice on succeeding in high-level company or project environments with seven tips. Want more? 🚀 New listener? Start with the introduction. 🎁 Enter the FREE giveaway for a copy of "Flow Engineering" 🧭 Get your FREE Guide to software delivery excellence 📘 Buy "Goldratt's Rules of Flow" 📘 Buy "Playing to Win" (my …
 
We loved chatting with Cristina Peña about her role as a UX Engineer. In this episode Cristina walks us through setting up component libraries, the specifics of what she learned at Turing that helped her, and what it’s like setting up a new UX Engineering team with a tenured teammate. If you are interested in design and front end engineering this i…
 
Bailey Diveley and Jeannie Evans sit down with Rob Stringer to talk about Site Reliability Engineering and his unique path to get here. If you are someone who likes to learn how things work under the hood, you’re going to want to have a listen. If you or someone you know are code curious, we encourage you to attend a Turing Try Coding Event. You ca…
 
Adam presents TDD as skill zero, the one that unlocks all the others. Want more? 🚀 New listener? Start with the introduction. 🎁 April 2024 Giveaway instructions 🧭 Get your FREE Guide to software delivery excellence ☕️ Small Batches #65: Systems Thinking ☕️ Dave Farley on Small Batches 📘 Tidy First? by Kent Beck Chapters (00:00) - Skill Zero: Test D…
 