What if your graph pipelines could scale effortlessly without sacrificing flexibility or speed? Join us for #BeamSummit 2026 to see how! #GraphNeuralNetworks promise powerful insights, but working with large-scale graph data is anything but simple. In this talk, Yogesh Tewari introduces #GraphFlow, a modular Python toolkit built on Apache Beam and designed to turn complex GNN workflows into scalable, production-ready pipelines. Don’t miss this session to learn how to build efficient, end-to-end GNN pipelines that scale with your data and your ambition. There is still time to register: https://beamsummit.org/
Apache Beam
IT Services and IT Consulting
Apache Beam is an open source community driving batch & stream data processing.
About us
Introducing Apache Beam: The Unified Apache Beam Model. The easiest way to do batch and streaming data processing. Write-once, run-anywhere data processing for mission-critical production workloads.
- Website: https://beam.apache.org/
- Industry: IT Services and IT Consulting
- Company size: 1,001-5,000 employees
- Type: Public Company
Products
Apache Beam
Big Data Processing & Distribution Software
Apache Beam is an open-source, unified programming model for batch and streaming data processing pipelines that simplifies large-scale data processing. Thousands of organizations around the world choose Apache Beam for its unique data processing features, proven scale, and powerful yet extensible capabilities.
Updates
-
At scale, data quality isn’t just a technical problem; it’s an operational one. At Intuit Credit Karma, managing hundreds of tables, tens of thousands of columns, and data impacting 140M+ users made manual monitoring impossible. In this session, Puneet Singh and Veenit Shah share how they moved from reactive firefighting to proactive observability, using Monte Carlo to automate data quality across five key pillars: timeliness, completeness, accuracy, observability, and governance. From building a centralized Data Asset Registry to reducing alert fatigue with smarter monitoring, this is a real-world look at what it takes to make data quality actually work at scale. If you're still relying on manual checks, this one might change your approach. Register here for free: https://lnkd.in/etK8FhCK
-
Rethinking your data architecture? Start here. Join this session at #BeamSummit 2026 to discover how open table formats are reshaping the modern lakehouse. Apache Iceberg is changing the way data teams design, scale, and manage their lakehouses, bringing flexibility, reliability, and true openness to the core of data infrastructure. In this talk, Brad Miro will explain what makes Iceberg a game-changer and why open table formats are becoming the foundation of next-gen data platforms. You’ll also learn how Apache Beam fits into the picture, enabling seamless data processing within an Iceberg-powered lakehouse, from ingestion to transformation at scale. Don’t miss this session to understand how to build a future-proof, open lakehouse architecture that actually scales with your data. There is still time to register: https://beamsummit.org/
-
Streaming CDC sounds great, until redeployments break everything. In large-scale, federated environments, even small pipeline changes can risk data loss or duplication, and today's Dataflow restart mechanisms don’t fully solve the problem. In this session, Jiufeng Liu will share a real-world approach to running Spanner → BigQuery CDC pipelines at scale using a YAML-driven, self-serve model, plus an interim solution that keeps redeployments safe in production. No hype, just the problem, the workaround, and the tradeoffs that still matter. If you're dealing with CDC in production, this one hits close to home. Register here: https://lnkd.in/etK8FhCK
-
See you next week in New York City, where data, speed, and intelligence converge! 🗽 Join this keynote by Joe Raso to find out what it takes to monitor a self-driving fleet in real time, across cities, systems, and constant motion. He will take us behind the scenes of Waymo’s data infrastructure, where stateful time-series processing with Dataflow provides a live, end-to-end view of vehicles navigating the world. From fleet monitoring to importance sampling, you’ll explore how streaming #pipelines enable near real-time insights at scale. Plus, get a glimpse into the future as Waymo integrates Google’s TPU fleet and #LLMs to introduce streaming inference into its observability stack. Register here: https://lnkd.in/etK8FhCK
-
Time series data sounds simple, until you try to process it at scale. In Apache Beam, handling chronologically ordered data within a distributed, unordered system often means building complex custom logic. In this session, Shunping Huang and Claude van der Merwe will tackle a key challenge: buffering data by timestamp to enable accurate time series processing. From comparing different approaches to showcasing a real anomaly detection use case, this talk breaks down what it takes to get time right in Beam. If you're working with streaming or time-based data, this is a must-attend. Register here: https://lnkd.in/etK8FhCK
-
What if your data pipelines could define their own validation rules? In this session, Jay J. and Pablo Costamagna introduce an agent-driven approach to data quality, where AI determines what to validate, and Apache Beam handles how to do it at scale. Using RAG and MCP, an AI agent reads live data catalogs and governance rules, translates them into Beam-ready logic, and automatically triggers pipelines to run checks in real time. No more hardcoded rules. Just adaptive, scalable data validation powered by AI + Beam. If you care about data quality in dynamic environments, this is a glimpse into what’s next. Register here: https://lnkd.in/etK8FhCK
-
#BeamSummit meets the World Cup ⚽🌎 On day one (June 22), the Norway vs Senegal match takes place in the evening, so there’s no conflict with the event schedule. However, the city will already be busier throughout the day as fans move around ahead of the game, so we recommend allowing extra time to get around. The upside? You’ll get to experience the #WorldCup atmosphere up close while being part of Beam Summit! Check out more details: https://lnkd.in/gy9jVRf5
-
What if building data pipelines wasn’t about writing code, but orchestrating skills? Join us at #BeamSummit 2026 in New York City for less boilerplate and more architecture. https://lnkd.in/etK8FhCK Apache Beam is powerful, but complex. In this session, Canburak Tümer and Israel Herraiz explore a new paradigm: Agent Skills, modular AI capabilities that act as specialized data engineers. From encoding Beam concepts like windowing and state into reusable skills, to using agents for performance optimization and automated testing, this talk shows how multi-agent workflows can turn natural language into production-ready pipelines. If you’re thinking about the future of data engineering in the age of #AI, this is one to watch.
-
Building streaming pipelines is one thing; running them at scale is another. In this session, Tom Stepp and Ryan W. dive into the latest innovations in Dataflow Streaming, from smarter autoscaling to improved reliability and high availability. You’ll also get a look at advancements in streaming ML and IO performance, helping you build pipelines that are not just scalable, but truly production-ready. If you're designing next-gen streaming architectures, this is one to watch. Join us at #BeamSummit 2026 and register here for free: https://lnkd.in/etK8FhCK