🐝 + 🔍 = ❤️ Bytewax and Lenses.io Integration Announcement

By Zander Matheson

At Bytewax, we believe that stream processing should be both powerful and intuitive. As organizations scale, processing event streams efficiently becomes increasingly critical. Often this involves Apache Kafka, a leading event-streaming platform, and managing Kafka streams can be complex, typically requiring several separate tools. But what if you could explore, monitor, and process your data seamlessly, all within a single ecosystem? With the upcoming Bytewax and Lenses.io integration, you can. 🎉 Lenses is a game-changer for working with Kafka: it simplifies querying, monitoring, and managing Kafka streams, and the community edition gives you all of that for free.


Why We Love Lenses

Lenses.io has redefined the way we interact with Kafka. With its sleek, intuitive interface and robust set of features, Lenses makes exploring streaming data an absolute breeze. Here’s what we love about Lenses:

  • Interactive Data Exploration: Lenses provides an immersive, real-time view into your Kafka streams. Whether you're tracking event flows or validating message schemas, Lenses offers the visibility and control you need.
  • Powerful Monitoring & Alerts: Monitoring streaming pipelines and setting up alerts means issues can be caught and resolved quickly. This keeps your data flowing smoothly and helps you stay ahead of potential hiccups.
  • Enhanced Productivity: By giving you instant access to data insights via SQL studio, Lenses reduces the friction between discovery and action. It’s the perfect launchpad for rapid experimentation and development.

What Does It Mean for Our Users?

Our integration with Lenses.io means that Bytewax users can now take the next step—from exploring data to processing it—with unprecedented ease. Imagine being able to switch gears seamlessly between investigating your Kafka topics and deploying Python-based dataflows to transform that data into actionable insights.

[Image: lenses1.png]

Here’s what excites us the most:

  • Unified Experience: No more juggling between different tools or environments. With Lenses and Bytewax working together, you have one cohesive ecosystem for managing your streaming data.
  • Accelerated Development: Rapid prototyping becomes even more efficient when you can instantly move from data discovery to processing. Identify trends, validate hypotheses, and deploy changes faster than ever.
  • Enhanced Collaboration: Data teams can now share insights and iterate on processing pipelines with ease. Lenses gives a clear picture of what's happening in your Kafka cluster, while Bytewax handles the heavy lifting of stream processing in a Python-friendly environment.

[Image: lenses2.png]


A Use Case: From Data Discovery to Actionable Insights

Let’s walk you through a scenario where the integration truly shines—a day in the life of a data scientist or ML engineer using Bytewax and Lenses together.

Step 1: Discovering Data with Lenses

Picture this: you're monitoring your Kafka topic that collects user interaction events from your web application. With Lenses, you dive into the topic, inspect messages, and verify that your event schema is intact. The real-time dashboards give you a pulse on the data, highlighting any anomalies or interesting patterns.
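
If you want to double-check a schema outside the UI, the same registry client used in the dataflow below can pull the latest registered version from Python. Here is a minimal sketch, assuming the REDPANDA_REGISTRY_URL environment variable from the snippet below and a hypothetical subject name for the interaction events:

import os

from confluent_kafka.schema_registry import SchemaRegistryClient

client = SchemaRegistryClient({"url": os.environ["REDPANDA_REGISTRY_URL"]})

# "user-interactions-value" is a hypothetical subject name for this scenario
latest = client.get_latest_version("user-interactions-value")
print(latest.version, latest.schema.schema_str)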

Step 2: Building a Bytewax Dataflow

Armed with insights from Lenses, you decide it's time to process this raw data. You craft a Bytewax dataflow in Python to filter out noise, enrich the events with additional context, and aggregate metrics like session duration and click rates.

Here’s a snippet of how that dataflow might look:

import os
from datetime import datetime, timedelta, timezone
from typing import Dict, List

from confluent_kafka.schema_registry import SchemaRegistryClient

from bytewax import operators as op
from bytewax.connectors.kafka import KafkaSinkMessage, KafkaSourceMessage
from bytewax.connectors.kafka import operators as kop
from bytewax.connectors.kafka.serde import PlainAvroDeserializer, PlainAvroSerializer
from bytewax.dataflow import Dataflow
from bytewax.operators import windowing as win
from bytewax.operators.windowing import SystemClock, TumblingWindower

BROKERS = os.environ.get("KAFKA_BROKERS", "localhost:19092").split(";")
IN_TOPICS = os.environ.get("KAFKA_IN_TOPICS", "in-topic").split(";")
OUT_TOPIC = os.environ.get("KAFKA_OUT_TOPIC", "out_topic")
REGISTRY_URL = os.environ["REDPANDA_REGISTRY_URL"]

flow = Dataflow("kafka_in_out")
kinp = kop.input("inp", flow, brokers=BROKERS, topics=IN_TOPICS)
op.inspect("inspect-errors", kinp.errs)

# schema registry configuration
client = SchemaRegistryClient({"url": REGISTRY_URL})

# Use plain Avro instead of Confluent's wire format.
# With plain Avro, the schema must also be passed to the deserializer.
key_schema = client.get_latest_version("sensor-key").schema
key_de = PlainAvroDeserializer(schema=key_schema)

val_schema = client.get_latest_version("sensor-value").schema
val_de = PlainAvroDeserializer(schema=val_schema)

# Deserialize both key and value
msgs = kop.deserialize("de", kinp.oks, key_deserializer=key_de, val_deserializer=val_de)

def extract_identifier(msg: KafkaSourceMessage) -> str:
    return msg.key["identifier"]


keyed = op.key_on("key_on_identifier", msgs.oks, extract_identifier)

# Let's window the input
def accumulate(acc: List[float], msg: KafkaSourceMessage) -> List[float]:
    acc.append(msg.value["value"])
    return acc


cc = SystemClock()
wc = TumblingWindower(timedelta(seconds=1), datetime(2023, 1, 1, tzinfo=timezone.utc))
windows = win.fold_window("calc_avg", keyed, cc, wc, list, accumulate, list.__add__)


# And do some calculations on each window
def calc_avg(key__wm__batch) -> KafkaSinkMessage[Dict, Dict]:
    key, (wm, batch) = key__wm__batch
    # Use the correct schemas here, or the serialization
    # step will fail later
    return KafkaSinkMessage(
        key={"identifier": key, "name": "topic_key"},
        value={
            "identifier": key,
            "avg": sum(batch) / len(batch),
            "window_start": wm.open_time.isoformat(),
            "window_end": wm.close_time.isoformat(),
        },
    )


avgs = op.map("avg", windows.down, calc_avg)

op.inspect("inspect-out-data", avgs)

kop.output("out1", kinp.oks, brokers=BROKERS, topic=OUT_TOPIC)

Step 2.5: Deploying & Monitoring Your Dataflow in Lenses

Once your Bytewax dataflow is ready, deploying it is a breeze. Our integration ensures that you can launch your processing pipelines with minimal friction, and Lenses' intuitive UI becomes your command center. Here’s what makes deployment and monitoring in Lenses so exciting:

  • Effortless Deployment: Seamlessly integrate your Bytewax jobs into your Kafka ecosystem. With just a few clicks, your Python dataflow is up and running, processing streaming data in real time.
  • Centralized Monitoring: Lenses’ dynamic dashboards now extend to your deployed Bytewax dataflows. Monitor job health, throughput, and performance metrics all in one place.
  • Instant Feedback Loop: Spot issues quickly with real-time alerts and logs. The visibility provided by Lenses means you can iterate on your dataflow logic on the fly, ensuring optimal performance.
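
On the deployment point in particular: one way to launch a dataflow like the one above today is Bytewax's waxctl CLI, for example waxctl dataflow deploy dataflow.py --name user-interactions-avg (the file and name here are hypothetical). Once running, the job and its topics show up in Lenses alongside the rest of your cluster.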

Step 3: Observing & Iterating

After deployment, you switch back to Lenses to monitor the output Kafka topic. Seeing the enriched and aggregated data rolling in gives you confidence that the pipeline is working as intended. And if something needs tweaking? It’s all part of the rapid feedback loop—observe, iterate, and improve.
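
If you want a second pair of eyes on that output topic from plain Python, a throwaway consumer works too. Here is a minimal sketch, assuming the confluent-kafka package and the broker and topic defaults from the dataflow above (the group id is arbitrary):

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:19092",
    "group.id": "out-topic-peek",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["out_topic"])

try:
    # Peek at a handful of records. Values are plain-Avro encoded, so expect
    # raw bytes unless you decode them with the same schema as the dataflow.
    for _ in range(10):
        msg = consumer.poll(timeout=5.0)
        if msg is None or msg.error():
            continue
        print(msg.key(), msg.value())
finally:
    consumer.close()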


Looking Forward

At Bytewax, we’re passionate about empowering our users to harness the full potential of streaming data. Our integration with Lenses.io is not just about connecting two powerful tools—it’s about creating an ecosystem where data flows seamlessly from exploration to action.

We’re excited to see how our community will innovate with this new integration, building smarter, faster, and more resilient streaming applications. Here's to a future where every byte of data drives meaningful impact!

Stay tuned, and happy streaming!

Zander Matheson

CEO, Founder
Zander is a seasoned data engineer who founded and currently helms Bytewax. He has worked in the data space since 2014 at Heroku, GitHub, and an NLP startup. Before that, he attended business school at UT Austin and HEC Paris.