<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Engineering Chaos]]></title><description><![CDATA[Insights on the advanced engineering that makes modern security possible.]]></description><link>https://www.engineeringchaos.io</link><image><url>https://substackcdn.com/image/fetch/$s_!OieM!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec8c9863-e9a7-4221-a648-8ff83066cc65_1080x1080.png</url><title>Engineering Chaos</title><link>https://www.engineeringchaos.io</link></image><generator>Substack</generator><lastBuildDate>Wed, 29 Apr 2026 11:21:55 GMT</lastBuildDate><atom:link href="https://www.engineeringchaos.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mike Saxton]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[engineeringchaos0x@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[engineeringchaos0x@substack.com]]></itunes:email><itunes:name><![CDATA[Mike Saxton]]></itunes:name></itunes:owner><itunes:author><![CDATA[Mike Saxton]]></itunes:author><googleplay:owner><![CDATA[engineeringchaos0x@substack.com]]></googleplay:owner><googleplay:email><![CDATA[engineeringchaos0x@substack.com]]></googleplay:email><googleplay:author><![CDATA[Mike Saxton]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Contextual Feeling]]></title><description><![CDATA[(or&#8230;The Third Axis: How AI Changes the Way We Think About Threat Detection)]]></description><link>https://www.engineeringchaos.io/p/contextual-feeling</link><guid 
isPermaLink="false">https://www.engineeringchaos.io/p/contextual-feeling</guid><dc:creator><![CDATA[Mike Saxton]]></dc:creator><pubDate>Mon, 30 Mar 2026 03:21:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2712e64d-9527-4c37-b07c-94bebb7422fb_428x241.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DaXB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DaXB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 424w, https://substackcdn.com/image/fetch/$s_!DaXB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 848w, https://substackcdn.com/image/fetch/$s_!DaXB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 1272w, https://substackcdn.com/image/fetch/$s_!DaXB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DaXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp" width="428" height="241" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:241,&quot;width&quot;:428,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12206,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.engineeringchaos.io/i/192569072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DaXB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 424w, https://substackcdn.com/image/fetch/$s_!DaXB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 848w, https://substackcdn.com/image/fetch/$s_!DaXB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 1272w, https://substackcdn.com/image/fetch/$s_!DaXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f89957e-8eb4-43a8-8a3b-5530a7570694_428x241.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>(Before we get going, I&#8217;ll open with a Wikipedia-style disambiguation and say this article is about the Z Axis, as in the <a href="https://en.wikipedia.org/w/index.php?title=Spherical_coordinate_system&amp;wprov=rarw1">Spherical coordinate system</a>. Not the other Axis (disambiguation) your history book might discuss.)</p><p>For years I&#8217;ve thought about threat detection on an X-Y plane: speed on one axis, accuracy on the other. The tradeoff between them is basically a 45-degree line &#8212; the faster a detection is to implement, the less accurate it tends to be, and vice versa. The goal has always been to find that sweet spot somewhere in the middle: not so fast that you&#8217;re drowning in noise, not so slow that you&#8217;re never confident in what you&#8217;re seeing.</p><p>That model held up for a long time. 
But I think AI has broken it &#8212; not by eliminating the tradeoff, but by adding a third dimension that the old model doesn&#8217;t account for.</p><div><hr></div><h2>The Original Two: Atomic and Machine Learning</h2><p>On one end of the spectrum you have atomic detections. Rules based on a specific, known data point &#8212; an IP address, a file hash, a registry key, a domain. Sigma rules live here. Emerging threat feeds live here. The value of atomic detections is that they&#8217;re deterministic: the same input produces the same output every time. If you write a rule that fires when it sees a specific IOC, and that IOC shows up, the rule fires. No ambiguity.</p><p>The tradeoff is tuning. Writing an atomic detection is fast. Getting it accurate &#8212; low false positive rate, properly scoped, not firing on legitimate traffic &#8212; takes real time. And the threat landscape changes fast enough that rules require constant maintenance to stay relevant. Atomic detections are where any detection program starts. They&#8217;re the foundation. But they have a ceiling.</p><p>On the other end you have machine learning. ML flips the tradeoff. It&#8217;s slow to implement &#8212; you need historical data, a trained model, ongoing monitoring for accuracy, drift detection over time &#8212; but once it&#8217;s running it can catch things that no rule would ever catch. Behavioral anomalies, subtle pattern shifts, activity that doesn&#8217;t match any known IOC but doesn&#8217;t look right either. The catch is that ML models are probabilistic. They&#8217;ll tell you something looks suspicious, and they&#8217;ll tell you how confident they are, but they won&#8217;t tell you definitively that it&#8217;s bad. 
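</p><p>The split shows up cleanly in code. Here is a toy sketch in Python; the IOC list, field names, and weights are all hypothetical stand-ins, not a real rule set or model:</p>

```python
# Toy contrast between an atomic detection and an ML-style score.
# The IOC list, field names, and weights are hypothetical stand-ins.

KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.42"}  # atomic IOC list

def atomic_detect(event: dict) -> bool:
    """Deterministic: the same input produces the same verdict, every time."""
    return event.get("dst_ip") in KNOWN_BAD_IPS

def ml_score(event: dict) -> float:
    """Probabilistic stand-in for a trained model: returns a confidence,
    not a verdict. A real model would be fit on historical data."""
    score = 0.0
    if event.get("bytes_out", 0) > 50_000_000:
        score += 0.4  # unusually large egress
    if event.get("hour", 12) < 5:
        score += 0.3  # off-hours activity
    return round(min(score, 1.0), 2)

event = {"dst_ip": "203.0.113.7", "bytes_out": 60_000_000, "hour": 3}
print(atomic_detect(event))  # True -- a verdict
print(ml_score(event))       # 0.7 -- a confidence you still have to interpret
```

<p>The atomic check returns a verdict; the model-style check returns a number someone still has to interpret. That difference is the whole spectrum.</p><p>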
Sometimes they&#8217;ll tell you they&#8217;re not very confident &#8212; and honestly, when they say that, they&#8217;re usually right to flag the uncertainty.</p><p>So for a long time the detection strategy question was: where on that spectrum do you want to live? Rules or models? Fast and deterministic, or slow and probabilistic? Or my favorite&#8230;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZShA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZShA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZShA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZShA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZShA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZShA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg" width="625" height="625" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:625,&quot;width&quot;:625,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54631,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.engineeringchaos.io/i/192569072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZShA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ZShA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ZShA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ZShA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2c10e7b-8649-4403-aaa9-2e8fcfbddfc8_625x625.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>AI Adds a Third Axis</h2><p>Here&#8217;s where I think the framing needs to change. When people ask where AI fits on that spectrum, they&#8217;re still thinking in two dimensions. And it doesn&#8217;t fit cleanly.</p><p>AI &#8212; specifically LLMs and agents &#8212; introduces a third variable: determinism.</p><p>Atomic detections are fully deterministic. Machine learning is probabilistic but bounded &#8212; the same model on the same data will give you a consistent confidence score. LLMs are non-deterministic by nature. The same prompt, given the same context, can produce different outputs. They can hallucinate. They can go down a reasoning path you didn&#8217;t intend. They can be confidently wrong. 
That&#8217;s not a flaw to work around &#8212; it&#8217;s a fundamental characteristic of the technology that has to be part of how you use it.</p><p>If speed and accuracy are the X and Y axes, determinism is the Z axis. And when you look at it that way, atomic detections, ML models, and AI each occupy a different point in three-dimensional space rather than sitting on the same line.</p><p>That reframing matters because it changes how you think about where each one belongs in your stack.</p><div><hr></div><h2>The Right Job for Each</h2><p>There&#8217;s an old saying: <em>good, fast, or cheap &#8212; pick two</em>. Detection engineering strategy has always been a version of that problem. The third axis doesn&#8217;t solve it. It just makes the tradeoffs more explicit.</p><p>Atomic detections are deterministic, fast to deploy, and expensive to maintain at scale; ML models are probabilistic, slow to build, and better at catching what rules miss. AI is non-deterministic, fast at certain tasks, and genuinely bad at others.</p><p>The question isn&#8217;t which one wins. It&#8217;s which one is right for which job.</p><p>And I think the industry is currently overindexing on using AI as a detector and underinvesting in using AI as an operator.</p><p>Everyone has an agent that &#8220;monitors your network&#8221; or &#8220;finds threats&#8221; or &#8220;reasons over your alerts.&#8221; And some of that is real. But an agent is a feature, not a strategy. Buying a tool that uses AI to detect things doesn&#8217;t answer the harder question of what your detection strategy actually is.</p><div><hr></div><h2>Why the Layered Approach Works</h2><p>In most security conversations <em><strong>cough</strong> RSA <strong>cough</strong></em> I think there&#8217;s a point that often gets missed: AI doesn&#8217;t work well in a vacuum. 
It works well when it has context.</p><p><a href="https://www-cdn.anthropic.com/58284b19e702b49db9302d5b6f135ad8871e7658.pdf">Anthropic published a post on how their own teams use Claude Code internally</a>. They noted that moving to test-driven development &#8212; writing tests before writing code &#8212; produced more reliable, testable output from Claude. The insight isn&#8217;t just that TDD is good engineering hygiene. It&#8217;s that giving the AI a defined structure to work within, with clear constraints and expected outcomes, makes the output meaningfully better.</p><p>Using AI for detection engineering works the same way. When you ask an AI agent to &#8220;find the bad stuff in our network,&#8221; you&#8217;re giving it almost no structure to work within. When you ask it to write a Sigma rule for a specific TTP, validate it against known-good traffic, and map it to an ATT&amp;CK technique &#8212; you&#8217;ve given it a test. You&#8217;ve given it constraints. You&#8217;ve given it a definition of correct. The output is better because the problem is better defined.</p><p>The atomic detection layer and the ML layer aren&#8217;t just detection mechanisms. They&#8217;re also the context that makes AI effective when you bring it in. Your existing Sigma rules tell AI what you already know how to detect. Your ML models tell AI what behavioral baselines look like. Together they give AI the constraints it needs to do useful work &#8212; whether that&#8217;s filling coverage gaps, tuning thresholds, or reasoning over anomalies that don&#8217;t match any existing rule.</p><p><a href="https://research.google/blog/mle-star-a-state-of-the-art-machine-learning-engineering-agents/">Google Research released a paper called MLE-STAR</a>, a state-of-the-art machine learning engineering agent capable of automating various machine learning tasks. 
The whole premise is that getting better results from ML isn&#8217;t about replacing the model &#8212; it&#8217;s about giving it better inputs, better structure, and better feedback loops. AI can now help do that work. Which means the ML layer you already have can get materially better without starting over.</p><div><hr></div><h2>Where AI Actually Changes Things</h2><p>The most underrated use of AI in detection right now isn&#8217;t finding threats. It&#8217;s keeping your detection program running.</p><p>Think about what actually breaks down in most detection programs over time. Sigma rules go stale because no one has time to rewrite them when a TTP evolves. ML models drift because the data they were trained on no longer reflects current attacker behavior. The ATT&amp;CK matrix gets updated and the coverage gaps don&#8217;t get addressed because mapping detections to TTPs manually is slow and nobody&#8217;s priority.</p><p>AI is genuinely good at all of those maintenance tasks. Writing a Sigma rule from a threat report is something an LLM can do well and fast. Identifying which existing rules are likely to need updates based on new threat intelligence is something an LLM can reason over. As mentioned, Google has published research on using AI to tune machine learning models &#8212; adjusting hyperparameters, identifying drift, improving precision &#8212; that points at a future where the model-building loop is itself partially automated.</p><p>The non-determinism that makes AI unreliable as a standalone detector becomes much more manageable when a human is reviewing the output before it goes into production. A Sigma rule that an LLM drafts and a human reviews is better than a Sigma rule nobody had time to write. 
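</p><p>That review gate is simple enough to sketch. A minimal Python illustration, with every function, field, and status name hypothetical: the draft step can be fast and non-deterministic, because nothing ships until a human flips the status:</p>

```python
# Toy human-review gate for AI-drafted detection rules.
# Every function, field, and status value here is hypothetical.

def draft_rule(threat_report: str) -> dict:
    """Stand-in for an LLM call that turns a threat report into a rule draft."""
    return {"title": "Suspicious scheduled task creation",
            "source": threat_report,
            "status": "draft"}

def human_review(rule: dict, approved: bool) -> dict:
    """A human decision is the only way a draft changes status."""
    rule["status"] = "approved" if approved else "rejected"
    return rule

def deploy(rule: dict) -> bool:
    """Refuses anything that skipped or failed review."""
    return rule["status"] == "approved"

rule = draft_rule("threat report, March 2026")
print(deploy(rule))                               # False: drafts never ship directly
print(deploy(human_review(rule, approved=True)))  # True: ships only after sign-off
```

<p>The LLM can be wrong in the draft step without consequence; the gate is what turns non-deterministic output into something safe to run in production.</p><p>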
A tuning recommendation from an AI assistant that an engineer validates is better than a model running on stale parameters because nobody got to it.</p><p>The way I&#8217;m starting to think about this: AI&#8217;s role in detection isn&#8217;t to replace the atomic layer or the ML layer. It&#8217;s to be the engine that keeps both of them current, calibrated, and aligned to the actual threat landscape.</p><div><hr></div><h2>The Framework in Practice</h2><p>If you think about a mature detection program through this lens, you end up with something like:</p><p>Atomic detections for what you know. Fast, deterministic, high-confidence when tuned. AI writes and maintains them &#8212; turning threat reports into rules, flagging stale coverage, mapping new TTPs to existing logic.</p><p>Machine learning for what you don&#8217;t know yet. Probabilistic, behavioral, better at catching novel activity. AI helps tune it &#8212; adjusting models, monitoring drift, identifying where the training data needs to be refreshed.</p><p>AI agents for what requires reasoning over context. Correlating signals across sources, generating hypotheses, helping analysts work through ambiguous alerts. Used with appropriate skepticism about confidence and with human review in the loop.</p><p>None of these replace the others. They cover different parts of the problem, and AI shows up differently in each one.</p><h2>My Lukewarm AI Take</h2><p>I&#8217;m optimistic about what AI can do for threat detection programs. I&#8217;m much more skeptical about how we&#8217;re talking about it right now &#8212; as if deploying an AI agent is itself a detection strategy.</p><p>The teams that will get the most out of AI aren&#8217;t the ones that replace their detection program with it. 
They&#8217;re the ones that use it to do the unglamorous work that detection programs actually need: keeping rules current, keeping models calibrated, keeping coverage aligned to what adversaries are actually doing.</p><p>That&#8217;s not as exciting a demo as an agent that finds threats. But it&#8217;s the thing that actually makes detection programs better over time.</p><div><hr></div><p><em>Engineering Chaos is about applying modern data engineering to rethink how security teams build and operate their data infrastructure. As always, views are mine. You can share them but I don&#8217;t know why you&#8217;d want to.</em></p>]]></content:encoded></item><item><title><![CDATA[Fodder on the Feuding Format Fracas is Fitting For Finding Final Boss Forensics]]></title><description><![CDATA[or...Why We Chose Hudi for Cybersecurity Data. In Part 1 we covered serialization. Now the part everyone actually argues about.]]></description><link>https://www.engineeringchaos.io/p/fodder-on-the-feuding-format-fracas</link><guid isPermaLink="false">https://www.engineeringchaos.io/p/fodder-on-the-feuding-format-fracas</guid><dc:creator><![CDATA[Mike Saxton]]></dc:creator><pubDate>Wed, 25 Mar 2026 14:28:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AUzG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Part 2 of 2 - <a href="https://www.engineeringchaos.io/p/a-super-cereal-post-about-serialization">You can read about Avro selection here in part 1</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AUzG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AUzG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AUzG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AUzG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AUzG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AUzG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg" width="750" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:750,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:78622,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.engineeringchaos.io/i/192098500?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AUzG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AUzG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AUzG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AUzG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3661c82d-4dc2-4831-8e98-90a4dca2625e_750x500.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><a href="https://buffalonews.com/sports/professional/nfl/bills/article_48f8f48c-8bc8-419a-a085-5c68bb579965.html">Source</a></figcaption></figure></div><div><hr></div><p><a href="https://iceberg.apache.org">Iceberg</a>, <a href="https://delta.io">Delta Lake</a>, and <a href="https://hudi.apache.org">Hudi</a> are all solid <a href="https://en.wikipedia.org/wiki/Analytics">table formats</a>. For most use cases any of them will serve you well &#8212; the differences only start to matter at the edges. This post is about our edges, and why we landed where we did.</p><h2>What We Needed the Table Layer to Do</h2><p>Before evaluating anything, we wrote down the requirements specific to our platform. Not features &#8212; but the actual behaviors we needed in production. 
A few of them are below:</p><ul><li><p><strong>Continuous streaming ingest.</strong> Detection latency starts at the table layer. We couldn&#8217;t afford a batch-oriented architecture.</p></li><li><p><strong>Upserts.</strong> Security events change after they land. Threat intel enrichment, identity correlation, analyst annotations &#8212; a record that arrives at T+0 might look very different by T+60 seconds. The table format needed to handle that cleanly.</p></li><li><p><strong>Incremental processing.</strong> When a new detection rule is written, we backfill it against historical data. Reprocessing the full dataset every time isn&#8217;t viable at our scale. We needed &#8220;give me everything that changed since timestamp X&#8221; as a native operation.</p></li><li><p><strong>Timeline and auditability.</strong> In security, when a record arrived and what it looked like before enrichment are forensic questions, not optional metadata. We needed that built into the table, not maintained separately.</p></li><li><p><strong>Avro compatibility.</strong> We covered the serialization decision in <a href="https://www.engineeringchaos.io/p/a-super-cereal-post-about-serialization">Part 1</a>. The table format needed to work with it. </p></li></ul><div><hr></div><h2>Iceberg</h2><p>Iceberg&#8217;s metadata architecture is well-designed. Hidden partitioning, partition evolution without rewriting data, snapshot isolation, and broad engine support across <a href="https://spark.apache.org">Spark</a>, <a href="https://flink.apache.org">Flink</a>, <a href="https://trino.io">Trino</a>, <a href="https://www.dremio.com">Dremio</a>, and <a href="https://www.snowflake.com/en/">Snowflake</a>. If you need multiple query engines reading the same tables, Iceberg is the most credible answer right now.</p><p>Where it didn&#8217;t fit us: Iceberg was built for read-heavy analytical workloads on stable data. 
Streaming upserts at high ingest rates require careful engineering to avoid small file accumulation. Record-level incremental consumption isn&#8217;t a native primitive. For a different team with a read-dominated workload, Iceberg would be a strong choice. For our workload, we&#8217;d have been building around its edges rather than with them.</p><div><hr></div><h2>Delta Lake</h2><p>Delta Lake&#8217;s foundation is solid &#8212; ACID transactions on <a href="https://parquet.apache.org">Parquet</a>, a reliable transaction log, clean MERGE support. If your stack is heavily Databricks or Spark, staying in that ecosystem has real operational advantages.</p><p>Two things didn&#8217;t fit us. First, our streaming layer isn&#8217;t uniformly Spark, and Delta Lake&#8217;s design reflects its Spark origins in ways that created friction we didn&#8217;t want to carry. Second, Change Data Feed &#8212; Delta&#8217;s mechanism for incremental consumption &#8212; works, but it was added after the fact to an architecture not originally designed around changelogs. For a workload where incremental processing is load-bearing, we wanted something where that model was native.</p><div><hr></div><h2>Why Hudi</h2><p>Hudi was built around a specific use case: continuous ingest, records that change after landing, consumers that only want to process what changed. That&#8217;s a good description of what we were building and a pretty good description of the world we live in for security engineering. Some of the key features of Hudi that were attractive to us are:</p><ul><li><p><strong>The timeline.</strong> Hudi maintains an ordered, atomic log of every action taken against a table &#8212; commits, compactions, cleanings, rollbacks &#8212; with precise timestamps. For us this serves as forensic infrastructure. When an analyst needs to know what a table looked like at the time of an incident, or when a record was modified and what it contained before, the timeline has it. 
We&#8217;re not maintaining a separate audit system alongside the data.</p></li><li><p><strong>Upserts.</strong> Records are identified by a primary key. When a new version arrives, Hudi handles the merge without a full partition rewrite. An EDR event lands, gets enriched with threat intel 15 seconds later, gets annotated with a case ID 45 seconds after that &#8212; three upserts, one record, handled as intended behavior.</p></li><li><p><strong>Merge on Read and Copy on Write.</strong> Hudi lets you choose a storage type per table. MoR writes delta logs for updates and merges at read time &#8212; low write latency, right for high-velocity ingest. CoW rewrites files on every upsert &#8212; faster reads, right for tables being queried constantly. We run our ingest tables on MoR and our analyst-facing tables on CoW after compaction. Matching the storage model to the access pattern per table is genuinely useful.</p></li><li><p><strong>Incremental queries.</strong> A consumer specifies a point on the timeline and gets back exactly what changed since then. No full scans, no manual offset tracking. New detection rules backfill by walking the timeline in chunks. Downstream processes that fell over during an incident resume from where they stopped.</p></li><li><p><strong>Avro.</strong> Hudi uses Avro as its internal record representation in the versions we&#8217;re running. Schema evolution and field resolution are consistent between the serialization layer and the table layer, which is what we were aiming for when we made the Part 1 decision.</p></li></ul><div><hr></div><h2>Tradeoffs We Knew Going In</h2><p>Hudi requires more operational configuration than Delta Lake. Compaction scheduling, cleaning policies, timeline retention, index types &#8212; the defaults are a starting point, not a final answer. Plan for ongoing tuning.</p><p>Iceberg has broader query engine support. 
If non-Spark engines are a hard requirement, check Hudi&#8217;s current compatibility list before committing.</p><p>MoR tables accumulate delta logs between compaction runs. If compaction falls behind ingest, read performance degrades. It&#8217;s manageable but it needs monitoring.</p><p>None of these changed the decision. They were costs we could see and plan for.</p><div><hr></div><h2>Where This Leaves Us</h2><p>We chose Hudi because our specific requirements &#8212; streaming ingest, upserts, incremental processing, and a native timeline &#8212; mapped to what it was built for. Iceberg and Delta Lake are both good. They were just better fits for different workloads than ours.</p><p>So, that&#8217;s the whole analysis. The right format for your stack depends on your requirements. These were ours.</p><div><hr></div><p><em>Engineering Chaos is about applying modern data engineering to rethink how security teams build and operate their data infrastructure. As always, views are mine. You can share them, but I don&#8217;t know why you&#8217;d want to.</em></p>]]></content:encoded></item><item><title><![CDATA[A Super Cereal Post About Serialization]]></title><description><![CDATA[Evaluating and picking a serialization format for true real-time threat detection]]></description><link>https://www.engineeringchaos.io/p/a-super-cereal-post-about-serialization</link><guid isPermaLink="false">https://www.engineeringchaos.io/p/a-super-cereal-post-about-serialization</guid><dc:creator><![CDATA[Mike Saxton]]></dc:creator><pubDate>Tue, 17 Mar 2026 00:32:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ekhP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Part 1 of 2</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ekhP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ekhP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ekhP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ekhP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ekhP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ekhP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg" width="1920" height="1280" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1280,&quot;width&quot;:1920,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:0,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ekhP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ekhP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ekhP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ekhP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95b5ee09-399b-464a-bce1-5f7945d22c40_1920x1280.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><p>Stream vs batch. Kappa vs Lambda. Copy on Write vs Merge on Read. These are all things security engineers think about (ok, maybe just me and a handful of others?), but they are functions of a high-performance data system. I have a feeling these words will become common parlance in the security engineering space as security and data engineering teams begin to merge. </p><p>In building our real-time systems, we had to decide how they would be put together. Our users need to see high-performance results &#8212; sub-minute detections on petabyte-scale data. What we didn&#8217;t want them to see (or worry about, really) was any of the words I opened with.</p><p>So this will be a 2-part post on our thinking and selection process. This post covers serialization formats and the next will cover table formats. There&#8217;s no &#8220;right&#8221; answer here &#8212; for 90% of use cases most of the capabilities will work. 
The remaining 10% comes down to how your platform performs, and that&#8217;s what we&#8217;ll dive into.</p><p>This is the story of why we chose Apache Avro as our serialization format for cybersecurity data and why we felt this was important for what we do.</p><div><hr></div><h2>The Inherent Problem with Cybersecurity Data</h2><p>Cybersecurity data is not like analytics data. It is not like transactional data. It is not like IoT sensor data. It is its own uniquely chaotic beast: the data itself isn&#8217;t that unique, but combining it into a system-of-systems is where things begin to get weird.</p><p>But here&#8217;s what makes it different:</p><p><strong>Your schemas are someone else&#8217;s decision.</strong> And they aren&#8217;t all that stable. Elastic, for example, is on Version 9 of their product and Version 8 of the Elastic Common Schema (ECS) (correlation, causation, all of that). This isn&#8217;t a dig on Elastic &#8212; in fact they led the way here in my opinion &#8212; but I&#8217;ll selfishly use the example to say cybersecurity schema management is just a mess. When an EDR vendor pushes an agent update and a field you&#8217;ve been relying on for 18 months is renamed, split into two fields, or now carries a nested object where a string used to live, it breaks things fast. </p><p><strong>Your data sources multiply constantly.</strong> At any given time you&#8217;re ingesting from endpoint agents, network taps, cloud provider audit logs, identity providers, SaaS security APIs, threat intelligence feeds, and whatever new tool the IT team just bought. Each one has its own schema and none of them agreed on a standard before publishing.</p><p><strong>Volume spikes are tied to incidents.</strong> When something is actively on fire, your ingest rate can jump 10x in seconds. 
That is exactly the wrong moment to be debugging a schema mismatch.</p><p><strong>Late-arriving data is the norm, not the exception.</strong> Security logs get buffered, re-routed, and delayed by the very controls they&#8217;re monitoring. You cannot assume your stream is ordered or complete.</p><p>We like to say cybersecurity data is always in motion. It&#8217;s not about compacting it (that&#8217;s important), and it&#8217;s not about decreasing storage volume (that&#8217;s important too); for streaming analytics specifically, it&#8217;s about taming the chaos into something close to manageable.</p><div><hr></div><h2>Why JSON and CSV Don&#8217;t Scale Here</h2><p>JSON is everywhere in security. Almost every API returns it. Almost every vendor exports it. And it is a genuinely terrible choice for a high-throughput data pipeline at scale.</p><ol><li><p><strong>To start, it&#8217;s verbose.</strong> Every record carries its own field names. When you&#8217;re processing millions of events per second, you are paying a real compute and storage cost to repeat the string "<em>sourceIpAddress</em>" on every single row.</p></li><li><p><strong>It has no schema contract.</strong> JSON is whatever the producer decided to send. There is no enforcement, no evolution tracking, no compatibility guarantee. One bad producer can silently corrupt a downstream table.</p></li><li><p><strong>It&#8217;s slow to parse at volume.</strong> Text-based parsing is expensive. When you&#8217;re doing stream analytics on high-velocity threat detection pipelines, that cost compounds fast.</p></li></ol><p>CSV is even worse &#8212; no types, no nesting, and a comma in the wrong string field becomes your entire afternoon.</p><p>As we started looking toward streaming data, we needed something binary, something schematized, and something built to handle change over time. 
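</p><p>To make the verbosity point concrete, here&#8217;s a minimal sketch of the difference between self-describing JSON and schema-based binary encoding. The field names and byte layout are illustrative assumptions, not our production schema, and it uses only the Python standard library rather than an actual Avro library:</p>

```python
import json
import struct

# One EDR-style event. JSON repeats every field name in every record.
event = {"sourceIpAddress": "10.0.0.5", "destinationPort": 443, "bytesSent": 1532}
json_bytes = json.dumps(event).encode("utf-8")

# With a schema, field names live once in the contract, not in the payload.
# Assumed layout (network byte order): 4-byte IP, 2-byte port, 4-byte count.
packed_ip = bytes(int(octet) for octet in event["sourceIpAddress"].split("."))
binary = struct.pack("!4sHI", packed_ip, event["destinationPort"], event["bytesSent"])

print(f"JSON: {len(json_bytes)} bytes; schema-packed binary: {len(binary)} bytes")
```

<p>The binary record is a fraction of the size, and that gap compounds at millions of events per second. Avro applies the same principle, with a real type system and schema resolution layered on top.</p><p>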
That points squarely at the three main contenders: Protocol Buffers, Thrift, and Avro.</p><div><hr></div><h2>Avro vs Protobuf vs Thrift: Why Avro Wins for This Domain</h2><p>Protobuf and Thrift are excellent serialization formats. They&#8217;re fast, compact, and widely adopted. But they carry a critical assumption: the schema lives in compiled code.</p><p>In a product company with stable, versioned APIs, that&#8217;s fine. In a security data platform ingesting from dozens of external sources that you don&#8217;t control, it&#8217;s a liability.</p><p>When a vendor changes their schema, you cannot wait for a code review, a build pipeline, and a deployment to resume ingestion. You need to adapt in the data layer, not the application layer.</p><p>Avro makes a fundamentally different architectural choice: the schema travels with the data, registered in a schema registry, and resolved at read time. This single design decision unlocks everything that makes Avro the right choice for cybersecurity pipelines.</p><h3>Schema Evolution That Actually Works</h3><p>Avro has first-class support for schema evolution with well-defined compatibility rules. You can:</p><ul><li><p><strong>Add a field with a default value</strong> &#8212; old readers ignore it, new readers see it. Fully backward compatible.</p></li><li><p><strong>Remove a field that has a default</strong> &#8212; new readers use the default when reading old data. Fully forward compatible.</p></li><li><p><strong>Rename a field using aliases</strong> &#8212; old data maps to the new name transparently.</p></li></ul><p>This is a disciplined contract. And it means that when your EDR vendor renames process_name to process.executable.name in their next agent release, your pipeline doesn&#8217;t break &#8212; you update the schema, register the new version, define the alias, and data keeps flowing.</p><p>In cybersecurity, schema evolution isn&#8217;t a nice-to-have. 
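</p><p>The alias and default rules above can be sketched in a few lines. This is a toy illustration of Avro&#8217;s reader/writer field matching, not the library&#8217;s implementation, and the schemas are hypothetical:</p>

```python
# Writer schema: what the old agent version produced.
writer = {"type": "record", "name": "ProcessEvent",
          "fields": [{"name": "process_name", "type": "string"}]}

# Reader schema: the renamed field carries an alias, the new field a default.
reader = {"type": "record", "name": "ProcessEvent",
          "fields": [
              {"name": "process_executable_name", "type": "string",
               "aliases": ["process_name"]},               # old name maps forward
              {"name": "parent_pid", "type": ["null", "long"],
               "default": None},                           # default fills old records
          ]}

def resolves(reader_field, writer_fields):
    """A reader field resolves against a writer schema if the writer has it
    (by name or alias) or the reader supplies a default."""
    names = {reader_field["name"], *reader_field.get("aliases", [])}
    return any(w["name"] in names for w in writer_fields) or "default" in reader_field

# Every reader field resolves, so old records deserialize under the new schema.
assert all(resolves(f, writer["fields"]) for f in reader["fields"])
```

<p>With the alias registered, 18 months of historical records keep deserializing under the new field name &#8212; exactly the property a backfill depends on.</p><p>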
It&#8217;s an operational requirement.</p><h3>The Schema Registry as a Source of Truth</h3><p>When you pair Avro with a schema registry (Confluent Schema Registry, AWS Glue Schema Registry, or similar), something really cool happens: your schema becomes a governed, versioned artifact.</p><p>Every producer registers its schema before writing. Every consumer fetches the schema by the ID embedded in the message. Compatibility rules are enforced at write time &#8212; not discovered at query time.</p><p>For a security data platform, this means:</p><ul><li><p>Lineage becomes traceable. You know exactly what schema version produced any given record.</p></li><li><p>Breaking changes are caught at ingestion, not when an analyst runs a query at midnight during an incident.</p></li><li><p>New data sources are onboarded with a schema contract, not just a hope that the fields stay consistent.</p></li><li><p>Security analysts get consistency across every vendor. When CrowdStrike, SentinelOne, and Microsoft Defender all land in the same table with normalized, schema-enforced field names, analysts stop memorizing vendor-specific quirks and start writing detection logic that actually works across your entire endpoint fleet.</p></li></ul><p>This is data engineering discipline applied to one of the most schema-volatile domains that exists. And it changes the operational posture of the whole platform.</p><div><hr></div><h2>What This Means for Stream Analytics</h2><p>The payoff is in your detection pipelines. When every event flowing through your Kafka topics is Avro-encoded with a registered schema, your stream processors gain superpowers:</p><ul><li><p><strong>Enrichment is predictable.</strong> When you join an authentication event against a user identity store, you know exactly what fields are available and what types they carry. 
No defensive null checks for fields that might or might not exist depending on the source.</p></li><li><p><strong>Multi-source correlation works cleanly.</strong> When you&#8217;re correlating a process execution event, a network connection event, and a file write event to detect lateral movement, all three sources are speaking the same schematized language. You can join them confidently.</p></li><li><p><strong>Schema drift is observable.</strong> When a new schema version is registered, you can alert on it, review it, and decide whether it&#8217;s backward compatible before it affects downstream consumers. Schema changes become change management events, not surprises.</p></li><li><p><strong>Historical backfills don&#8217;t break.</strong> When you need to reprocess 90 days of events against a new detection rule, you&#8217;re reading Avro records where the schema is embedded in the data. Old records still deserialize correctly even after the schema has evolved.</p></li></ul><div><hr></div><h2>The Honest Tradeoffs</h2><p>Avro is not perfect, and we&#8217;ve learned a few things along the way.</p><p><strong>Schema registry dependency is real.</strong> If your registry is unavailable, producers and consumers that require online schema resolution will fail. You need to treat your schema registry with the same operational care as your message broker. Cache schemas aggressively. Build for registry outages.</p><p><strong>Binary formats are harder to debug.</strong> When a JSON record is malformed you can read it. When an Avro record is malformed you need tooling to inspect it. Invest early in good observability and deserialization debugging utilities.</p><p><strong>Schema governance requires discipline.</strong> The power of a schema registry is only realized if your teams actually use it. Producers that bypass the registry and write raw JSON or unregistered Avro undermine the whole model. 
We learned this the hard way early on, when unregistered producers caused a massive decrease in performance.</p><div><hr></div><h2>Where This Leaves Us</h2><p>We chose Avro because cybersecurity data is defined by change &#8212; vendors change schemas, new sources get added, the threat landscape forces new data types into existence. A serialization format that cannot evolve gracefully becomes a liability.</p><p>Schema evolution gives us the operational resilience to keep pipelines running when the data changes. The schema registry gives us the governance foundation that makes the whole platform trustworthy. And the analyst on the other end gets consistent, predictable data regardless of which vendor generated it.</p><p>That foundation is what makes the table format discussion in our next post possible. When the serialization layer is solid, you can focus on optimizing how data is stored and queried &#8212; rather than constantly firefighting schema breakage.</p><div><hr></div><h2>Up Next &#8212; Part 2: The Table Format Wars</h2><p>We evaluated Iceberg, Delta Lake, and Hudi, and we ultimately chose Hudi because it worked for our stack. As I mentioned earlier, for 90% of use cases the differences won&#8217;t matter much &#8212; but for the remaining 10%, that&#8217;s where our decision was made.</p>]]></content:encoded></item></channel></rss>