Scaling Netflix’s threat detection without streaming
The article describes Netflix’s experience building a real-time threat detection pipeline in 2018 using a hybrid approach called the Psycho Pattern, which combined Spark, Kafka or SQS, and Airflow with micro-batch execution. The system worked, but it suffered from latency of 5 to 7 minutes, memory spikes, and scaling issues. A later attempt to migrate to Flink streaming reduced latency slightly, yet it did not improve detection quality and added engineering complexity. The real problem was false positives and poor signal quality, which could have been addressed through better data validation, more precise machine learning models, and smarter memory strategies. Key lessons: trust micro-batch when it is sufficient, treat the watermark table as the heartbeat of the system, prioritize accuracy over speed, and question a technology change before adopting it.
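To make the watermark lesson concrete, here is a minimal sketch of a watermark-driven micro-batch step. It is an illustration under stated assumptions, not the pipeline the article describes: the `Event` type, the `run_micro_batch` function, and the in-memory dictionary standing in for the watermark table are all hypothetical, and a real system would keep the watermark in a durable store and run the batch from a scheduler such as Airflow.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ts: int       # event time in seconds (hypothetical field)
    payload: str  # raw signal to be scored (hypothetical field)


def run_micro_batch(events, watermarks, source, now):
    """Select events in the window (watermark, now] for one source,
    then advance that source's watermark to `now`."""
    low = watermarks.get(source, 0)  # a source never seen before starts at 0
    batch = [e for e in events if low < e.ts <= now]
    # Detection logic would run on `batch` here; this sketch only selects it.
    # The watermark moves only after the batch has been handled, so a failed
    # run can be retried over the same window without skipping events.
    watermarks[source] = now
    return batch


events = [Event(10, "a"), Event(20, "b"), Event(30, "c")]
watermarks = {}  # stand-in for the watermark table

first = run_micro_batch(events, watermarks, "login", now=20)
second = run_micro_batch(events, watermarks, "login", now=30)
repeat = run_micro_batch(events, watermarks, "login", now=30)
```

Because each run reads only what lies past the stored watermark, rerunning the same window returns nothing new, which is what lets a scheduled micro-batch job behave predictably without a streaming engine.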
https://blog.dataexpert.io/p/scaling-netflixs-threat-detection