In an era dominated by the widespread integration of Artificial Intelligence, from copilots to chatbots, the cybersecurity industry is witnessing a significant shift in its approach to AI deployment. While large language models (LLMs) often capture headlines, they are frequently ill-suited for the massive scale and real-time demands of cybersecurity applications, where billions of events must be processed daily. The extensive GPU infrastructure, memory requirements, and associated costs of hosting LLMs make them impractical for many companies, especially for deployment scenarios like firewalls or customer endpoint applications.
This challenge has driven a resurgence of small, efficient AI models. Many cybersecurity tasks, such as malicious binary detection, URL classification, or email classification, do not require generative solutions; instead, they can be effectively solved through classification with small models. These models are cost-effective and capable of running directly on endpoint devices or within cloud infrastructure, offering a practical alternative to their larger counterparts. Even security copilots, typically associated with generative AI, can leverage small models for tasks like alert triage and prioritization.
The critical question with small models, however, is their performance, which is inherently tied to the quality and scale of their training data. This is where LLMs, despite their deployment limitations, play a transformative and strategic role: used offline during training, they can make small models markedly more effective. LLMs excel at extracting useful signals from data at scale, refining existing labels, providing new labels, and creating supplemental data. By combining the learning capacity of large, expensive models with the runtime efficiency of small models during training, cybersecurity solutions can become fast, commercially viable, and highly effective.
Three key methods underpin this strategic synergy:
- Knowledge Distillation: This method involves a large “teacher” model transferring its learned knowledge to a smaller “student” model. The student model is trained on a combination of categorical labels and the output distribution of the teacher model, allowing it to approximate the sophisticated behavior of the larger model without the computational overhead. For example, Sophos X-Ops research on command-line classification models demonstrated that a small student model, trained using the output distribution of a large teacher model on noisy data, significantly outperformed previous production models in detecting living-off-the-land binaries (LOLBins). This approach proved cost-effective and made labeling large datasets scalable.
- Semi-Supervised Learning: This technique leverages both labeled and vast amounts of unlabeled data, which are common in cybersecurity due to customer telemetry. A large model is initially trained on labeled data and then used to generate labels for previously unlabeled data. This newly labeled data, often reflecting real-world distributions, is then used to train a small, efficient model. Sophos achieved near-LLM performance with a small website productivity classification model by fine-tuning an LLM (T5 Large) to label unlabeled URLs. As more LLM-labeled data was incorporated, the small model’s performance approached that of the best-performing LLM configuration, proving the utility of unlocking previously unusable data.
- Synthetic Data Generation: This method involves LLMs producing new, synthetic examples to train small models more robustly, especially for out-of-distribution scenarios not covered by existing telemetry. LLMs, pre-trained on vast amounts of public knowledge, can generate data similar to what they have been exposed to. Sophos demonstrated this by using GPT-4 to generate entire scam campaigns, including fake Facebook login pages. While a production model initially scored these synthetic pages as benign, training with this new synthetic HTML significantly improved the model’s ability to classify them as malicious and even enhanced its general performance on real-world telemetry.
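The distillation idea in the first method can be sketched as a training loss that blends the teacher's softened output distribution with the hard categorical labels. This is a minimal, dependency-free illustration of the standard distillation objective (temperature-softened KL term plus cross-entropy), not Sophos's actual training code; the function names and the default `temperature`/`alpha` values are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution; temperature > 1
    flattens the distribution so relative class confidences are visible."""
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target term and a hard-label term.

    alpha weights the KL divergence against the teacher's softened
    distribution; (1 - alpha) weights ordinary cross-entropy against
    the categorical label. The T^2 factor keeps gradient magnitudes
    comparable across temperatures.
    """
    eps = 1e-12
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt + eps) - math.log(ps + eps))
             for pt, ps in zip(p_teacher, p_student))
    ce = -math.log(softmax(student_logits)[hard_label] + eps)
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

In a real pipeline this loss would be minimized by gradient descent over the student's parameters; the point of the sketch is that the student sees the teacher's full output distribution, not just the argmax label.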
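The semi-supervised method boils down to a pseudo-labeling step: the large model labels the unlabeled pool, and only its confident predictions are added to the small model's training set. A minimal sketch, in which `teacher_predict` stands in for whatever large model is used (e.g. a fine-tuned T5 classifier, as in the URL example above); the name, the `(label, confidence)` return shape, and the 0.9 threshold are assumptions for illustration.

```python
def pseudo_label(teacher_predict, unlabeled, threshold=0.9):
    """Expand the training set with teacher-labeled examples.

    teacher_predict: callable mapping an example to (label, confidence).
    unlabeled: iterable of raw examples (e.g. URLs from telemetry).
    Returns (example, label) pairs whose confidence meets the threshold;
    low-confidence examples are simply dropped rather than mislabeled.
    """
    expanded = []
    for example in unlabeled:
        label, confidence = teacher_predict(example)
        if confidence >= threshold:
            expanded.append((example, label))
    return expanded
```

The confidence filter is the key design choice: it trades coverage of the unlabeled pool for label quality, which matters because the small model will treat these pseudo-labels as ground truth.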
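For the synthetic-data method, the generation step itself is an LLM prompt (out of scope here), but the mixing step can be sketched: cap how much of the final training set the generated pages may occupy, label them malicious, and shuffle them into real telemetry. This is an illustrative sketch, not Sophos's pipeline; the function name and the 20% default cap are assumptions.

```python
import random

def mix_synthetic(real_examples, synthetic_pages, max_fraction=0.2, seed=0):
    """Blend LLM-generated scam pages into a real training set.

    real_examples: list of (html, label) pairs from production telemetry.
    synthetic_pages: list of HTML strings produced by a generative model
    prompted to create scam campaigns (e.g. fake login pages).
    Synthetic examples are capped at max_fraction of the combined set so
    they augment, rather than swamp, the real-world distribution.
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    cap = int(max_fraction * len(real_examples) / (1 - max_fraction))
    sampled = rng.sample(synthetic_pages, min(cap, len(synthetic_pages)))
    mixed = real_examples + [(page, "malicious") for page in sampled]
    rng.shuffle(mixed)
    return mixed
```

Capping the synthetic share is a guard against the small model overfitting to generation artifacts instead of learning the malicious patterns the synthetic pages are meant to cover.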
The convergence of large and small AI models represents a pivotal shift in cybersecurity, offering new avenues to revise outdated models, unlock previously inaccessible data sources, and innovate cost-effectively. By strategically integrating LLMs into the training process of smaller models, cybersecurity operations can be bolstered, ensuring systems remain resilient and robust against evolving threats. This paradigm not only maximizes existing AI infrastructure but also democratizes advanced cybersecurity capabilities, making them accessible to businesses of all sizes.