Machine Learning and Network Intrusion Detection: Results from Grammatical Inference

Published in Journal of Cyber Security and Information Systems
Volume: 5 Number: 1 - Cyber Science & Technology at the Army Research Laboratory (ARL)

Author: Dr. Richard Harang
Posted: 01/23/2017

Conclusion

The high performance of machine learning in other domains has stimulated significant interest in applying it to network security. However, as noted in [1], despite the breakneck pace of major successes with machine learning in many other domains, and the large amount of effort spent to produce machine learning-based intrusion detection systems, in practice most major network defense providers continue to use signature-based methods, which have been in active use since the late 1990s.

Drawing on the extensive literature on grammatical analysis, we propose that this is a reflection of a fundamental difference between more conventional domains of machine learning and network security. In particular, network security, and especially network security applications that focus on analysis of packet contents, operates on the domain of formal grammars that are rigorously interpreted (as compared to the domain of natural language translation, where human intuition can often “fill in the gaps” in translation). It is therefore an intrinsically difficult problem that a) is demonstrably intractable in the most general case, and b) cannot be addressed with the relatively crude features that appear to be most common in the literature. While some modest success has recently been realized in applying sequence-to-sequence models (thus at least partially avoiding the question of feature spaces) to grammatical inference in specific instances of specific protocols [32], there remains no way to demonstrate that such models will generalize even to different instances of the same protocol, let alone to novel protocols in the same class.

In fact, results from grammatical inference show that there is quite likely no general method that can be applied to arbitrary data to separate benign and malicious traffic; any practical method should therefore be restricted to a particular domain, analyze that domain carefully, and at least attempt to investigate what properties of the protocol under analysis may allow it to be effectively learned. The empirical effectiveness of Snort and Bro signatures suggests that the domain of malicious traffic is likely more tractable, and may be easier to learn. The appearance of particular byte sequences in malicious but not benign traffic can be viewed (informally) as evidence that the class of malicious languages is of finite elasticity (due to the absence of a limit language) within the class of all protocols that can produce accepting inputs to the system under consideration, thus supporting identifiability. Feature representation is also important. N-gram based features in particular will quite often be insufficiently powerful to model complex grammars or protocols. In some cases, sufficiently large values of n may be able to overcome this limitation for specific subclasses of protocols; however, this is likely to be highly problem-specific, and requires careful evaluation for any given proposed system.
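The limitation of n-gram features can be illustrated with a small sketch. It is not taken from the systems surveyed in this article; the toy "protocol" (strings of balanced parentheses) and the helper names `ngram_counts`, `is_balanced`, and `colliding_pair` are assumptions chosen purely for illustration. For any chosen n, the sketch builds one well-formed and one malformed string whose n-gram count vectors are identical, so no classifier operating only on those counts could separate them.

```python
from collections import Counter
from typing import Tuple


def ngram_counts(s: str, n: int) -> Counter:
    """Count every contiguous substring of length n in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def is_balanced(s: str) -> bool:
    """Membership test for the toy grammar: balanced parentheses."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' appeared before its matching '('
            return False
    return depth == 0


def colliding_pair(n: int) -> Tuple[str, str]:
    """Return (valid, invalid) strings with identical n-gram counts.

    Both strings consist of four runs of identical characters, each run
    at least n - 1 long, so no n-gram spans more than two runs. The
    n-gram counts then depend only on the total number of '(' and ')'
    characters, not on how they are split between the runs.
    """
    o, c = "(", ")"
    valid = o * n + c * n + o * (n - 1) + c * (n - 1)
    invalid = o * (n - 1) + c * n + o * n + c * (n - 1)
    return valid, invalid


if __name__ == "__main__":
    for n in (2, 3, 4):
        valid, invalid = colliding_pair(n)
        same_at_n = ngram_counts(valid, n) == ngram_counts(invalid, n)
        same_at_n1 = ngram_counts(valid, n + 1) == ngram_counts(invalid, n + 1)
        print(n, valid, invalid, same_at_n, same_at_n1)
```

For this particular pair, moving from n to n + 1 does separate the two strings, but a longer pair built with `colliding_pair(n + 1)` collides again. This is consistent with the point above that a sufficiently large n may help only for specific, bounded subclasses of a protocol.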

While significant open questions remain – such as methods for performing inference on the restricted classes of grammars that in practical terms make up many existing protocols – the immediate results of applying grammatical inference theory to machine learning for intrusion detection both help explain the lack of widespread adoption of such systems, and suggest appropriate avenues for future work.


References

  1. R. Sommer and V. Paxson, "Outside the closed world: On using machine learning for network intrusion detection," in IEEE symposium on security and privacy, 2010.
  2. V. Paxson, "Bro: a system for detecting network intruders in real-time," Computer networks, vol. 31, no. 23, pp. 2435-2463, 1999.
  3. M. Roesch, "Snort: Lightweight Intrusion Detection for Networks," in LISA, Seattle, Washington, 1999.
  4. S. Axelsson, "The base-rate fallacy and the difficulty of intrusion detection," ACM Transactions on Information and System Security (TISSEC), vol. 3, no. 3, pp. 186-205, 2000.
  5. A. Kott, "Towards fundamental science of cyber security," in Network Science and Cybersecurity, New York, Springer, 2014, pp. 1-13.
  6. M. Sipser, Introduction to the Theory of Computation, Boston, MA: Thomson Course Technology, 2006.
  7. C. De la Higuera, Grammatical inference: learning automata and grammars, Cambridge University Press, 2010.
  8. E. M. Gold, "Language identification in the limit," Information and control, vol. 10, no. 5, pp. 447-474, 1967.
  9. L. G. Valiant, "A theory of the learnable," Commun. ACM, vol. 27, no. 11, pp. 1134-1142, 1984.
  10. E. M. Gold, "Complexity of automaton identification from given data," Information and control, vol. 37, no. 3, pp. 302-320, 1978.
  11. M. Kearns and L. Valiant, "Cryptographic limitations on learning Boolean formulae and finite automata," Journal of the ACM, vol. 41, no. 1, pp. 67-95, 1994.
  12. D. Angluin, "Learning regular sets from queries and counterexamples," Information and computation, vol. 75, no. 2, pp. 87-106, 1987.
  13. R. Parekh and V. Honavar, "Learning DFA from simple examples," Machine Learning, vol. 44, pp. 9-35, 2001.
  14. D. Angluin, "Finding patterns common to a set of strings," in Proceedings of the eleventh annual ACM symposium on Theory of computing, 1979.
  15. D. Angluin, "Inductive inference of formal languages from positive data," Information and control, vol. 45, no. 2, pp. 117-135, 1980.
  16. K. N. Wood and R. E. Harang, "Grammatical Inference and Language Frameworks for LANGSEC," in 2015 IEEE Security and Privacy Workshops, 2015.
  17. T. Motoki, T. Shinohara and K. Wright, "Correct definition of finite elasticity," Research Institute of Fundamental Information Science, Kyushu University Japan, 1990.
  18. C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin and W.-Y. Lin, "Intrusion detection by machine learning: A review," Expert Systems with Applications, vol. 36, no. 10, pp. 11994-12000, 2009.
  19. P. Laskov, P. Düssel, C. Schäfer and K. Rieck, "Learning intrusion detection: supervised or unsupervised?," in International Conference on Image Analysis and Processing, Heidelberg, 2005.
  20. S. Axelsson, "Intrusion detection systems: A survey and taxonomy," 2000.
  21. K. Wang, J. J. Parekh and S. J. Stolfo, "Anagram: A content anomaly detector resistant to mimicry attack," in Recent Advances in Intrusion Detection, Heidelberg, 2006.
  22. R. Perdisci, D. Ariu, P. Fogla, G. Giacinto and W. Lee, "McPAD: A multiple classifier system for accurate payload-based anomaly detection," Computer Networks, vol. 53, no. 6, pp. 864-881, 2009.
  23. W. Hu, W. Hu and S. Maybank, "Adaboost-based algorithm for network intrusion detection," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 2, pp. 577-583, 2008.
  24. P. Sangkatsanee, N. Wattanapongsakorn and C. Charnsripinyo, "Practical real-time intrusion detection using machine learning approaches," Computer Communications, vol. 38, no. 18, pp. 2227-2235, 2011.
  25. O. Linda, T. Vollmer and M. Manic, "Neural network based intrusion detection system for critical infrastructures," in International Joint Conference on Neural Networks, 2009.
  26. N. Abe, B. Zadrozny and J. Langford, "Outlier detection by active learning," in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006.
  27. H. Xiao, J. Sun, Y. Liu, S.-W. Lin and C. Sun, "Tzuyu: Learning stateful typestates," in 28th International Conference on Automated Software Engineering, 2013.
  28. M. Sutton, A. Greene and P. Amini, Fuzzing: brute force vulnerability discovery, Pearson Education, 2007.
  29. M. Zalewski, "american fuzzy lop," [Online]. Available: http://lcamtuf.coredump.cx/afl/.
  30. D. Aitel, "The advantages of block-based protocol analysis for security testing," Immunity Inc, 2002.
  31. S. Stolfo, "The Third International Knowledge Discovery and Data Mining," University of California Irvine, 2002. [Online]. Available: .
  32. C. Sheridan and R. Harang, "Grammatical Inference and Machine Learning Approaches to Post-Hoc LangSec," in IEEE Security and Privacy Workshop on Language-Theoretic Security, 2016.

Author

Dr. Richard Harang
Dr. Richard Harang received his PhD in Statistics and Applied Probability from the University of California Santa Barbara in 2010. After a year of postdoctoral research in the Computational Science and Engineering group under Dr. Linda Petzold, he began work at the U.S. Army Research Laboratory in 2011 focusing on applications of statistics and statistical machine learning to problems in network security. His current research interests include machine learning on structured data, analysis and attribution of source code and binary samples, and using generative models of time series data to explore properties of the underlying process.
