
Cyber Profiling: Using Instant Messaging Author Writeprints for Cybercrime Investigations

Published in Journal of Cyber Security and Information Systems
Volume: 2 Number: 2 - Games People Play Behavior and Security

Authors: Dr. Angela Orebaugh, Dr. Jason Kinser and Dr. Jeremy Allnutt
Posted: 02/09/2016

The explosive growth in the use of instant messaging (IM) communication in both personal and professional environments has resulted in an increased risk to proprietary, sensitive, and personal information and safety due to the influx of IM-assisted cybercrimes, such as phishing, social engineering, threatening, cyber bullying, hate speech and crimes, child exploitation, sexual harassment, and illegal sales and distribution of software. IM-assisted cybercrimes are continuing to make the news with child exploitation, cyber bullying, and scamming leading last month’s headlines. Instant messaging’s anonymity and use of virtual identities hinder social accountability and present a critical challenge for cybercrime investigation. Cyber forensic techniques are needed to assist cybercrime decision support tools in collecting and analyzing digital evidence, discovering characteristics about the cyber criminal, and assisting in identifying cyber criminal suspects.

Introduction

The anonymous nature of the Internet allows online criminals to use virtual identities to hide their true identity to facilitate cybercrimes. Although central IM servers authenticate users upon login, there is no means of authenticating or validating peers (buddies). Current IM products are not addressing the anonymity and ease of impersonation over instant messaging. Author writeprints can provide cybercrime investigators a unique tool for analyzing IM-assisted cybercrimes. Writeprints are based on behavioral biometrics, which are persistent personal traits and patterns of behavior that may be collected and analyzed to aid a cybercrime investigation (Li et al., 2006). Instant messaging behavioral biometrics include online writing habits, known as stylometric features, which may be used to create an author writeprint to assist in identifying an author, or characteristics of an author, of a set of instant messages. The writeprint is a digital fingerprint that represents an author’s distinguishing stylometric features that occur in his/her computer-mediated communications. Writeprints may be used as input to a criminal cyberprofile and as an element of a multimodal system for cybercrime investigations. Writeprints can be used in conjunction with other evidence, criminal investigation techniques, and biometrics techniques to reduce the potential suspect space to a certain subset of suspects; identify the most plausible author of an IM conversation from a group of suspects; link related crimes; develop an interview and interrogation strategy; and gather convincing digital evidence to justify search and seizure and provide probable cause.

Instant Messaging and Cybercrime

Instant messaging’s anonymity hinders social accountability and leads to IM-assisted cybercrime facilitated by the following:

  • Users can create any virtual identity,
  • Users can log in from anywhere,
  • Files can be transmitted, and
  • Communication is often transmitted unencrypted.

In IM communications, criminals use virtual identities to hide their true identity. They can use multiple screen names or impersonate other users with the intention of harassing or deceiving unsuspecting victims. Criminals may also supply false information on their virtual identities; for example, a male user may configure his virtual identity to appear as female. Since most IM systems use the public Internet, the risk is high that usernames and passwords may be intercepted, or an attacker may hijack a connection or launch a man-in-the-middle (MITM) attack. With hijacking and MITM attacks, the victim user thinks he/she is communicating with a buddy but is really communicating with the attacker masquerading as the victim’s buddy. Instant messaging’s anonymity allows cyber criminals such as pedophiles, scam artists, and stalkers to make contact with their victims and get to know those they target for their crimes (Cross, 2008). IM-assisted cybercrimes, such as phishing, social engineering, threatening, cyber bullying, hate speech and crimes, child exploitation, sexual harassment, and illegal sales and distribution of software are continuing to increase (Moores and Dhillon, 2000). Additionally, criminals such as terrorist groups, gangs, and cyber intruders use IM to communicate (Abbasi and Chen, 2005). Criminals also use IM to transmit worms, viruses, Trojan horses, and other malware over the Internet.

With increasing IM cybercrime, there is a growing need for techniques to assist in identifying online criminal suspects as part of the criminal investigation. Cyber forensics is the application of investigation and analysis techniques to gather evidence suitable for presentation in a court of law with the goal of discovering the crime that took place and who was responsible (Bassett et al., 2006). With IM communications, it is necessary to have cyber forensics techniques to assist in determining the IM user’s real identity and collect digital evidence for investigators and law enforcement.

Behavioral Biometrics Writeprints for Authorship Analysis

Determining an IM user’s real identity relies on the fact that humans are creatures of habit and have certain persistent personal traits and patterns of behavior, known as behavioral biometrics (Revett, 2008). Online writing habits, known as stylometric features, include composition syntax and layout, vocabulary patterns, unique language usage, and other stylistic traits. Thus, certain stylometric features may be used to create an author writeprint to help identify an author of a particular piece of work (De Vel et al., 2001). A writeprint represents an author’s distinguishing stylometric features that occur in his/her instant messaging communications. These stylometric features may include average word length, use of punctuation and special characters, use of abbreviations, and other stylistic traits. Writeprints can provide cybercrime investigators a unique behavioral biometric tool for analyzing IM-assisted cybercrimes. Writeprints can be used as input to a criminal cyberprofile and as an element of a multimodal system to perform cyber forensics and cybercrime investigations.

Instant messaging communications contain several stylometric features for authorship analysis research. Certain IM specific features such as message structure, unusual language usage, and special stylistic markers are useful in forming a suitable writeprint feature set for authorship analysis (Zheng et al., 2006). The style of IM messages is very different from that of any other text used in traditional literature or other forms of computer-mediated communication. The real time, casual nature of IM messages produces text that is conversational in style and reflects the author’s true writing style and vocabulary (Kucukyilmaz et al., 2008). Significant characteristics of IM are the use of special linguistic elements such as abbreviations, and computer and Internet terms, known as netlingo. The textual nature of IM also creates a need to exhibit emotions. Emotion icons, called emoticons, are sequences of punctuation marks commonly used to represent feelings within computer-mediated text (Kucukyilmaz et al., 2008). An author’s IM writeprint may be derived from network packet captures or application data logged during an instant messaging conversation. Although some types of digital evidence, such as source IP addresses, file timestamps, and metadata may be easily manipulated, author writeprints based on behavioral biometrics are unique to an individual and difficult to imitate.

Creating IM Writeprints

A stylometric feature set is composed of a predefined set of measurable writing style attributes. Given t predefined features, each set of IM messages for a given author can be represented as a t-dimensional vector, called a writeprint. Figure 1 presents a stylometric feature set for a 356-dimensional vector writeprint with lexical, syntactic, and structural features (Orebaugh et al., 2014). The number of features in each category is shown in parentheses.

Lexical features mainly consist of count totals and are further broken down into emoticons, abbreviations, word-based, and character-based features. Syntactic features include punctuation and function words in order to capture an author’s habits of organizing sentences. Function words include conjunctions, prepositions, and other words that carry little meaning when used alone, such as “the” or “of”. They provide relationships to content words in the sentence, such as “ball” or “bounce”. Analyzing function words as opposed to content words allows topic-independent results that reflect an author’s preferred ways to express himself or herself and form sentences. Structural features capture the way an author organizes the layout of text. With IM communications there are no standard headers, greetings, farewells, or signatures, leaving simply the average characters and words per message in terms of structural layout. Lists of function words, abbreviations, and emoticons are included in Appendix A.

Figure 1. Stylometric feature set for the 356-dimensional writeprint (image not reproduced)
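The three feature categories described above can be sketched in a few lines of Python. This is only an illustrative sketch: the tiny word lists, the feature names, and the `extract_features` function are assumptions made for this example, not the article's 356-feature set or the lists in its Appendix A.

```python
import string

# Hypothetical mini-lexicons, for illustration only; the article's full
# lists of function words, abbreviations, and emoticons are in its Appendix A.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")
ABBREVIATIONS = ("lol", "brb", "omg")
EMOTICONS = (":)", ":(", ";)", ":D")
PUNCTUATION = (".", ",", "!", "?")


def extract_features(messages):
    """Return stylometric totals and averages for one author's messages."""
    text = " ".join(messages)
    features = {}

    # Lexical: emoticons are counted on the raw text, then blanked out so
    # their characters are not mistaken for words.
    cleaned = text
    for emoticon in EMOTICONS:
        features["emoticon_" + emoticon] = text.count(emoticon)
        cleaned = cleaned.replace(emoticon, " ")

    # Lexical: word-based and character-based counts.
    words = [t.strip(string.punctuation) for t in cleaned.lower().split()]
    words = [w for w in words if w]
    for abbreviation in ABBREVIATIONS:
        features["abbr_" + abbreviation] = words.count(abbreviation)
    features["total_words"] = len(words)
    features["total_chars"] = sum(len(m) for m in messages)
    features["avg_word_length"] = (
        sum(len(w) for w in words) / len(words) if words else 0.0
    )

    # Syntactic: punctuation marks and function words.
    for mark in PUNCTUATION:
        features["punct_" + mark] = cleaned.count(mark)
    for word in FUNCTION_WORDS:
        features["func_" + word] = words.count(word)

    # Structural: IM has no headers or signatures, so only per-message averages.
    count = len(messages)
    features["avg_chars_per_message"] = (
        features["total_chars"] / count if count else 0.0
    )
    features["avg_words_per_message"] = len(words) / count if count else 0.0
    return features


feats = extract_features(["lol the ball of fun :)", "brb!"])
print(feats["total_words"], feats["func_the"], feats["emoticon_:)"])
```

Because function-word counts ignore content words such as "ball", the same author should produce similar values for those features regardless of topic, which is the property the paragraph above relies on.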

Writeprints are created by generating totals for each stylometric feature, resulting in the output of a writeprint (Wx) for a set of messages {M1,…,Mp} for an author (An) or author category (Cm). A writeprint may be viewed in a comma-separated value (CSV) format where each value represents a total for a specific feature. An example writeprint for an author An using a selected feature set {F1,…,Fq}, where q = 100, for a set of messages {M1,…,Mp} looks like the following:

(Example writeprint image not reproduced)
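As a rough sketch of this step, the snippet below sums a small ordered feature set over a set of messages and prints the result as one CSV row. The five features, the `writeprint` and `to_csv_row` helpers, and the choice to lead the row with an author label are all assumptions made for illustration; the article's example uses q = 100 features and its exact layout is not shown here.

```python
import csv
import io

# Hypothetical ordered feature set {F1,...,Fq} with q = 5; each feature maps
# a single message to a count.
FEATURES = [
    ("words", lambda m: len(m.split())),
    ("chars", lambda m: len(m)),
    ("exclamations", lambda m: m.count("!")),
    ("question_marks", lambda m: m.count("?")),
    ("smileys", lambda m: m.count(":)")),
]


def writeprint(messages):
    """Sum each feature over all messages, giving one total per feature."""
    return [sum(feature(m) for m in messages) for _, feature in FEATURES]


def to_csv_row(author_id, vector):
    """Format an author label (an assumption) plus feature totals as CSV."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([author_id, *vector])
    return buffer.getvalue().strip()


messages = ["hey :)", "where are you?", "call me!!"]
print(to_csv_row("A1", writeprint(messages)))
```

Keeping the feature order fixed matters: two writeprints are only comparable if position i holds the total for the same feature Fi in both.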

After writeprints are generated they may then be normalized, standardized, and input into various statistical models for analysis. Figure 2 shows the output of the Principal Component Analysis (PCA) model for writeprints for seven authors (Orebaugh et al., 2014). The figure shows the first 3 principal components for multiple author conversations, mapped in three-dimensional space. In this example, each author has a relatively well-defined cluster representing his or her writeprint. Different authors separate from each other, while multiple conversations of an author cluster together. This type of example may be used in an investigation to show that sample evidentiary writeprints do or do not overlap with certain suspect writeprints, thus helping investigators narrow the suspect space, develop an interrogation strategy, link related crimes, or justify probable cause.

Figure 2. IM Writeprint PCA Output
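The standardize-then-project step can be sketched with the Python standard library alone. The article does not say how its PCA was computed; this sketch substitutes power iteration with deflation on the covariance matrix, and the function names and the toy two-author data are invented for illustration.

```python
import math


def standardize(rows):
    """Z-score each feature column (sample std); constant columns become 0."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def covariance(rows):
    """Covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [
        [sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iterations=500):
    """Power iteration: dominant eigenvector and eigenvalue of cov."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # deterministic, non-uniform start
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return v, eigenvalue


def pca(rows, k=3):
    """Project each writeprint onto the first k principal components."""
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    components = []
    for _ in range(min(k, d)):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [
            [cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)
        ]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in z]


# Invented writeprints: three conversations each for two hypothetical authors.
writeprints = [
    [10, 2, 30, 1], [11, 3, 29, 1], [9, 2, 31, 2],   # author A
    [2, 12, 5, 9], [3, 11, 6, 8], [2, 13, 4, 9],     # author B
]
for point in pca(writeprints, k=3):
    print([round(x, 2) for x in point])
```

With data like this, the two authors' conversations fall on opposite sides of the first component, which is the kind of separation Figure 2 illustrates for seven authors. In practice a numerical library's PCA routine would be the more usual choice; the hand-rolled version here only keeps the sketch dependency-free.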


References

1. Cross, Michael. Scene of the Cybercrime. Syngress Publishing, (2008): 679-690.

2. Moores, Trevor, and Gurpreet Dhillon. “Software piracy: a view from Hong Kong.” Communications of the ACM 43.12 (2000): 88-93.

3. Abbasi, Ahmed, and Hsinchun Chen. “Applying authorship analysis to extremist-group web forum messages.” Intelligent Systems, IEEE 20.5 (2005): 67-75.

4. Bassett, Richard, Linda Bass, and Paul O’Brien. “Computer forensics: An essential ingredient for cyber security.” Journal of Information Science and Technology 3.1 (2006): 22-32.

5. Revett, Kenneth. Behavioral biometrics: a remote access approach. Wiley Publishing, (2008): 1-2.

6. De Vel, Olivier, Alison Anderson, Malcolm Corney, and George Mohay. “Mining e-mail content for author identification forensics.” ACM Sigmod Record 30.4 (2001): 55-64.

7. Zheng, Rong, Jiexun Li, Hsinchun Chen, and Zan Huang. “A framework for authorship identification of online messages: Writing‐style features and classification techniques.” Journal of the American Society for Information Science and Technology 57.3 (2006): 378-393.

8. Kucukyilmaz, Tayfun, B. Cambazoglu, Cevdet Aykanat, and Fazli Can. “Chat mining: Predicting user and message attributes in computer-mediated communication.” Information Processing & Management 44.4 (2008): 1448-1466.

9. Leafe, David. “Dear Garry. I’ve decided to end it all: The full stop that trapped a killer.” Daily Mail (2009).

10. Casey, E. “Cyberpatterns: criminal behavior on the Internet.” Criminal profiling: An introduction to behavioral evidence analysis (1999): 361-378.

11. Federal Bureau of Investigation, Behavioral Science Unit website. http://www.fbi.gov/hq/td/academy/bsu/bsu.htm (accessed March 4, 2014).

12. Douglas, John E., Robert K. Ressler, Ann W. Burgess, and Carol R. Hartman. “Criminal profiling from crime scene analysis.” Behavioral Sciences & the Law 4.4 (1986): 401-421.

13. Li, Jiexun, Rong Zheng, and Hsinchun Chen. “From Fingerprint to Writeprint.” Communications of the ACM 49.4 (2006): 76-82.

14. Orebaugh, Angela, Jason Kinser, and Jeremy Allnutt. “Visualizing Instant Messaging Author Writeprints for Forensic Analysis.” In Proceedings of Conference on Digital Forensics, Security and Law, Richmond VA (2014): 191-213.

Appendix A. Function words, abbreviations, and emoticons (image not reproduced)

Authors

Dr. Angela Orebaugh
Dr. Angela Orebaugh is Fellow and Chief Scientist at Booz Allen Hamilton. She received her Ph.D. from George Mason University with a concentration in Information Security. Her current research interests include behavioral biometrics and the Internet of Things.
Dr. Jason Kinser
Dr. Jason Kinser is an Associate Professor in the School of Physics, Astronomy, and Computational Sciences at George Mason University. His current research interests include classification of regions in lung scans to detect idiopathic pulmonary fibrosis.
Dr. Jeremy Allnutt
Dr. Jeremy Allnutt is a Professor in the Electrical and Computer Engineering Department at George Mason University with a focus in communications and signal processing, computer networking, and telecommunications.
