SANS Internet Storm Center, InfoCON: green: Detecting Random – Finding Algorithmically chosen DNS names (DGA), (Thu, Jul 9th)

Most normal user traffic communicates via a host name and not an IP address, so looking at traffic that communicates directly by IP with no associated DNS request is a good thing to do. Some attackers use DNS names for their communications. There is also malware, such as Skybot and the Styx exploit kit, that uses algorithmically chosen host names rather than IP addresses for its command and control channels. This malware uses what has been called DGA, or Domain Generation Algorithms, to create random-looking host names for its TLS command and control channel or to digitally sign its SSL certificates. These do not look like normal host names. A human being can easily pick them out of logs and traffic, but it turns out to be a somewhat challenging thing to do in an automated process. Natural Language Processing or measuring the randomness doesn't seem to work very well. Here is a video that illustrates the problem and one possible approach to solving it.

One way you might try to solve this is with a tool called ent. ent is a great Linux tool for measuring entropy in files. Highly random input scores close to 8 bits per byte:

Entropy = 7.999982 bits per byte.

A long run of a single repeated character scores close to 0:

[~]$ python -c "print('A'*1000000)" | ent
Entropy = 0.000021 bits per byte. <-- 0 = not random

So 8 is highly random and 0 is not random at all.

[~]$ echo google | ent
Entropy = 2.235926 bits per byte.
[~]$ echo clearing-house | ent
Entropy = 3.773557 bits per byte. <-- Valid hosts are in the 2 to 4 range

Google scores 2.23 and clearing-house scores 3.77. So it appears as though legitimate host names will be in the 2 to 4 range.

[~]$ echo e6nbbzucq2zrhzqzf | ent
Entropy = 3.503258 bits per byte.
[~]$ echo sdfe3454hhdf | ent
Entropy = 3.085055 bits per byte. <-- Malicious hosts from Skybot and Styx malware are in the same range as valid hosts

That's no good. Known malicious host names are also in the 2 to 4 range. They score just about the same as normal host names. We need a different approach to this problem.
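To see why entropy cannot separate these names, here is a minimal Python sketch of the same bits-per-byte measurement. It assumes ent reports the Shannon entropy of the byte distribution; the function name `entropy_bits_per_byte` is mine and is not part of ent. Note that `echo` appends a newline, so the sketch adds one before measuring.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# echo appends a trailing newline, so add one to mirror the ent runs above.
for name in ("google", "clearing-house", "e6nbbzucq2zrhzqzf", "sdfe3454hhdf"):
    print(name, round(entropy_bits_per_byte(name.encode() + b"\n"), 6))
```

Entropy only looks at how evenly the individual characters are distributed. It ignores the order in which they appear, and order is exactly what makes a name look readable.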

Normal readable English has some pairs of characters that appear more frequently than others. TH, QU and ER appear very frequently, but other pairs like WZ appear very rarely. Specifically, there is approximately a 40% chance that a T will be followed by an H. There is approximately a 97% chance that a Q will be followed by the letter U. There is approximately a 19% chance that E is followed by R. With regard to unlikely pairs, there is approximately a 0.004% chance that W will be followed by a Z.

So here is the idea: let's analyze a bunch of text and figure out what normal looks like, then measure the host names against the tables. I'm making this script and a Windows executable version of this tool available for you to try out. Let me know how it works. Here is a look at how to use the tool.
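The pair statistics are simple to compute. The following Python sketch is illustrative only and is not the code from the freq tool; `pair_counts` and `follow_probability` are hypothetical helper names, and the sample sentence is made up. With so little text the percentages will not match the figures quoted above, which need a large body of English.

```python
from collections import Counter


def pair_counts(text: str) -> dict:
    """Map each letter to a Counter of the letters that follow it."""
    table = {}
    text = text.lower()
    for first, second in zip(text, text[1:]):
        # Only count letter-to-letter transitions; skip spaces and punctuation.
        if first.isalpha() and second.isalpha():
            table.setdefault(first, Counter())[second] += 1
    return table


def follow_probability(table: dict, first: str, second: str) -> float:
    """Percentage chance that `first` is followed by `second` in the counted text."""
    followers = table.get(first, Counter())
    total = sum(followers.values())
    return 100.0 * followers[second] / total if total else 0.0


sample = "the quick brown fox thought that the other queen was quite there"
table = pair_counts(sample)
for first, second in (("q", "u"), ("t", "h"), ("w", "z")):
    print(first + second, round(follow_probability(table, first, second), 2))
```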

Step 1) You need a frequency table. I shared two of them on my GitHub. If you want to use them, you can download them and skip to step 2.

1a) Create the table. I'm creating a table called custom.freq.

C:\freq>freq.exe --create custom.freq

1b) You can optionally turn ON case sensitivity if you want the frequency table to count uppercase letters and lowercase letters separately. Without this option the tool will convert everything to lowercase before counting character pairs.

C:\freq>freq.exe -t custom.freq

1c) Next, fill the frequency table with normal text. You might load it with known legitimate host names such as the Alexa top 1 million most commonly accessed websites (http://s3.amazonaws.com/alexa-static/top-1m.csv.zip). I will just load it up with famous works of literature.

C:\freq>for %i in (txtdocs\*.*) do freq.exe --normalfile %i custom.freq
C:\freq>freq.exe --normalfile txtdocs\center_earth custom.freq
C:\freq>freq.exe --normalfile txtdocs\defoe-robinson-103.txt custom.freq
C:\freq>freq.exe --normalfile txtdocs\dracula.txt custom.freq
C:\freq>freq.exe --normalfile txtdocs\freck10.txt custom.freq
C:\freq>

Step 2) Measure badness!

Once the frequency table is filled with data you can start to measure strings to see how probable they are according to our frequency tables.

C:\freq>freq.exe --measure google custom.freq
6.59612840648
C:\freq>freq.exe --measure clearing-house custom.freq
12.1836883765

So normal host names have a probability above 5 (at least these two and most others do). We will consider anything above 5 to be good for our tests.

C:\freq>freq.exe --measure asdfl213u1 custom.freq
3.15113061843
C:\freq>freq.exe --measure po24sf92cxlk

Our malicious hosts are less than 5. 5 seems to be a pretty good benchmark. In my testing it works pretty well for picking out these abnormal host names. But it isn't perfect. Nothing is. One problem is that very short host names and acronyms that are not in the source files you use to build your frequency tables will be below 5. For example, fbi and cia both come up below 5 when I just use classic literature to build my frequency tables. But I am not limited to classic literature. That leads us to step 3.
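One way to turn pair counts into a single score is sketched below in Python. This is an assumption for illustration, not the tool's confirmed formula: the score here is the average percentage chance of each adjacent pair in the string. The helper names `build_table` and `score` and the tiny training list are hypothetical, so the numbers and the threshold of 5 from the runs above do not carry over.

```python
from collections import Counter


def build_table(lines) -> dict:
    """Count adjacent character pairs across all training lines (lowercased)."""
    table = {}
    for line in lines:
        line = line.lower()
        for first, second in zip(line, line[1:]):
            table.setdefault(first, Counter())[second] += 1
    return table


def score(table: dict, name: str) -> float:
    """Average percentage chance of each adjacent pair in `name`.

    Assumed scoring rule for this sketch; pairs never seen in training count as 0.
    """
    name = name.lower()
    probs = []
    for first, second in zip(name, name[1:]):
        followers = table.get(first)
        total = sum(followers.values()) if followers else 0
        probs.append(100.0 * followers[second] / total if total else 0.0)
    return sum(probs) / len(probs) if probs else 0.0


# Hypothetical stand-in for a real corpus of normal text or host names.
normal = ["google", "facebook", "youtube", "wikipedia", "amazon", "clearing-house"]
table = build_table(normal)
for name in ("google", "e6nbbzucq2zrhzqzf"):
    print(name, round(score(table, name), 2))
```

With a training set this small, any sensible cut-off has to be chosen by measuring known-good names against your own table rather than reusing a fixed number.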

Step 3) Tune for your organization.

The real power of frequency tables comes when you tune them to match normal traffic for your network. Two options support this: --normal and --odd. --normal can be given a normal string and it will update the frequency table with that string. Both --normal and --odd can be used with the --weight option to control how much influence the given string has on the probabilities in the frequency table. Its effectiveness is demonstrated by the accompanying YouTube video. Note that marking random host names as --odd is not a good strategy. It simply injects noise into the frequency table. Like everything else in security, identifying all the bad in the world is a losing proposition. Instead, focus on learning normal and identifying anomalies. So passing --normal cia --weight 10000 adds 10000 counts of the pair ci and the pair ia to the frequency table and increases the probability of cia.

C:\freq>freq.exe --normal cia --weight 10000 custom.freq
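The effect of a weighted update can be sketched in a few lines of Python. This only models the behavior described above (adding a given number of counts to each pair in the string) and is not taken from the tool's source. `add_normal` and `score` are hypothetical names, the averaging score is an assumption, and the training sentence is a stand-in for real text.

```python
from collections import Counter


def add_normal(table: dict, text: str, weight: int = 1) -> None:
    """Add `weight` counts for every adjacent character pair in `text`."""
    text = text.lower()
    for first, second in zip(text, text[1:]):
        table.setdefault(first, Counter())[second] += weight


def score(table: dict, name: str) -> float:
    """Average percentage chance of each adjacent pair (assumed scoring rule)."""
    name = name.lower()
    probs = []
    for first, second in zip(name, name[1:]):
        followers = table.get(first)
        total = sum(followers.values()) if followers else 0
        probs.append(100.0 * followers[second] / total if total else 0.0)
    return sum(probs) / len(probs) if probs else 0.0


table = {}
add_normal(table, "it was the best of times it was the worst of times")
before = score(table, "cia")

# Mirrors the idea of: --normal cia --weight 10000
add_normal(table, "cia", weight=10000)
after = score(table, "cia")
print(before, round(after, 2))
```

A large weight lets a single known-good string outvote everything else the table has seen for those pairs, which is why it should be reserved for names you are sure are normal on your network.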

The source code and a Windows executable version of this program can be downloaded from here: https://github.com/MarkBaggett/MarkBaggett/tree/master/freq

Tomorrow in my diary I will show you some other cool things you can do with this approach and how you can incorporate this into your own tools.

Follow me on twitter @MarkBaggett

Want to learn to use this code in your own script or build tools of your own? Join me for Python SEC573 in Las Vegas this September 14th!

What do you think? Leave a comment.

(c) SANS Internet Storm Center. https://isc.sans.edu Creative Commons Attribution-Noncommercial 3.0 United States License.
