Efficiently Browsing Text or Code
Searching a large code base or text dumps for sections of interest can be frustrating. This article walks you through using Emacs to make the task easy. Sometimes grep just isn't enough.
Let's search the source code of Python 2.7.2 for something interesting. You may find yourself in similar circumstances for reasons like:
- You inherited an enormous project and you have to fix an elusive bug or ten. The website is down and your company is hemorrhaging money while you mainline coffee and search the code.
- You are wading through thousands of emails to locate one important email address. The fate of a million-dollar deal hangs in the balance. All you know is that the person said 'phone' and 'sergei'.
- You decide to scan all of popular classical literature for a certain pattern of word usage, to make an important point in your thesis.
- You are a lawyer in the discovery stage of a case and you need to know why anyone involved ever wrote 'ouch', 'shred' or 'insider trading' in the last 10 years.
- You just need to wade through a lot of text.
First let's take a look at how you'd accomplish this without using Emacs and then work our way up to a full solution.
You are interested in the keyword 'tty' and think it is a good place to start your investigation. To list the lines of code that contain 'tty', you type:
grep -r "tty" *
Now you see all the lines that contain 'tty' but nothing else. That doesn't tell you a lot. You want to see those lines with some context. So you type:
grep -r "tty" -5 *
This displays 5 lines of context before and after each line that contains 'tty'. Now you see how tty is used in each file.
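A slightly more explicit form of the same search can make the output easier to act on. The sketch below assumes a grep that supports -r, -n and -C (GNU and BSD grep both do); the function name search_with_context and the sample file are made up for illustration. -n adds line numbers, and -C 5 is the long spelling of -5.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recursive search with line numbers and context.
search_with_context() {
    # $1: pattern, $2: directory to search (defaults to the current one)
    # -r: recurse, -n: print line numbers, -C 5: five lines of context
    grep -rn -C 5 -- "$1" "${2:-.}"
}

# Demo on a throwaway directory so the sketch runs anywhere.
demo_dir=$(mktemp -d)
printf 'import os\nfd = os.open("/dev/tty", os.O_RDWR)\n' > "$demo_dir/sample.py"
search_with_context "tty" "$demo_dir"
```

With line numbers in the output, you know which line to jump to once you open the file, instead of searching for the keyword again.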
When you are scrolling down, scanning the output, you see something that interests you and you decide to take a closer look at that file. Now you have to make a note of the file name in the output and open it. Not that convenient but still doable. As you do this multiple times it quickly gets frustrating.
You see that tty occurs in thousands of places. You decide that you are not going to look at every single occurrence of it. You decide to narrow your search to files that contain both 'tty' and 'ioctl', possibly on different lines. So you run:
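#!/usr/bin/env bash
# List every regular file that contains both 'tty' and 'ioctl'.
# -type f keeps grep from being run on directories.
find . -type f | while IFS= read -r f
do
    grep -q "tty" "$f" && grep -q "ioctl" "$f" && ls -lR "$f"
done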
This narrows things down and you get a list of files that contain both the keywords.
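The same filter can also be written without a loop. This is a minimal sketch, not the method used above: it relies on the fact that find treats each -exec as a test that succeeds only when the command exits with status 0. The function name files_with_both and the sample files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print regular files under a directory that contain
# both patterns, possibly on different lines.
files_with_both() {
    # $1: directory, $2: first pattern, $3: second pattern
    # find only moves on to the next -exec (and finally to -print)
    # when the grep before it found a match.
    find "$1" -type f \
        -exec grep -q -- "$2" {} \; \
        -exec grep -q -- "$3" {} \; \
        -print
}

# Demo: one file has both keywords, the other has only one of them.
demo_dir=$(mktemp -d)
printf 'open the tty\ncall ioctl\n' > "$demo_dir/both.c"
printf 'open the tty\n' > "$demo_dir/only_tty.c"
files_with_both "$demo_dir" "tty" "ioctl"
```

Only both.c is printed. Because the file names never pass through the shell's read, names with unusual characters need no special handling.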
You want to view all these files, so you run:
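#!/usr/bin/env bash
# Concatenate every file that contains both keywords into tmp.txt.
# tmp.txt itself is skipped so the output is never read back as input.
find . -type f ! -name tmp.txt | while IFS= read -r f
do
    grep -q "tty" "$f" && grep -q "ioctl" "$f" && cat "$f" >> tmp.txt
done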
and view the file tmp.txt. This may be okay but if you concatenate source files containing programs written in different languages, you can't get good syntax highlighting while reading the file. If you copied just the files of interest to a directory, viewing each file can be a hassle.
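One way to avoid reading whole concatenated files is to print only the matching lines, with context, for the files that contain both keywords. The sketch below is an assumption-laden illustration: it needs a grep that supports -H, -n, -C and -E (GNU and BSD grep do), and the function name context_for_both and the sample files are hypothetical. It still gives you no syntax highlighting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every file that contains both patterns, print
# the lines matching either one, with context, prefixed by the file name.
context_for_both() {
    # $1: directory, $2: first pattern, $3: second pattern
    # The first two greps act as filters; the third one does the printing.
    find "$1" -type f \
        -exec grep -q -- "$2" {} \; \
        -exec grep -q -- "$3" {} \; \
        -exec grep -H -n -C 5 -E -- "$2|$3" {} \;
}

# Demo: only both.c passes the two filters.
demo_dir=$(mktemp -d)
printf 'open the tty\ncall ioctl\n' > "$demo_dir/both.c"
printf 'open the tty\n' > "$demo_dir/only_tty.c"
context_for_both "$demo_dir" "tty" "ioctl"
```

Each output line carries the file name and line number, so the origin of every snippet stays visible even when many files match.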
Now switch to Emacs. You open eshell with M-x eshell and type:
egrep -r "tty" -5 *
You can then scan the output and view just the files that interest you, by pressing 'Enter' with your cursor on the output line that caught your attention. Emacs will place your cursor in the line that you are interested in.
This becomes even easier when you split your Emacs frame into two side-by-side windows with C-x 3. You quickly move to different lines of interest in the eshell buffer and press Enter to view the file in the other window.
You decide to use the same method (shell script) used earlier to narrow the search-space by only looking at files that have both the keywords you are looking for. So you run this:
#!/usr/bin/env bash
# Save an ls-style listing of every regular file that contains both keywords.
find . -type f | while IFS= read -r f
do
    grep -q "tty" "$f" && grep -q "ioctl" "$f" && \
        ls -lR "$f" >> /tmp/listing.txt
done
The output of this command looks awfully similar to dired. dired makes viewing and otherwise manipulating files incredibly easy. You already use dired+ as that gives you several neat additional features. Wouldn't it be nice if you could create a custom dired buffer from the output of the shell command you ran above?
You do exactly this with virtual-dired. You capture the output of the shell command in a file, open it in Emacs and run M-x virtual-dired. Now you have a custom dired buffer.
Now you can quickly view the files.
A Pinch of Elisp
Your search narrowed the file list to just 11. For a different search, even with such filtering, you can end up with 40 or more files to look at. Even with dired+ giving you a nice interface, you'll quickly get tired of opening whole files and searching through them for the keywords of interest. For one, you have to type in the search string each time you open a file.
So you decide to write some elisp to make it easy.
(key-chord-define-global "fo" 'occur-kw)
;(global-set-key [f3] 'occur-kw)
(defun occur-kw ()
  (interactive)
  (occur "tty\\|ioctl" 5)
  ;(switch-to-buffer "*Occur*")
  (other-window 1))
Now your work-flow is: you press Enter on a line in the dired+ buffer, which opens the file in the other window. Then you press the keys 'f' and 'o' together to narrow the displayed lines to just the lines that interest you. If you are not comfortable with key chords, you can map the function to a different key instead. In the above example, the commented-out global-set-key line maps it to F3.
'occur' called with argument 5 shows five lines of context around the lines of interest. In this case, the lines of interest are the lines with the words 'tty' or 'ioctl'.
This makes it a lot easier. However, there is still one annoyance left. After you are done scanning a file, you need to switch back to the dired buffer. Often it won't be next in the buffer list, as you will have moved to different buffers to take notes and so on. So you can't just C-x b back to it. You solve this with:
(global-set-key [f4]
  (lambda ()
    (interactive)
    (switch-to-buffer "listing.txt")))
A Smidgen of Macros
You pamper yourself by tying the steps together with an Emacs keyboard macro.
You record a macro of you doing these steps:
C-x ( - starts recording the macro
f4 - takes you to the virtual-dired buffer
C-n - takes you to the next line, which is the next file to be reviewed
Enter - opens the next file
f3 - narrows the buffer to only the lines that interest you
C-x ) - stops recording the macro
You assign this macro to
Your work-flow now consists of you just hitting
Whenever you have to perform a repetitive task, automating even the smallest step really helps. In addition to saving time and reducing your clients' expenses, the tools you build decrease tedium and thereby reduce errors caused by frustration and fatigue. Once built, the tools become a part of your Batman utility belt. Emacs lends itself well to tool-building and customizations that can make you extremely productive.