Linux Professional Institute: Level 2 Tutorials

Text & Documentation / Process Text Streams Using Filters

Subtitles of the Movie

Exam objective 1.103.2 has a weight of 6 and verifies that candidates are able to apply filters to text. You should be able to send text through filters using standard Unix commands from the GNU package.

Probably the simplest text filter is cat, which reads a file or standard input and writes it to standard output. This is the most common use of cat: give it a file name, and it lists the contents of the file to the display. Here's one you may not know. tac is cat spelled backwards, and it does the same thing cat does, but it lists the file backwards, the last line first. wc stands for word count. It doesn't list the file. It just counts the lines, words, and characters in the file and lists the counts. The nl command, the number lines command, lists the file but sticks a line number on the front of each line.

Sometimes you just want to see how a file starts. The head command lists just the first 10 lines by default; there are arguments you can give it to change how much it displays. The tail command displays only the last 10 lines.

The pr command formats text for printing. By default it makes the output the size of a printed page, so I have to pipe it through more so you can see the heading it produces. This is the default heading. It has arguments that allow you to change the heading. You can set the page height, spacing, line width; all sorts of things. This is what I use. This command adds three blank spaces to the front of each line, because my printer chops off the beginning of each line of text. I don't know why, but this does fix it, and it puts a header on it too.

The sort command puts the lines of text in sorted order. This example just uses alphabetical order on the beginning of each line, but there are lots of options. You can sort things just about any way you like.

The hexdump command does just about what you would expect. It lists the hexadecimal values of the bytes of the file.
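Before going further into the hex dump, the filters covered so far can be tried on a small file. The following is a minimal sketch, assuming GNU coreutils on Linux; the file name sample.txt, its contents, and the pr options are illustrative assumptions, not the commands shown in the video.

```shell
# Work in a throwaway directory; sample.txt is a hypothetical demo file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\ncherry\n' > sample.txt

cat sample.txt                        # pear, apple, cherry (file order)
reversed=$(tac sample.txt)            # cherry, apple, pear (last line first)
echo "$reversed"

# Reading from standard input keeps the file name out of the wc output.
line_count=$(wc -l < sample.txt)
echo "lines: $line_count"

nl sample.txt                         # same lines, each prefixed with its number

first_line=$(head -n 1 sample.txt)    # -n overrides the default of 10 lines
last_line=$(tail -n 1 sample.txt)
echo "first: $first_line, last: $last_line"

# Assumed options: -o 3 indents every line by three spaces, -h sets the header text.
pr -o 3 -h "Sample listing" sample.txt | head -n 5

sorted=$(sort sample.txt)             # alphabetical: apple, cherry, pear
echo "$sorted"
```

Each command also reads standard input when no file is named, so the same filters can be chained with pipes.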
The input can be any file; it doesn't need to be ASCII, but you can have hexdump display the characters, too. It can also display the values in octal and other formats. The hd command is a symbolic link to hexdump, and it uses a different set of defaults in formatting the display. Here you see both the hexadecimal and character representations of the text.

The cut command can be used to extract portions of lines of text. This is most often used inside a script to pull out a piece of a string. You can have it locate text by delimiters like tabs or slashes, or by position. This example extracts and displays characters 5 through 8 from each line. You can also have it do some simple filtering, such as printing or suppressing the lines that contain no delimiter.

tr is short for translate. It translates every character in the first quoted string into the character in the corresponding position in the second string. Now, the tr command doesn't read a file. It only gets its input from standard input. You can see that in this example, every e was translated into an x. The tr command has its own notation for sets of characters, such as ranges and character classes, so you can match and translate all sorts of things.

The fmt command can be used to change the format of text in several different ways. This command will reformat the text to have no lines longer than 40 characters. You can use other options to adjust indentation and word spacing, and to have it work only on lines that begin with a given prefix.

The paste command puts files together side by side. If you name the same file twice, as in this example, it places a copy of each line of the file next to itself. If you name more than two files, it puts them all together side by side with tabs to separate them.

Now, look at these two text files. Notice that what they have in common is the name Herbert at the front of two lines. The join command can be used to combine those two lines. This is the action of the join command with no options specified. It uses the first field as a key.
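The cut, tr, fmt, and paste behavior described above can be reproduced with a few one-liners. This is a minimal sketch, assuming GNU coreutils; the file name chars.txt and all sample strings are made up for illustration and are not the files shown in the video.

```shell
# Work in a throwaway directory; chars.txt is a hypothetical demo file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abcdefghij\n0123456789\n' > chars.txt

# cut by position: characters 5 through 8 of each line.
cut_out=$(cut -c 5-8 chars.txt)
echo "$cut_out"                       # efgh, then 4567

# cut by delimiter: the second slash-separated field.
field_out=$(printf 'usr/local/bin\n' | cut -d / -f 2)
echo "$field_out"                     # local

# tr takes no file argument; it reads standard input only.
tr_out=$(printf 'eleven\n' | tr 'e' 'x')
echo "$tr_out"                        # xlxvxn

# A range maps each character to the one in the same position of the second set.
upper_out=$(printf 'abc\n' | tr 'a-z' 'A-Z')
echo "$upper_out"                     # ABC

# fmt -w 40 refills the text so that no output line exceeds 40 characters.
fmt_out=$(printf 'one two three four five six seven eight nine ten eleven twelve\n' | fmt -w 40)
echo "$fmt_out"

# Naming the same file twice puts each line next to a copy of itself.
paste_out=$(paste chars.txt chars.txt)
echo "$paste_out"
```

Because tr reads only standard input, a file has to be redirected or piped into it, for example `tr 'e' 'x' < chars.txt`.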
Any two lines with identical keys have their contents combined on a new line with the key at its front. It's sort of a pickier version of paste.

Now, if you look at this file, it has some duplicates, but you can list it without duplicates using the uniq command, which collapses adjacent duplicate lines. The uniq command has some options, like inserting a count of the number of occurrences in front of each line.

There are commands named expand and unexpand. They don't make the text look any different. expand converts tabs to spaces, and unexpand converts spaces to tabs. There is also a split command that divides text into multiple files. The default size is 1000 lines each.

sed is the stream editor. It's almost always used inside a script. You write sed rules and pass your text through them. It has so many options and operators it can almost be considered a programming language.

You need to know these commands. Once you do, you'll find yourself using them. You need to familiarize yourself with their options.
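As a starting point for that practice, here is a minimal sketch of join, uniq, expand, unexpand, split, and sed, assuming GNU coreutils and sed. The file names and contents are hypothetical; only the key Herbert is taken from the example described above.

```shell
# Work in a throwaway directory; all file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# join matches lines on the first field; both inputs must be sorted on it.
printf 'Herbert 42\nMaude 17\n' > ages.txt
printf 'Herbert plumber\nZelda pilot\n' > jobs.txt
join_out=$(join ages.txt jobs.txt)
echo "$join_out"                      # Herbert 42 plumber

# uniq only collapses adjacent duplicates, so sort first.
printf 'a\na\nb\na\n' > dups.txt
uniq_out=$(sort dups.txt | uniq)      # a, then b
sort dups.txt | uniq -c               # counts in front: 3 a, 1 b

# expand turns tabs into spaces; unexpand turns leading spaces into tabs.
expanded=$(printf 'x\ty\n' | expand -t 4)      # "x", three spaces, "y"
unexpanded=$(printf '        z\n' | unexpand)  # eight spaces become one tab

# split writes 1000 lines per output file by default: part_aa, part_ab, part_ac.
seq 1 2500 > big.txt
split big.txt part_
part_count=$(ls part_* | wc -l)
echo "pieces: $part_count"

# A one-rule sed script: substitute the first match on each line.
sed_out=$(printf 'hello world\n' | sed 's/world/there/')
echo "$sed_out"                       # hello there
```

The two join inputs here are already in sorted order; unsorted files would need to go through sort first.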

Tutorial Information

Course: Linux Professional Institute: Level 2
Author: Arthur Griffith
SKU: 33894
ISBN: 1-934743-79-8
Release Date: 2008-07-21
Duration: 7.5 hrs / 113 lessons
Captions: Available on CD and Online University
Compatibility: Vista/XP/2000, OS X, Linux
QuickTime 7, Flash 8
