Text Processing & Terminal Editors

You’re running a simulation on a remote server. It generates a 500,000-line log file. You need to find the three lines where it errored, extract the timestamped values, and save them to a report. No Excel. No Jupyter. Just the terminal.

Redirection & Pipes

ls -la > file_list.txt        # Write output to file (overwrite)
ls -la >> file_list.txt       # Append
cat nonexistent 2> errors.txt # Redirect errors
cat file.txt 2>/dev/null      # Throw errors away
ls -la > all.txt 2>&1         # Both stdout and stderr
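
To see where each stream ends up, here is a minimal sketch that runs the same redirections in a throwaway directory. The file names (out.txt, errors.txt, all.txt, no_such_file) are placeholders chosen for illustration.

```shell
# Work in a temporary directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first run"  > out.txt     # > creates or overwrites out.txt
echo "second run" >> out.txt    # >> appends a second line

# ls on a missing path writes its message to stderr; 2> captures it in a file.
# "|| true" only keeps the script going if it is run under "set -e".
ls no_such_file 2> errors.txt || true

# Send stdout to the file first, then point stderr at the same place.
ls out.txt no_such_file > all.txt 2>&1 || true

wc -l < out.txt                 # out.txt now holds 2 lines
```

The order matters in the last redirection: 2>&1 copies wherever stdout points at that moment, so it has to come after > all.txt.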

Pipe (|) chains commands:

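ls -la | less                   # Page through long output
cat results.log | grep ERROR    # Keep only lines containing ERROR
ps aux | grep python            # Find running Python processes
cat server.log | grep "ERROR" | sort | uniq -c | sort -rn | head -10   # Ten most frequent ERROR lines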

Key takeaway: Pipes let you combine small, focused utilities into data-processing pipelines. Each command does one job and hands its output to the next as input.
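
As a rough illustration of how the stages fit together, the sketch below runs the last pipeline on a six-line sample. The log contents are made up for the example, and grep reads the file directly, which does the same job as piping it in with cat.

```shell
# Build a small sample log in a temporary file.
log=$(mktemp)
printf '%s\n' \
  'ERROR disk full' \
  'INFO started' \
  'ERROR timeout' \
  'ERROR disk full' \
  'WARN slow response' \
  'ERROR disk full' > "$log"

# grep keeps the ERROR lines, sort puts identical lines next to each other,
# uniq -c collapses each group into "count line", and sort -rn lists the
# largest counts first.
summary=$(grep "ERROR" "$log" | sort | uniq -c | sort -rn | head -10)
echo "$summary"

rm -f "$log"
```

The first line of the output should show a count of 3 for "ERROR disk full", followed by a count of 1 for "ERROR timeout".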

grep — Find Lines That Match

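grep "ERROR" simulation.log        # Lines containing ERROR
grep -i "warning" simulation.log   # Case-insensitive match
grep -n "failed" run.log           # Show line numbers
grep -r "TODO" ./scripts/          # Search a directory recursively
grep -v "DEBUG" verbose.log        # Lines that do NOT match
grep -c "success" results.log      # Count matching lines
grep -E "ERROR|WARN" app.log       # Extended regex: either pattern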

Tip: grep -r is fantastic for searching codebases. Need to find where a function is defined or where a config value is used? grep -r "function_name" ./src/ will find it.
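
The flags are easier to compare side by side on a small file. This is a sketch with invented log lines, not output from a real run.

```shell
log=$(mktemp)
printf '%s\n' \
  'INFO  run started' \
  'Warning: low disk space' \
  'ERROR solver failed' \
  'DEBUG step 12' \
  'WARN retrying' > "$log"

error_count=$(grep -c "ERROR" "$log")          # number of matching lines
echo "$error_count"

warnings=$(grep -i "warning" "$log")           # -i also matches "Warning"
echo "$warnings"

numbered=$(grep -n "failed" "$log")            # prefixes the line number
echo "$numbered"

kept=$(grep -v "DEBUG" "$log" | wc -l | tr -d ' ')   # lines left after dropping DEBUG
echo "$kept"

either=$(grep -E "ERROR|WARN" "$log")          # case-sensitive, so "Warning" is skipped
echo "$either"
```

Note that grep -E "ERROR|WARN" does not match the "Warning:" line here because matching is case-sensitive unless -i is added.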

sort, uniq, wc, cut

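sort names.txt                 # Alphabetical order
sort -n numbers.txt            # Numeric order
sort -rn scores.txt            # Numeric, largest first
sort -t',' -k2,2 data.csv      # By the second comma-separated column
uniq sorted.txt                # Drop adjacent duplicate lines (sort first)
uniq -c sorted.txt             # Prefix each line with its count
sort data.txt | uniq -d        # Show only lines that occur more than once
wc -l file.txt                 # Count lines
wc -w file.txt                 # Count words
cut -d',' -f1,3 data.csv       # Columns 1 and 3 of a CSV
cut -d' ' -f2- sentence.txt    # Second space-separated field onward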
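
These tools are usually combined. The sketch below assumes a small CSV with hypothetical columns name,subject,score and shows two common patterns: counting values in a column and finding the row with the highest number.

```shell
data=$(mktemp)
printf '%s\n' \
  'alice,physics,82' \
  'bob,chemistry,91' \
  'carol,physics,77' \
  'dave,biology,88' > "$data"

# Take the second column, sort so duplicates are adjacent, then count them.
# uniq only merges neighbouring lines, which is why sort comes first.
counts=$(cut -d',' -f2 "$data" | sort | uniq -c | sort -rn)
echo "$counts"

# Sort rows by the third column, numerically (n) and in reverse (r),
# then keep the top row.
top=$(sort -t',' -k3,3nr "$data" | head -1)
echo "$top"
```

Without -t',' sort would split fields on whitespace and treat each CSV line as a single field, so the column key would have no effect.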

sed — Stream Editor

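sed 's/old/new/' file.txt                        # Replace the first match on each line
sed 's/old/new/g' file.txt                       # Replace every match on each line
sed -i 's/localhost/192.168.1.10/g' config.ini   # Edit the file in place (GNU sed; BSD/macOS sed needs -i '')
sed -n '10,20p' file.txt                         # Print only lines 10 through 20
sed '/^#/d' config.txt                           # Delete lines that start with #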
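
A cautious way to try these is to write the result to a new file instead of using -i, so the original stays intact. The config keys below are invented for the example.

```shell
conf=$(mktemp)
printf '%s\n' \
  '# database settings' \
  'host=localhost' \
  'backup_host=localhost' \
  'port=5432' > "$conf"

# Chain two edits with -e: drop comment lines, then replace every "localhost".
# Redirecting to a new file leaves the original untouched.
sed -e '/^#/d' -e 's/localhost/192.168.1.10/g' "$conf" > "$conf.new"
cat "$conf.new"

# -n turns off the default printing; 2,3p prints only lines 2 through 3.
middle=$(sed -n '2,3p' "$conf")
echo "$middle"
```

Once the output looks right, the same expressions can be applied with -i, which behaves slightly differently between GNU and BSD sed.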

awk — Structured Data Processing

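awk '{print $1}' data.txt                    # First whitespace-separated field
awk -F',' '{print $2}' data.csv              # Second comma-separated field
awk '{sum += $3} END {print sum}' data.txt   # Sum of column 3
awk '$3 > 100 {print $0}' data.txt           # Rows where column 3 exceeds 100
awk 'NR==5' file.txt                         # Only line 5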
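
Each awk program is a pattern followed by an action in braces, and the action runs on every line the pattern matches. Here is a short sketch on made-up data with the hypothetical columns run, status, and value.

```shell
data=$(mktemp)
printf '%s\n' \
  'run1 ok 120' \
  'run2 ok 80' \
  'run3 failed 150' > "$data"

# No pattern: the action runs on every line; END runs once after the last line.
total=$(awk '{sum += $3} END {print sum}' "$data")
echo "$total"    # 120 + 80 + 150 = 350

# Pattern only on column 3: print the run name where the value exceeds 100.
big=$(awk '$3 > 100 {print $1}' "$data")
echo "$big"      # run1 and run3, one per line

# Pattern on column 2: average the values of the rows marked "ok".
avg=$(awk '$2 == "ok" {sum += $3; n++} END {if (n) print sum / n}' "$data")
echo "$avg"      # (120 + 80) / 2 = 100
```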

Getting Help

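man grep       # Full manual page
grep --help    # Short summary of options
tldr grep      # Community-written examples (tldr is a separate install)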

Terminal Editors

nano

nano config.ini

Controls:

  • Ctrl+O — Save
  • Ctrl+X — Exit
  • Ctrl+W — Search
  • Ctrl+K — Cut line
  • Ctrl+U — Paste

vim

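Vim is a modal editor. Normal mode is for moving around and issuing commands, Insert mode is for typing text, Visual mode is for selecting text, and Command mode (entered with :) is for commands such as saving and quitting.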

i          Enter insert mode
Esc        Return to normal mode
:w         Save
:q         Quit
:wq        Save and quit
:q!        Quit without saving
/word      Search
dd         Delete line
yy         Copy line
p          Paste
u          Undo
gg         Beginning of file
G          End of file

Tip: Emergency vim exit: press Esc, then type :q! and press Enter. Esc returns you to Normal mode from whichever mode you were in, and :q! quits and discards any unsaved changes.

Try It Yourself

Process a log file from the command line.

First, create a file called simulation.log with the following content:

2026-04-01 08:00:01 INFO  Starting simulation run
2026-04-01 08:00:02 DEBUG Initializing parameters
2026-04-01 08:00:05 INFO  Loading dataset (5000 records)
2026-04-01 08:00:10 DEBUG Memory allocated: 2.4GB
2026-04-01 08:00:15 WARNING High memory usage detected
2026-04-01 08:00:20 INFO  Processing batch 1/10
2026-04-01 08:00:45 ERROR Numerical instability in solver at step 342
2026-04-01 08:00:46 INFO  Retrying with adjusted parameters
2026-04-01 08:01:10 WARNING Convergence slower than expected
2026-04-01 08:01:30 INFO  Simulation complete

Tasks:

  1. Count the total number of lines: wc -l simulation.log
  2. Find all ERROR lines: grep "ERROR" simulation.log
  3. Show everything except DEBUG lines: grep -v "DEBUG" simulation.log
  4. Extract timestamps from INFO lines: grep "INFO" simulation.log | cut -d' ' -f1,2
  5. Save all WARNING and ERROR lines to a new file called problems.log.

Solution

grep -E "WARNING|ERROR" simulation.log > problems.log

Quick Quiz

You have a CSV file called sensors.csv with columns timestamp,sensor_id,temperature. How do you extract just the temperature column?

  • A) awk '{print $3}' sensors.csv
  • B) cut -d',' -f3 sensors.csv
  • C) grep "temperature" sensors.csv
  • D) sed 's/temperature//' sensors.csv

Answer

B) cut -d',' -f3 sensors.csv — The -d',' flag sets the delimiter to a comma, and -f3 extracts the third field, which is the temperature column.
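
To check the answer, the sketch below builds a three-line sensors.csv with invented readings and compares option A with option B. It also shows one way to skip the header row, which cut alone will include.

```shell
csv=$(mktemp)
printf '%s\n' \
  'timestamp,sensor_id,temperature' \
  '2026-04-01T08:00:00,s1,21.5' \
  '2026-04-01T08:01:00,s2,22.1' > "$csv"

# Option A: awk splits on whitespace by default, so each line is a single
# field and $3 is empty. This prints blank lines.
awk '{print $3}' "$csv"

# Option B: the comma delimiter makes field 3 the temperature column.
with_header=$(cut -d',' -f3 "$csv")
echo "$with_header"

# Same idea in awk, with NR > 1 skipping the header row.
values=$(awk -F',' 'NR > 1 {print $3}' "$csv")
echo "$values"
```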