Unix Tutorials For Beginners - Part II
Editors and Operations with File Content
23rd Friday 2001

 
 
 
 
 

             by Christine Vogel, I Yr PhD student with Dr. Cyrus Chothia
                              Structural Studies Division
                       MRC-Laboratory of Molecular Biology
                 http://www.mrc-lmb.cam.ac.uk/genomes/cvogel/edtut.htm


 




1. EDITORS
==========
1. word - known from windows
2. xemacs - nice and pretty close to word
3. pico - good if you know pine and want some help
4. vi - for sadomasochists...


REMEMBER: If you want to look at an MS Word document in Unix
(more, xemacs, pico, ...), you need to save it in .txt format first!!!
(Otherwise you get lots of funny characters messing the whole thing up...)




1.1. XEMACS
-----------
can use mouse and icons
can specify macros (go to Edit, Start Macro Recording)


ctrl-s          search
ctrl-s (again)  search for the same pattern again
ctrl-w          cut
ctrl-a          beginning of line
ctrl-e          end of line
ctrl-space      set mark
ctrl-u NUMBER   repeat the next command NUMBER times
ctrl-x e        execute the last recorded macro


e.g. ctrl-u 1000 ctrl-x e   run the macro 1000 times
example:
xemacs .cshrc
alias xe xemacs
save


1.2. PICO
--------
ctrl-o  save file
ctrl-x exit pico
example:
pico .cshrc
alias p pico


1.3. VI
-------
advantage: universal and most powerful




2. FILE OPERATIONS
==================
Imagine you have downloaded a list of your favourite proteins, including their
lengths and other information... (file: protein.out)

generally: "options" modify the command you use; they often start with
a hyphen (-), e.g. the -c in cut -c1-20 FILE.
 

nice:   use right mouse button for cut-and-paste, tab to complete
filenames automatically, and arrows to go back to previous commands



help:   man COMMAND
        apropos WORD
        lmbhelp COMMAND



basics:  a file starts to exist as soon as you write to it under a new name
(e.g. with > or in an editor)
  cp FILE FILE2     copy file
  rm FILE           remove file
  mv FILE FILE2     rename/move file
  lpr FILE          print file (make sure you know where the printer is...)
  chmod +x FILE     make file executable


AND:  There is logic behind UNIX and LINUX commands, so there is a
way to understand it... :-)



cut
cuts things out from files
------------------
example: cut -c1-20 FILE
cut [options] [files]
Specify either -c or -f. Separate several values with commas; a hyphen (-) gives a range.

options:
-cx,y   cut columns x and y
-fx,y   cut fields identified in x,y
-dc     use with -f to specify field delimiter as character c (default is tab); special
            characters must be quoted
-s      use with -f to suppress lines without delimiters
examples:
cut -d: -f1,5 filename
cut -c4 filename | paste - filename    cut the 4th column and paste it as a new first column
in front of the lines of the same file
eg: %> cut -c2-5 filename
this means: cut column numbers from 2 to 5 (all inclusive) from the file filename.
eg: %> cut -f3-4 filename
if the file has field delimiters, then individual fields can be cut out using the -f option.

Point to remember:

A file can have many fields, and the basic unit of a field is a column (one character).
Fields are sets of columns that are separated by a special symbol, the delimiter.
madan;SS;MRC-LMB
christine;SS;MRC-LMB
This particular example has 3 fields which are 'delimited' by a ;
so to get field number three, you should type
eg: %> cut -f3 -d';' filename
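
To try the difference between fields and columns yourself, here is a small sketch. The file name people.txt is made up for this example; it just holds the two lines shown above.

```shell
# Work in a throwaway directory so nothing of yours gets overwritten.
cd "$(mktemp -d)"

# people.txt is a hypothetical file containing the two example lines.
printf 'madan;SS;MRC-LMB\nchristine;SS;MRC-LMB\n' > people.txt

# Field 1 (the names) and field 3 (the institute), with ';' as delimiter.
names=$(cut -d';' -f1 people.txt)
institutes=$(cut -d';' -f3 people.txt)

# Columns (characters) 1 to 5 of every line, delimiters do not matter here.
first_five=$(cut -c1-5 people.txt)

echo "$names"
echo "$institutes"
echo "$first_five"
```

Note that -c counts single characters, so "christine" gets cut off after "chris", while -f always returns the whole field.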


join [options] file1 file2
Merge different columns into a database.
----------------------------------
Join the lines of SORTED file1 and SORTED file2 that share a common field.
Output: the common field, followed by the remainder of each line from file1 and file2.
Options:
-ax   also list unpairable lines from file x
-ex   replace any empty output field with the string x
-jx y join on the yth field of file x
-o x.y each output line contains the fields specified by file number x and field number y;
the common field is suppressed unless you list it.
-tc use c as field separator for input and output
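
A minimal sketch of join in action. The two files lengths.txt and families.txt and their contents are invented for this example; both are already sorted on the first field, which join needs.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Two hypothetical files that share the protein ID in field 1.
printf 'P1 120\nP2 340\nP3 95\n' > lengths.txt
printf 'P1 kinase\nP3 globin\nP4 protease\n' > families.txt

# Only IDs present in BOTH files are printed.
joined=$(join lengths.txt families.txt)

# -a1 also lists the lines of the first file that have no partner.
with_unpaired=$(join -a1 lengths.txt families.txt)

echo "$joined"
echo "---"
echo "$with_unpaired"
```

P2 is missing from the plain join because it only occurs in lengths.txt; with -a1 it shows up again, just without a family.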

paste
used to paste 2 files laterally
------------------------
eg: %> paste file1 file2 > pastedfile
eg: %> paste -d';' file1 file2 > pastedfile
the last command pastes the two files side by side, with the columns separated by a ';'
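
A short sketch of both variants, using two made-up files (names.txt and divisions.txt) with one entry per line:

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical input files, one value per line.
printf 'madan\nchristine\n' > names.txt
printf 'SS\nSS\n' > divisions.txt

# Default: line N of file 1 and line N of file 2, separated by a tab.
tabbed=$(paste names.txt divisions.txt)

# With -d the separator becomes a ';' instead.
semicolon=$(paste -d';' names.txt divisions.txt)

echo "$tabbed"
echo "$semicolon"
```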

cat
concatenates files
--------
eg: cat file1 file2 >> file1and2.out

more
show file content
--------
head, tail
both similar to more, but show only the beginning/end of a file
--------
example:
more protein.out
head protein.out
tail -200 protein.out
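
If you do not have protein.out at hand, a sketch with a made-up file numbers.txt (the numbers 1 to 12, one per line) shows what head and tail do. It uses the -n form for the line count, which should work on current systems.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# numbers.txt is a hypothetical file with the numbers 1..12, one per line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > numbers.txt

# The first three lines and the last two lines of the file.
first=$(head -n 3 numbers.txt)
last=$(tail -n 2 numbers.txt)

echo "$first"
echo "$last"
```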

grep -options 'WHAT' FILE
prints the lines that contain WHAT
--------------------------
grep 'CT1' protein.out
grep 'CT1%' protein.out |wc
grep 'CT1%' protein.out | more
grep [options] regexpr [files]
Can be done with directories and their filenames in there!
options:

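-b print the block number (byte offset in GNU grep) in front of each line
-c count matching lines
-i ignore upper/lower case
-n print lines and their line numbers
-v list all NOT matching lines
-l print only the names of files with matches, not the matched lines
-h print matched lines without file names (inverse of -l)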
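
To see a few of these options side by side, here is a small sketch. The file protein_demo.out and its four lines are invented for the example.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# A hypothetical mini version of protein.out.
printf 'CT1 alpha\nct1 beta\nXY2 gamma\nCT1 delta\n' > protein_demo.out

# -c counts matching lines; adding -i ignores upper/lower case.
exact_count=$(grep -c 'CT1' protein_demo.out)
any_case_count=$(grep -c -i 'ct1' protein_demo.out)

# -n puts the line number in front of each match.
numbered=$(grep -n 'CT1' protein_demo.out)

# -v turns the search around: everything that does NOT match.
others=$(grep -v -i 'ct1' protein_demo.out)

echo "$exact_count $any_case_count"
echo "$numbered"
echo "$others"
```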


tr
used to translate alphanumeric characters
-----------------------------------------
This command is used to translate one character to another. If you want all upper
case to be translated to lower case, then you have to type the following:
eg: %> tr A-Z a-z < filename
eg: %> cut -f2 -d';' filename | tr a-z A-Z
the last command cuts field number 2 from the file 'filename', which has fields
delimited by a ';'. The output is piped to tr, which converts all lower case letters
to upper case.
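
Both tr examples can be tried on a made-up one-line file (people_lower.txt is a name chosen just for this sketch):

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical input, all in lower case.
printf 'madan;ss;mrc-lmb\n' > people_lower.txt

# tr reads from standard input, so the file is fed in with '<'.
upper=$(tr 'a-z' 'A-Z' < people_lower.txt)

# Cut out field 2 first, then translate only that piece.
second_upper=$(cut -f2 -d';' people_lower.txt | tr 'a-z' 'A-Z')

echo "$upper"
echo "$second_upper"
```

Characters that are not in the a-z range, such as ';' and '-', pass through unchanged.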




strings [options] [files]
find printable text strings in binary files
--------------------------------
options:
-a search entire file
-n x      minimum string length of x

wc [options] [files]
count lines, words, and characters
--------------------
-c characters only
-l lines only
-w words only
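
A quick sketch with a made-up two-line file (words.txt). The tr -d ' ' at the end is only there because some versions of wc pad their output with spaces.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical file: 2 lines, 3 words, 14 characters (newlines count too).
printf 'one two\nthree\n' > words.txt

line_count=$(wc -l < words.txt | tr -d ' ')
word_count=$(wc -w < words.txt | tr -d ' ')
char_count=$(wc -c < words.txt | tr -d ' ')

echo "$line_count lines, $word_count words, $char_count characters"
```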

sort [options] [files]
does what it promises
---------------------


-b  ignore leading spaces and tabs
-f  ignore upper/lower case
-i  ignore non-printing characters
-m  merge sorted input files
-n  sort in arithmetic order
-ox put output in file x
-r  reverse order
-tc  separate fields with c
-u  identical lines in input file appear only one time in output (unique)

    eg: %> sort filename
    eg: %> sort -n filename (numerically)
    eg: %> sort -k2 -n filename (numerically based on the second column; older systems write this as sort +1 -n, where the first column is 0)
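
Why -n matters is easiest to see on a tiny example. The file lengths.txt below is invented; the sketch uses the -k2 way of naming the second column, which is an assumption about your system (very old versions of sort may only know the +1 form).

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical file: protein ID and length.
printf 'P1 120\nP2 34\nP3 9\n' > lengths.txt

# Without -n the second column is compared as text: "120" < "34" < "9".
as_text=$(sort -k2 lengths.txt)

# With -n it is compared as a number: 9 < 34 < 120.
as_number=$(sort -k2 -n lengths.txt)

# -r reverses the order, largest number first.
largest_first=$(sort -k2 -n -r lengths.txt)

echo "$as_text"
echo "---"
echo "$as_number"
echo "---"
echo "$largest_first"
```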

uniq 
gives unique entries of file OR makes lists nonredundant
--------
make sure you SORT file before you do uniq!!!
example:
sort protein.out|uniq
cat protein.out|wc
sort protein.out|uniq|wc
uniq [options] [file1] [file2]

-c  count instances
-d  print only duplicates once, not unique lines
-u print only unique lines (no copy of duplicate entries kept)
-number  ignore first "number" fields of a line (!!!!) (newer versions: -f number)
+number  ignore first "number" characters of a field (newer versions: -s number)
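
The following sketch shows why sorting first is so important: uniq only compares neighbouring lines. The file families.txt and its entries are made up for the example.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical list with repeated entries that are NOT next to each other.
printf 'globin\nkinase\nglobin\nprotease\nkinase\nglobin\n' > families.txt

# Without sort, no two neighbouring lines are equal, so nothing is removed.
unsorted_count=$(uniq families.txt | wc -l | tr -d ' ')

# After sort, duplicates sit next to each other and uniq can collapse them.
nonredundant=$(sort families.txt | uniq)

# -d keeps only entries that occur more than once, -u only those occurring once.
duplicated=$(sort families.txt | uniq -d)
singles=$(sort families.txt | uniq -u)

echo "$unsorted_count"
echo "$nonredundant"
echo "$duplicated"
echo "$singles"

# -c puts the number of occurrences in front of each entry.
sort families.txt | uniq -c
```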


diff FILE1 FILE2
tells difference between files
----------------


comm FILE1 FILE2
similar to diff (nicer?), for SORTED files. Output in three columns: lines only in FILE1, lines only in FILE2, lines in both
----------------
example:
comm protein.out protein_old.out
comm -3 protein.out protein_old.out
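
The numbers after the hyphen switch columns OFF, which takes a moment to get used to. A small sketch with two invented, already sorted files (new.txt and old.txt):

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Two hypothetical sorted lists of protein IDs.
printf 'P1\nP2\nP3\n' > new.txt
printf 'P2\nP3\nP4\n' > old.txt

# -23 hides columns 2 and 3, leaving the lines found only in new.txt.
only_new=$(comm -23 new.txt old.txt)

# -13 hides columns 1 and 3, leaving the lines found only in old.txt.
only_old=$(comm -13 new.txt old.txt)

# -12 hides columns 1 and 2, leaving the lines found in both files.
in_both=$(comm -12 new.txt old.txt)

echo "$only_new"
echo "$only_old"
echo "$in_both"
```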


cmp [options] file1 file2
compare two files; the exit code is
--------------------------
0 identical
1 different
2 inaccessible
options:
-l list every difference, -s work silently
example: cmp -s old new && echo 'no changes'
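
The exit code is what makes cmp useful in scripts. Here is a minimal sketch with three made-up files; the "|| status=$?" form is just one way to catch the exit code without stopping a script on the first difference.

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# a.txt and b.txt are identical copies, c.txt differs (all hypothetical).
printf 'same\n' > a.txt
cp a.txt b.txt
printf 'changed\n' > c.txt

# Catch the exit code of each comparison: 0 = identical, 1 = different.
status_same=0
cmp -s a.txt b.txt || status_same=$?
status_diff=0
cmp -s a.txt c.txt || status_diff=$?

# The same test used directly in an if statement.
if cmp -s a.txt b.txt; then
    message='no changes'
else
    message='files differ'
fi

echo "$status_same $status_diff $message"
```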


split FILE
cuts a file into pieces of fixed size (look up the options)
----------
To split a file at a regular pattern instead, use csplit:
csplit -k -f ecoli. ecolik12.gbk '/    gene   /' '{5000}'
This splits the file ecolik12.gbk whenever the pattern "    gene   "
occurs, repeating the split up to 5000 times. The pieces get names that
start with ecoli. followed by a running number. With -k the pieces
already written are kept even if the pattern occurs fewer than 5000 times.


LAST:
-----
Just play a bit.
echo 'hello' > myfirstfile.out
pico myfirstfile.out
more protein.out
cut -c1-10 protein.out
cut -c1-10 protein.out | more
cut -c1-10 protein.out | wc
cut -c1-10 protein.out | sort | uniq |wc
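
The last pipeline counts how many different line beginnings a file has. If you want to check the logic on something small first, here is a sketch with an invented four-line file (protein_demo.out), cutting the first 3 characters instead of 10:

```shell
# Work in a throwaway directory.
cd "$(mktemp -d)"

# Hypothetical mini protein list; 'CT1 alpha' appears twice.
printf 'CT1 alpha\nCT1 beta\nXY2 gamma\nCT1 alpha\n' > protein_demo.out

# cut the prefix, sort it, drop repeats, count what is left.
distinct_prefixes=$(cut -c1-3 protein_demo.out | sort | uniq | wc -l | tr -d ' ')

# The same idea on whole lines: how many different lines are there?
distinct_lines=$(sort protein_demo.out | uniq | wc -l | tr -d ' ')

echo "$distinct_prefixes different prefixes, $distinct_lines different lines"
```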