Date: Thu, 20 Jul 1995 11:07:15 -0500 (CDT)
From: Reece Kimball Hart
To: chemistry@ccl.net
Subject: ANNOUNCE: automated PDB retrieval

This is an announcement for getpdb, a ksh script which automates
incremental mirroring of the Protein Data Bank. It requires only
standard Unix utilities (ftp, sed, nawk, cut, zcat). The script
contains much more detail about features, usage, and requirements.

Comments are welcome.

getpdb may be obtained from:
    http://dasher.wustl.edu/~reece/src/getpdb
    ftp://dasher.wustl.edu/pub/getpdb/

-- Reece

Reece Kimball Hart                   | email: reece@dasher.wustl.edu
Biophysics & Biochemistry, Box 8231  | WWW:   http://dasher.wustl.edu/~reece/
Washington Univ. School of Medicine  | Phone: (314) 362-4198 (lab)
660 South Euclid                     |              -7183 (fax)
St. Louis, Missouri 63110 (USA)      | PGP public key available by finger & WWW

------------------ ORIGINAL README --------------------

#################################################################
##                                                             ##
##  GETPDB -- Update a Local PDB Database via Anonymous ftp    ##
##                                                             ##
#################################################################

GETPDB is a "simple" Unix shell script that updates your local copy
of the PDB database to match the current copy on the Brookhaven PDB
anonymous ftp server. It works by checking the size and timestamp of
all current PDB files on the Brookhaven server (as stored at BNL in
the file "all_entries/contents.lis") against the same info for your
local copy of the database (as stored locally in "files.list"). Files
are retrieved into or deleted from your local database to cause it to
match the official Brookhaven version. A sample "files.list" index
file from July 1995 is included in this directory.

In addition, GETPDB strips from the distribution files any characters
in columns 71-80 and any blank spaces from column 70 back to the
first nonblank character of each line. This results in significant
space savings, approaching 20% for some files.
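The column stripping described above can be illustrated with a short
Python sketch. This is not taken from getpdb itself, which is a ksh
script; the function names strip_pdb_line and strip_pdb_text are
hypothetical, and the sketch assumes plain fixed-width text records.

```python
def strip_pdb_line(line: str) -> str:
    """Return one record without columns 71-80 and without trailing blanks.

    Hypothetical helper, not part of getpdb.
    """
    body = line.rstrip("\r\n")   # drop the line terminator, if any
    body = body[:70]             # keep columns 1-70 only
    return body.rstrip(" ")      # drop blanks back to the last nonblank character


def strip_pdb_text(text: str) -> str:
    """Apply strip_pdb_line to every record and restore the newlines."""
    return "".join(strip_pdb_line(record) + "\n" for record in text.splitlines())


if __name__ == "__main__":
    # An 80-column record whose last columns hold an ID code and a serial number.
    record = "REMARK   1 EXAMPLE".ljust(72) + "1ABC  12\n"
    print(repr(strip_pdb_line(record)))
```

Short records padded out to 80 columns shrink the most, while records
that carry data up to column 70 lose only the last ten columns.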
The original PDB files (pdbxxxx.ent) are deleted and the stripped
files are retained in the local database (as xxxx.pdb).

The GETPDB script is primarily intended for periodic updating of a
local copy of the database. The local "files.list" will also be
updated when GETPDB is run. The first time GETPDB is used at a site,
it will try to download the entire database and create a "files.list"
index. This initial operation involves downloading thousands of files
totaling well over 1 GB of disk space, and should only be performed
on evenings or weekends.

Local or modified versions of PDB files can be added to your master
PDB directory. As long as these files are not present in your copy of
"files.list", they will remain untouched by GETPDB.

The GETPDB script accepts a number of options on the command line
when it is invoked. These options and other features are described in
the comments found at the top and bottom of the script itself. The
script has been tested under Digital UNIX 3.2 and SGI IRIX 5.3, but
should run unmodified or with minor changes on other systems.
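
The update logic in this README (compare the size and timestamp
recorded on the server against the local "files.list", then retrieve
or delete files to match) can be sketched in Python as below. This is
only an illustration under stated assumptions, not the script's own
code. It assumes both indexes have already been parsed into
dictionaries mapping a file name to a (size, timestamp) pair. The
actual layouts of "contents.lis" and "files.list" are not described
here, so parsing is left out. The name plan_update and the sample
entries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of one index entry: (size in bytes, timestamp text).
Entry = Tuple[int, str]


def plan_update(remote: Dict[str, Entry],
                local: Dict[str, Entry]) -> Tuple[List[str], List[str]]:
    """Decide which files to retrieve and which to delete.

    A file is retrieved when it is missing from the local index or when its
    size or timestamp differs from the server's entry. A file is deleted when
    the local index lists it but the server no longer does.
    """
    to_fetch = sorted(name for name, entry in remote.items()
                      if local.get(name) != entry)
    to_delete = sorted(name for name in local if name not in remote)
    return to_fetch, to_delete


if __name__ == "__main__":
    # Made-up entries for illustration only.
    remote = {
        "pdb1abc.ent": (1000, "1995-07-01"),
        "pdb2xyz.ent": (2000, "1995-07-10"),
        "pdb3new.ent": (500, "1995-07-15"),
    }
    local = {
        "pdb1abc.ent": (1000, "1995-07-01"),
        "pdb2xyz.ent": (1800, "1995-06-01"),
        "pdb9old.ent": (700, "1994-01-01"),
    }
    print(plan_update(remote, local))
```

Because only names listed in the local index are ever considered for
deletion, files you add yourself and keep out of "files.list" are
never touched in this sketch. With an empty local index, every server
entry is scheduled for retrieval, which corresponds to the first-run
case.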