Hingefind - a novel algorithm to investigate domain motions in proteins.
Version 6-22-95

This software is copyrighted, (c) 1995, by Willy Wriggers under the terms
of the legal statement in the distribution. Available by anonymous ftp to
lisboa.ks.uiuc.edu in the directory pub/wriggers/hingefind.

---

Documentation:

1. General remarks
2. Files and shellscripts
3. A brief description of the algorithm
4. Output files
5. Accuracy check and other useful info
6. Correspondence

---

1. General remarks

'Hingefind' is an algorithm for the identification of domain movements
and their characterization and visualization by hinge points and rotation
axes. The method is implemented in the X-PLOR 3.1 script language (Axel T.
Brunger, 1992). The output psf and pdb files can be visualized with
standard graphics packages, e.g. the graphics program VMD of the
Theoretical Biophysics Group, Beckman Institute, UIUC (available from
ftp.ks.uiuc.edu in pub/mdscope/vmd, or
http://www.ks.uiuc.edu:1250/Research/vmd), or with Quanta (MSI).

The algorithm compares two known structures (e.g. two different crystal
structures of a protein or the results of molecular dynamics simulations)
and partitions the protein, at a prespecified resolution, into preserved
subdomains. It then determines effective hingeaxes which characterize the
domain movements with respect to the reference domain. Both parts of the
algorithm can be used alone, i.e. one can assign domains manually and let
the algorithm determine the effective hingeregions between the domains,
or one can use the algorithm only to partition a protein into preserved
subdomains.

The method does not require any previous knowledge about functionally
relevant domains or hinge motions. However, a critical analysis of the
results is recommended. The output files provide information about the
accuracy of the found hinge-rotation. The variety of options and
resolutions makes it possible to find an optimal partitioning.
The user can assess the validity of the proposed movement and change the
script if necessary.

Warning: In some cases the rotational fit may be inaccurate or there may
be no uniform domain motions.

2. Files and shellscripts

There should be several files and scripts to set up the algorithm:

hingefind
    A unix shell script that runs the X-PLOR job and writes three X-PLOR
    stream files which contain commands from which X-PLOR can compute
    filenames and the resolution of the algorithm.

partition.str
    A stream file which contains the necessary X-PLOR commands to set up
    the structure. It may contain a pointer to a psf file. It is
    recommended to use segid "AP0" for the protein, otherwise
    hingefind.inp has to be modified. Note that the coordinates in the
    two compared pdb files must both be compatible with the structure.
    The pdb files may contain additional atoms, which do not have to be
    specified in partition.str if they are not used in the partitioning.

dum.top
    An X-PLOR topology file with the residues of dummy molecules used in
    the algorithm for visualization of hingepoints and axes.

prexplor.dim
    The X-PLOR file which contains array sizes for compilation (35,000
    atom version). It will probably be necessary to compile X-PLOR with
    the larger BUFMAX parameter for the loops. This executable is named
    "xl" in the hingefind script.

hingefind.inp
    The X-PLOR script with the algorithm. There are a variety of
    variables and paths the user has to specify in the head of the file:

    $ndomains:
        The number of domains to be found. Recommended: 2 - 5, depending
        on the resolution.

    $maxccounter:
        The maximum number of cycles of the "converge" loop. In case the
        algorithm does not converge within the specified number of cycles
        (this was very rarely observed to occur in the "fas" partitioning
        mode at extreme resolutions), a warning message is written in the
        log file. Recommended: 10 - 20.
    $assign:
        This variable determines the mode of the partitioning part of the
        script:
        "man" specifies manual assignment of domains, no partitioning.
              Up to 9 domains can be assigned below and $ndomains must be
              smaller than 10.
        "fas" codes for the fast version of the automatic partitioning
              algorithm, in which the connectivity of the residues in the
              found domains is NOT maintained.
        "slo" specifies the slow partitioning algorithm with maintained
              connectivity of the domains.

    $nndist:
        This variable determines the maximum distance at which two
        residues are considered next neighbors in the "slo" mode
        partitioning.

    store1...9:
        The selection attributes which allow the assignment of up to 9
        domains by hand in the "man" mode.

    $case1COO:
        String that specifies the input file for the coordinates in pdb
        format, or a pointer to the pdb file. The path has to be
        specified. X-PLOR can compute the filename from the variable
        $case1 defined in the stream file casefile1 written by the
        shellscript. The coordinates are written to the main coordinate
        set.

    $case2COO:
        String for the 2nd pdb file (comparison coordinate set).
        Analogous to $case1COO.

    $oname:
        Output pdb file with assigned domains, hingepoints, axes. The
        filename can be computed using the $fname variable, which
        contains the resolution as defined by the shellscript. The path
        has to be specified.

    $uname:
        Output psf file, analogous to $oname.

    $dname:
        Output log file with information about the proposed hinge
        rotations, residues, accuracies. The filename can be computed
        using the $fname variable. The path has to be specified.

It is recommended that the user first tests the script with the domains
of interest assigned manually, then tries automatic "fas" partitioning
with the resolution in the shellscript set between 50 and 100 (%).
Finally, the partitioning should be repeated in "slo" mode for selected
cases and resolutions.

3. A brief description of the algorithm

The method will be published in the near future. Please inquire about a
preprint or reference at the e-mail address given in section 6.
The algorithm is separated into two parts: the "partitioning" and the
"rotational fit" section.

The "partitioning" part determines domains with preserved structure in
the two compared coordinate sets, depending on a prespecified resolution.
The method uses the least-squares fit method (W. Kabsch, 1976) as
implemented in X-PLOR. A domain is found in an iterative procedure, in
which poorly matching residues are excluded from the domain and good
matches are included. In the "slo" mode only the heaviest connected set
is considered, maintaining the connectivity of the changing domain.

The "rotational fit" method attempts to locate a hingepoint and a
rotation axis which characterize the transformation of the domain between
main and comparison coordinate set as a hingerotation. The hingepoint
could be anywhere on the axis, but is determined here, by construction,
as the point on the axis closest to the center of mass (COM) of the
domain. A rotation about an axis without translation will, in general,
not yield the closest fit of the Kabsch least-squares method, so the
problem is to find the least-squares solution with the constraint that
translations are not allowed, only rotations about an unknown hingepoint.
It turns out that this constrained problem is not easy to solve and the
exact solution may be too expensive to compute, so an approximation is
used in the algorithm. The accuracy of the approximation can be assessed
by comparing the least-squares fit with the proposed rotational fit. Note
that the (rmsd) error of the fit may be due to the error of the
approximation OR to the constraint of not allowing translations.

The construction works as follows: A Kabsch least-squares fit yields a
translation vector v of the COM and a rotation axis r with angle alpha.
The rotation axis is then projected on the bisecting plane of v, which
yields a new rotation axis r' and a "projection angle" beta, defined by r
and r', from which one can compute the new rotation angle
alpha' = alpha * cos(beta).
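The projection step just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
X-PLOR code in hingefind.inp: the function name project_rotation and its
argument names are hypothetical, the bisecting plane of v is taken to be
the plane whose normal is v, and beta is taken to be the angle between r
and r'.

```python
import math


def project_rotation(axis, v, alpha):
    """Project a rotation axis onto the plane perpendicular to v.

    axis  : rotation axis r from the least-squares fit (any length)
    v     : translation vector of the COM (used as the plane normal)
    alpha : rotation angle about r, in any unit

    Returns (r_prime, beta, alpha_prime), where r_prime is the unit
    projected axis, beta is the projection angle in radians, and
    alpha_prime = alpha * cos(beta) is in the same unit as alpha.
    Hypothetical helper for illustration only.
    """
    r_len = math.sqrt(sum(c * c for c in axis))
    v_len = math.sqrt(sum(c * c for c in v))
    if r_len == 0.0 or v_len == 0.0:
        raise ValueError("axis and v must be non-zero vectors")

    r = [c / r_len for c in axis]   # unit rotation axis
    n = [c / v_len for c in v]      # unit normal of the bisecting plane

    # Remove the component of r along the plane normal.
    along = sum(a * b for a, b in zip(r, n))
    proj = [a - along * b for a, b in zip(r, n)]
    proj_len = math.sqrt(sum(c * c for c in proj))
    if proj_len < 1e-12:
        raise ValueError("axis is parallel to v; projection is undefined")

    r_prime = [c / proj_len for c in proj]

    # For a unit vector r, the length of its projection is cos(beta).
    cos_beta = min(1.0, proj_len)
    beta = math.acos(cos_beta)
    alpha_prime = alpha * cos_beta
    return r_prime, beta, alpha_prime


# Example: an axis tilted out of the plane whose normal is v.
r_prime, beta, alpha_prime = project_rotation(
    axis=(1.0, 0.0, 1.0), v=(2.0, 0.0, 0.0), alpha=30.0
)
print(r_prime, math.degrees(beta), alpha_prime)
```

A small beta means that r already lies almost in the plane, so alpha'
stays close to alpha and little of the least-squares rotation is lost.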
Using this projected rotation, one can construct a hingepoint on the
bisecting plane. A rotation with r' and angle alpha' about the hingepoint
then transforms the COM of the main set onto the COM of the comparison
set. So the projection maintains (relative to the least-squares method)
the removal of the COM difference between the sets, but approximates the
rotation. The idea is that in hingebending motions there should be a
relatively large COM separation |v|, and the rotation axis r should be
almost parallel to the bisecting plane of v. Thus, in addition to the
rmsd error of the fit, the validity of the approximation can be assessed
by checking the angle beta, which should be small. One finds that the
method works best for larger domains comprising several secondary
structure elements.

4. Output files

There are three output files specified by the variables $oname, $uname
and $dname: pdb and psf files of the labeled structure, and the log file
of the run.

The pdb and psf files can be used to visualize the results of the
algorithm. The data is labeled by segids:

    "AP0"                  the unconverged rest of the protein
    "AP1"                  the reference domain of the protein
    "AP2", "AP3", etc.     additional domains
    "DUM2", "DUM3", etc.   the dummy molecules which visualize the
                           hinge-rotation of the domains

The dummy molecules show an arrow along the rotation axis, with its
orientation representing the right-handed rotation about the axis. The
hingepoint in the middle of the arrow is connected to the COM of the main
and comparison coordinate set of the domain to illustrate the rotation
angle.

The rotation angle and other useful information about the run, the
domains, and the accuracy of the rotational fitting can be found in the
self-explanatory log file.

NOTE: The X-PLOR logfiles would contain several MBytes of data for each
run, so the standard output is piped to /dev/null. The standard output
should only be used for debugging of modified or augmented scripts.

5. Accuracy check and other useful info

The log file contains information about the accuracy of the fitting as
outlined above in section 3. The relative error (in percent) is computed
as [ RMSD (proj) / RMSD (least sq.) ] - 1.

It is recommended to run the cases within a range of resolutions between
50 and 100%. Running a particular system with a range of resolutions in
"fas" mode, it was found that there exist one or more windows of optimum
resolution where the relative errors were very small. Therefore it is
recommended to try a range of resolutions first with the "fas" mode, find
the window(s) of small error, and then calculate selected resolutions in
the window(s) in "slo" mode with a higher number of domains. The error of
the domain fitting was found to decrease by a factor of 5 with "slo"
partitioning due to the connectivity of the domains.

Recommended reading about classification of domain movements:
Gerstein et al., Biochemistry 33 (1994), 6739-6749.

How to find hingeregions: For shear-type motions (see Gerstein et al.)
the found effective rotation axis will intersect the boundary between the
domains in most cases, and the effective hinge region can be found at the
intersection. In hinge-type motions, the axis will be parallel to the
interface. To find 'real' hingeresidues, it is useful to investigate the
protein backbone at the segment interface. A hingeresidue should be close
to the proposed rotation axis. For this type of movement, the proposed
hingepoint may be useful to find the hinge, but it should be clear that a
hinge"point" can be anywhere on the axis.

6. Correspondence

Updates and changes of the method may be necessary once in a while. To
stay informed about changes, send e-mail to the author. The known users
will also receive a preprint of the upcoming paper. The author would
appreciate 'bug reports' and any comments regarding the usefulness of the
algorithm and strategies of usage.

Please send your correspondence to wriggers@uiuc.edu (NeXT-mail OK).
_______________________________________________

Willy R. Wriggers
Theoretical Biophysics Group
Beckman Institute
University of Illinois at Urbana-Champaign
405 North Mathews Avenue
Urbana, IL 61801, USA
_______________________________________________