Hingefind -

a novel algorithm to investigate domain motions in proteins.

Version 6-22-95

This homepage contains the documentation of the algorithm.

To access the Hingefind ftp site, click here.
To access Willy's homepage, click here.
To learn more about domain motions, click here.
To access the Protein Motions Database at Stanford, click here (contains documentation of all known instances of domain movements of structures).

Hingefind is written in the X-PLOR 3.1 script language (Axel T. Brunger, 1992). The output psf and pdb files can be visualized with standard graphics packages, e.g. our graphics program vmd or Quanta (MSI).

The algorithm compares two known structures (e.g. two different crystal structures of a protein or the results of molecular dynamics simulations) and partitions the protein, at a prespecified resolution, into preserved subdomains. It then determines effective hinge axes which characterize the domain movements with respect to the reference domain. Both parts of the algorithm can be used alone, i.e. one can assign domains manually and let the algorithm determine the effective hinge regions between the domains, or one can use the algorithm only to partition a protein into preserved subdomains.

The method does not require any previous knowledge about functionally relevant domains or hinge motions; however, a critical analysis of the results is recommended. The output files provide information about the accuracy of the hinge rotation found. The variety of options and resolutions makes it possible to find an optimal partitioning. The user can assess the validity of the proposed movement and change the script if necessary.

Warning: In some cases the rotational fit may be inaccurate or there may be no uniform domain motions.

Example: Differences in domain orientation between G- and F-actin:

Click here to get an 87 kByte image of actin domain movements characterized by Hingefind:

Backbone trace of the Lorenz F-actin structure (colored) compared to the Kabsch crystal structure (black line). The nucleotide and divalent cation of the comparison structure are rendered as grey van der Waals spheres. Five segments have been determined by "slow" mode partitioning at 3.76 Å resolution (90 % of the initial rms-deviation). The color of the tubes codes for the partitioned segments found: reference segment 1 (blue), segment 2 (green), segment 3 (orange), segment 4 (yellow), segment 5 (purple), no segment assigned (grey). The two structures are superimposed by a least-squares fit of segment 1. For segment 2 and segment 3, the rotation axis and the lines connecting the hingepoint to the COMs are shown as red tubes for the movements relative to segment 1. The arrow indicates a right-handed rotation which transforms the COM of the respective segment of F-actin onto the COM of the segment in G-actin. The rotation angle of segment 2 is 11.5 degrees with a relative error of 7.7 %. The rotation angle of segment 3 is 12.9 degrees with a relative error of 4.1 %. The domain movements yield a closure of the nucleotide binding cleft in the Lorenz structure. Segments 4 and 5 comprise only a few residues. More on actin research.

2. Files and shellscripts

There should be several files and scripts to set up the algorithm:

hingefind

A Unix shell script that runs the X-PLOR job and writes three X-PLOR stream files. These contain commands from which X-PLOR can compute the filenames and the resolution of the algorithm.

partition.str

A stream file which contains the X-PLOR commands needed to set up the structure. It may contain a pointer to a psf file. It is recommended to use segid "AP0" for the protein; otherwise hingefind.inp has to be modified. Note that the coordinates in the two compared pdb files must both be compatible with the structure. The pdb files may contain additional atoms, which do not have to be specified in partition.str if they are not used in the partitioning.

dum.top

An X-PLOR topology file with the residues of the dummy molecules used in the algorithm for visualization of hingepoints and axes.

prexplor.dim

The X-PLOR file which contains array sizes for compilation (35,000 atom version). It will probably be necessary to compile X-PLOR with a larger BUFMAX parameter for the loops. The resulting executable is named "xl" in the hingefind script.

hingefind.inp

The X-PLOR script with the algorithm. There are a variety of variables and paths the user has to specify at the head of the file:
  • $ndomains: The number of domains to be found. Recommended: 2 - 5, depending on the resolution.
  • $maxccounter: The maximum number of cycles of the "converge" loop. If the algorithm does not converge within the specified number of cycles (this was very rarely observed, and only in the "fas" partitioning mode at extreme resolutions), a warning message is written to the log file. Recommended: 10 - 20.
  • $assign: This variable determines the mode of the partitioning part of the script: "man" specifies manual assignment of domains, no partitioning. Up to 9 domains can be assigned below and $ndomains must be smaller than 10. "fas" codes for the fast version of the automatic partitioning algorithm, in which the connectivity of the residues in the found domains is NOT maintained. "slo" specifies the slow partitioning algorithm with maintained connectivity of the domains.
  • $nndist: The maximum distance at which two residues are considered next neighbors in the "slo" mode partitioning.
  • store1...9: The selection attributes which allow the assignment of up to 9 domains by hand in the "man" mode.
  • $case1COO: String that specifies the input file for the coordinates in pdb format, or a pointer to the pdb file. The path has to be specified. X-PLOR can compute the filename from the variable $case1 defined in the stream file casefile1 written by the shell script. The coordinates are written to the main coordinate set.
  • $case2COO: String for the 2nd pdb file (comparison coordinate set). Analogous to $case1COO.
  • $oname: Output pdb file with assigned domains, hingepoints, axes. The filename can be computed using the $fname variable which contains the resolution as defined by the shellscript. The path has to be specified.
  • $uname: Output psf file, analogous to $oname.
  • $dname: Output log file with information about the proposed hinge rotations, residues, accuracies. The filename can be computed using the $fname variable. The path has to be specified.

It is recommended that the user first test the script with the domains of interest assigned manually, then try automatic "fas" partitioning with the resolution in the shell script set between 50 and 100 (%). Finally, the partitioning should be repeated in "slo" mode for selected cases and resolutions.

3. A brief description of the algorithm

The method will be published in the near future; please inquire about a preprint or reference at the e-mail address given under Correspondence. The algorithm is separated into two parts: the "partitioning" and the "rotational fit" section. The "partitioning" part determines domains with preserved structure in the two compared coordinate sets, depending on a prespecified resolution. The method uses the least-squares fit method (W. Kabsch, 1976) as implemented in X-PLOR. A domain is found in an iterative procedure, in which poorly matching residues are excluded from the domain and good matches are included. In the "slo" mode only the heaviest connected set is considered, which maintains the connectivity of the changing domain.
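The connectivity step of the "slo" mode can be illustrated with a minimal Python sketch. It is not the X-PLOR implementation: it assumes one representative point per residue, treats two residues as neighbors when their distance is at most a cutoff (playing the role of $nndist), and interprets "heaviest" as largest total mass. The function name and all argument names are hypothetical.

```python
import math
from collections import deque


def heaviest_connected_set(coords, masses, nndist):
    """Return the sorted indices of the connected residue set with the largest total mass.

    coords : list of (x, y, z) tuples, one representative point per residue (assumption)
    masses : list of floats, one mass per residue
    nndist : maximum distance at which two residues count as neighbors
    """
    n = len(coords)
    seen = [False] * n
    best, best_mass = [], -1.0
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects every residue reachable from `start`.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if not seen[j] and math.dist(coords[i], coords[j]) <= nndist:
                    seen[j] = True
                    queue.append(j)
        mass = sum(masses[i] for i in component)
        if mass > best_mass:
            best, best_mass = sorted(component), mass
    return best


# Illustrative values only: two clusters, the second one is heavier.
example_coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
example_masses = [1.0, 1.0, 1.0, 5.0, 5.0]
print(heaviest_connected_set(example_coords, example_masses, 1.5))  # → [3, 4]
```

The all-pairs distance test keeps the sketch short; for large proteins a cell list or similar spatial index would be the usual choice.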

The "rotational fit" method attempts to locate a hingepoint and a rotation axis which characterize the transformation of the domain between the main and comparison coordinate sets as a hinge rotation. The hingepoint could be anywhere on the axis, but is determined here, by construction, as the point closest to the center of mass (COM) of the domain. A rotation about an axis without translation will, in general, not reproduce the closest fit of the Kabsch least-squares method, so the problem is to find the least-squares solution under the constraint that translations are not allowed, only rotations about an unknown hingepoint. It turns out that this constrained problem is not easy to solve and the exact solution may be too expensive to compute, so an approximation is used in the algorithm. The accuracy of the approximation can be assessed by comparing the least-squares fit with the proposed rotational fit. Note that the (rmsd) error of the fit may be due to the error of the approximation OR to the constraint of not allowing translations.

The construction works as follows: A Kabsch least-squares fit yields a translation vector v of the COM and a rotation axis r with angle alpha. The rotation axis is then projected onto the bisecting plane of v, which yields a new rotation axis r' and a "projection angle" beta, defined by r and r', from which one can compute the new rotation angle alpha' = alpha * cos(beta). Using this projected rotation, one can construct a hingepoint on the bisecting plane. A rotation with r' and angle alpha' about the hingepoint then transforms the COM of the main set onto the COM of the comparison set. So the projection maintains (relative to the least-squares method) the removal of the COM difference between the sets, but approximates the rotation. The idea is that in hinge-bending motions there should be a relatively large COM separation |v|, and the rotation axis r should be almost parallel to the bisecting plane of v. Thus, in addition to the rmsd error of the fit, the validity of the approximation can be assessed by checking the angle beta, which should be small. One finds that the method works best for larger domains comprising several secondary structure elements.

4. Output files

There are three output files specified by the variables $oname, $uname and $dname: the pdb and psf files of the labeled structure, and the log file of the run. The pdb and psf files can be used to visualize the results of the algorithm. The data are labeled by segids.

The dummy molecules show an arrow along the rotation axis, with its orientation representing the right-handed rotation about the axis. The hingepoint in the middle of the arrow is connected to the COM of the main and comparison coordinate sets of the domain to illustrate the rotation angle. The rotation angle and other useful information about the run, the domains, and the accuracy of the rotational fitting can be found in the self-explanatory log file.

NOTE: The X-PLOR logfiles would contain several MBytes of data for each run, so the standard output is piped to /dev/null. The standard output should only be used for debugging of modified or augmented scripts.

5. Accuracy check and other useful info

The log file contains information about the accuracy of the fitting as outlined in section 3. The relative error (in percent) is computed as

[ RMSD (proj) / RMSD (least sq.) ] - 1.
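As a small worked illustration of this formula, the Python sketch below computes the relative error from two RMSD values. The multiplication by 100 to express the ratio in percent is an assumption based on the wording "in percent"; the function name and the input values are hypothetical and do not come from a Hingefind log file.

```python
def relative_error_percent(rmsd_proj, rmsd_lsq):
    """Relative error of the projected rotational fit against the least-squares fit.

    Implements [RMSD(proj) / RMSD(least sq.)] - 1, scaled by 100 (assumption)
    so that the result reads as a percentage.
    """
    if rmsd_lsq <= 0.0:
        raise ValueError("least-squares RMSD must be positive")
    return (rmsd_proj / rmsd_lsq - 1.0) * 100.0


# Hypothetical RMSD values in Angstrom, chosen only for illustration.
print(round(relative_error_percent(1.077, 1.000), 1))  # → 7.7
```

A value of zero means that the pure rotation fits as well as the unconstrained least-squares superposition; larger values indicate that the translation removed by the constraint, or the axis projection, matters.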

It is recommended to run each case over a range of resolutions between 50 and 100 %. When a particular system was run over a range of resolutions in "fas" mode, one or more windows of optimum resolution were found in which the relative errors were very small. Therefore it is recommended to first try a range of resolutions in "fas" mode, find the window(s) of small error, and then calculate selected resolutions within the window(s) in "slo" mode with a higher number of domains. The error of the domain fitting was found to decrease by a factor of 5 with "slo" partitioning, due to the connectivity of the domains.
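Locating such windows from a set of "fas" runs can be sketched as follows. The snippet assumes that the relative errors have already been read from the log files into a dictionary keyed by resolution in percent; the threshold, the function name and the numbers are hypothetical and serve only as an illustration.

```python
def low_error_windows(errors_by_resolution, max_error):
    """Return (first, last) resolution pairs of consecutive runs whose error is at most max_error."""
    windows = []
    current = []
    for resolution in sorted(errors_by_resolution):
        if errors_by_resolution[resolution] <= max_error:
            current.append(resolution)
        elif current:
            # The run of small errors ended at the previous resolution.
            windows.append((current[0], current[-1]))
            current = []
    if current:
        windows.append((current[0], current[-1]))
    return windows


# Made-up relative errors (percent) for a scan from 50 % to 100 % resolution.
scan = {50: 12.0, 60: 3.0, 70: 2.5, 80: 15.0, 90: 4.0, 100: 20.0}
print(low_error_windows(scan, max_error=5.0))  # → [(60, 70), (90, 90)]
```

The resolutions inside each reported window would then be the candidates for the more expensive "slo" runs.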

Recommended reading about classification of domain movements: Gerstein et al., Biochemistry 33 (1994), 6739-6749.

How to find hingeregions: For shear-type motions (see Gerstein et al.) the effective rotation axis found will in most cases intersect the boundary between the domains, and the effective hinge region can be found at the intersection. In hinge-type motions, the axis will be parallel to the interface. To find 'real' hinge residues, it is useful to investigate the protein backbone at the segment interface. A hinge residue should be close to the proposed rotation axis. For this type of movement, the proposed hingepoint may be useful to find the hinge, but it should be clear that a hinge "point" can be anywhere on the axis.

6. Correspondence

Updates and changes of the method may be necessary once in a while. To stay informed about changes, send e-mail to the author. Known users will also receive a preprint of the upcoming paper. The author would appreciate 'bug reports' and any comments regarding the usefulness of the algorithm and strategies of usage. Please send your correspondence to wriggers@ks.uiuc.edu (NeXT-mail OK).