X-Ray Crystallography, Biomolecular Structure Determination Methods

Encyclopedia of Spectroscopy and Spectrometry (2017)

Abstract
The technique for determining three-dimensional (3D) macromolecular structures at atomic resolution is called macromolecular crystallography. The method has evolved enormously since the first structures of small biomolecules and macromolecules were determined in the 1940s and 1950s; landmark structures were vitamin B12, penicillin, myoglobin, and hemoglobin. It is now possible to determine very large structures such as viruses and ribosome particles. This has been due in large part to the development of better X-ray sources and detectors and to the increasing power of computing, but it would not have been possible without the great strides made in molecular biology, such as the sequencing of whole genomes and the preparation of large amounts of macromolecular samples by recombinant techniques. The method is diffraction based: only intensities can be measured, not the associated phase information, and a major part of the work is to recover phases by various indirect means. Once this has been achieved, a model can be built and refined against the data. Typically, the accuracy of structures is between 0.1 and 0.5 Å and depends on the amount of data and on the intrinsic thermal disorder of the structure. The various steps required to determine a 3D structure are discussed in the following sections.

Keywords: Macromolecular; Phase problem; Synchrotron radiation; Three-dimensional structure; X-ray diffraction

Diffraction Theory and Structure Synthesis

The techniques required for the determination of three-dimensional (3D) macromolecular structures are similar to those used for small organic and inorganic molecules. In practice, the problems encountered are sufficiently different that they require unique approaches, from sample preparation through to structure refinement, and the hardware and software needed for the work are specific to this field of research. Diffraction theory states that for the determination of structures at atomic resolution (interatomic spacing ∼1 Å), radiation with a wavelength of the order of ∼1 Å is required. Imaging theory, such as that used in optical microscopy, would be appropriate were it not for the fact that the refractive index of most materials, with respect to X-rays, is close to unity, making it impossible to manufacture lenses for imaging the sample. Measured data are therefore diffraction patterns, and the ideas most useful for data handling have more in common with spectroscopic methods. This said, diffraction arises from elastic rather than inelastic scattering and is not an absorption phenomenon, with the exception of anomalous dispersion, which uses absorption and dispersive effects close to an atomic absorption edge to obtain structural information. The crystalline nature of the sample results in discrete diffraction spots rather than continuous scattering. The positions of the diffraction spots, in reciprocal space, are defined by integer numbers or Miller indices (h,k,l), and the X-ray scattering by the measured intensity I(h,k,l). Intensities are proportional to the number of scattered X-ray photons and are related to the structure amplitude by I(h,k,l) = |F(h,k,l)|². The structure factor is the amplitude with its associated phase, such that F(h,k,l) = |F(h,k,l)| e^{iφ(h,k,l)}.
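To make the amplitude–phase relationship concrete, the short Python sketch below evaluates a discrete-atom structure factor and the corresponding measurable intensity. The coordinates and scattering factors are hypothetical placeholders introduced for illustration, not data from the article.

```python
import numpy as np

# Hypothetical fractional coordinates and (resolution-independent) scattering
# factors for a toy three-atom structure -- placeholders, not real data.
coords = np.array([[0.10, 0.20, 0.30],
                   [0.40, 0.15, 0.70],
                   [0.25, 0.60, 0.05]])
f = np.array([6.0, 7.0, 8.0])

def structure_factor(hkl, coords, f):
    """F(hkl) = sum_j f_j * exp(2*pi*i * hkl . r_j), summed over the atoms."""
    phase = 2j * np.pi * (coords @ np.asarray(hkl, dtype=float))
    return np.sum(f * np.exp(phase))

F = structure_factor((1, 2, 3), coords, f)
print("measured intensity I = |F|^2 :", abs(F) ** 2)
print("phase (degrees), lost in the experiment:", np.degrees(np.angle(F)))
```

Only the intensity is recorded at the detector; recovering the discarded phase is the subject of the phasing sections below.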
The electron density ρ(x,y,z) at a point (x,y,z) in the crystallographic unit cell of volume V can be calculated by a Fourier transform, which relates the spectral diffraction components in scattering (reciprocal) space to the electron density in real space. In the following discussion, the indices are abbreviated to the vector h and the coordinates x,y,z to the vector r.

[1]  \rho(\mathbf{r}) = \frac{1}{V} \sum_{\mathbf{h}} F(\mathbf{h})\, e^{-2\pi i\, (\mathbf{h} \cdot \mathbf{r})}

The summation is over all Miller indices of the data set. It is sometimes useful to perform the reverse operation and obtain the structure amplitudes and phases by the inverse Fourier transform of the electron density:

[2]  F(\mathbf{h}) = \int_V \rho(\mathbf{r})\, e^{2\pi i\, (\mathbf{h} \cdot \mathbf{r})}\, dV

The integral is over one unit cell. Because F(h) contains both amplitude and phase, it carries the same information about the structure as the electron density map.
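A direct, if inefficient, way to evaluate eqn [1] on a set of points is a literal summation over the measured reflections. The sketch below is my own illustration with arbitrary example data; production software uses fast Fourier transforms instead.

```python
import numpy as np

def density(r, hkls, F, V=1.0):
    """Literal evaluation of eqn [1]: rho(r) = (1/V) * sum_h F(h) exp(-2*pi*i h.r).

    hkls : (N, 3) array of Miller indices
    F    : (N,) complex structure factors (amplitude and phase)
    r    : fractional coordinate in the unit cell
    """
    phases = np.exp(-2j * np.pi * np.asarray(hkls) @ np.asarray(r, dtype=float))
    rho = np.sum(np.asarray(F) * phases) / V
    return rho.real   # imaginary part cancels when Friedel mates are included

# Toy data: two reflections and their Friedel mates
hkls = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]])
F = np.array([2 + 1j, 2 - 1j, 1 + 0j, 1 + 0j])
print(density([0.25, 0.1, 0.0], hkls, F))
```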
Crystal Symmetry

The unit cell is the smallest crystal volume that contains a collection of asymmetric units (molecules) related by crystallographic symmetry operations; the crystal is formed by a translational repeat of the unit cell in three dimensions. Crystals fall into seven crystal systems of increasing internal symmetry, and understanding the symmetry rules and correctly assigning them to the crystal under investigation is essential for structure determination. The simplest crystal system is triclinic, which allows any cell dimensions and angles and, at most, an inversion center of symmetry. The highest symmetry crystal system is cubic, in which all cell dimensions are equal (a = b = c) and all angles (α = β = γ) must be 90°. The internal symmetry elements can be combinations of rotation and screw axes, mirror planes, and glide planes, whereas the Bravais lattice types can be P, I, or F: P is a primitive lattice (lattice points at the cell corners only), I has an additional lattice point at the center of the cell, and F has a lattice point at the center of each of the six cell faces. For the monoclinic and orthorhombic space groups there is also the lattice type C, with an extra lattice point centered on one of the faces. For trigonal space groups, which are characterized by a threefold rotation as the highest symmetry point group element, a rhombohedral lattice R is possible as well as P. The internal symmetry is first defined by the point group formed from rotation axes and mirror planes; translational symmetry elements, such as screw axes and glide planes, complete the description. There are 230 possible space groups, defined mathematically by the transformation of general coordinates by the space group symmetry elements to generate all copies within the unit cell; the task is to determine atomic coordinates for the crystal at hand. Macromolecules are chiral, and only the L-amino acid enantiomers normally occur in proteins; coordinate inversion and mirror/glide planes are therefore not allowed, as these operations transform L-amino acids into D-amino acids. For this reason, only 65 space groups are possible for macromolecular crystals (Table 1). A full description of space groups and their symmetry elements can be found in the International Tables for Crystallography, Volume A.

Space Group Symmetry

Four symbols are used to describe space group symmetry: the Bravais lattice type P, C, I, R, or F, followed by three symbols that describe the symmetry elements along each crystal axis. For biological molecules these are rotation or screw axis symbols. For example, P12₁1 describes a primitive cell (P) with no symmetry element along the x and z axes and a twofold screw axis along the y axis, which comprises a 180° rotation around the y axis followed by a half translation along y. Such a symmetry transformation moves a coordinate (x, y, z) to the point (−x, 1/2 + y, −z) and defines two copies of the asymmetric unit in the unit cell. The consequence of this is that eqns [1] and [2] must be expanded to take these symmetry elements into account, and this has a profound effect on the observed symmetry of the diffraction pattern and on whether or not diffraction spots are allowed for any specific set of indices h.

Assigning the Space Group

The first step in assigning the space group is to determine the crystal system, which is done by finding the cell dimensions a, b, c, α, β, and γ that predict the diffraction pattern. This establishes whether any equalities exist, such as a = b ≠ c, α = β = γ = 90° (tetragonal). Data are measured in a trial space group, providing a set of indices with associated intensities. For a full assignment of the space group it is also necessary to determine the point group symmetry of the diffraction pattern and to check that the intensities of symmetry-related reflections agree; a rule of thumb is that for a correct assignment the agreement between intensities should be ∼10%. The Bravais lattice type and the presence of screw axes can be established through systematically absent reflections, which are missing because of translational symmetry elements. Space group assignment has been made easier by software tools (Pointless, Xtriage, and XPREP) that statistically analyze the presence of point group symmetry and systematic absences. However, for enantiomorphic space groups, such as those containing 4₁ and 4₃ screw axes, the systematic absences are ambiguous and are consistent with either assignment. Inspection of electron density maps calculated in the alternative space groups can then reveal the correct choice, by looking for expected chemical features such as α-helices of the correct hand and L-amino acids.
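The effect of a 2₁ screw axis on the diffraction pattern can be checked numerically: summing the structure factor over the two symmetry mates makes the axial 0k0 reflections vanish for odd k. The sketch below is my own illustration with made-up coordinates.

```python
import numpy as np

# Toy atoms (fractional coordinates) in space group P2_1 (unique axis b)
atoms = np.array([[0.11, 0.23, 0.35],
                  [0.42, 0.68, 0.77]])
f = np.array([7.0, 8.0])

def F_p21(hkl):
    """Structure factor including the symmetry mate (-x, y + 1/2, -z)."""
    h, k, l = (float(v) for v in hkl)
    total = 0j
    for (x, y, z), fj in zip(atoms, f):
        for xs, ys, zs in [(x, y, z), (-x, y + 0.5, -z)]:
            total += fj * np.exp(2j * np.pi * (h * xs + k * ys + l * zs))
    return total

for k in range(1, 5):
    print(f"|F(0,{k},0)| = {abs(F_p21((0, k, 0))):.3f}")   # ~0 for odd k (systematic absence)
```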
Sample Preparation and Crystallogenesis

The preparation of crystals requires highly purified protein samples, usually made by recombinant methods. Genomic sequencing projects provide a vast amount of DNA sequence information encoding protein sequences. The coding DNA can be extracted from genomic DNA and inserted into a plasmid, which is propagated in a bacterium such as E. coli, and protein expression is then chemically induced in the bacterium. Purification has been made easier by engineering terminal affinity tags, such as 6xHis, which bind preferentially to a nickel chromatography column. Growing crystals remains a challenge, especially for membrane proteins. The process is a matter of trial and error, but in essence concentrated protein (∼10 mg ml⁻¹) is mixed with a chemical precipitant and equilibrated until a supersaturated state is approached, at which point crystals may form. Factors such as pH, additive chemicals, and the type of precipitant give a vast search space to explore. The process has been made easier in recent years by the availability of commercial screens in 96-well format and by crystallization and visualization robots, which eliminate some of the human error and require less protein for the initial trials. Drop sizes as small as 100 nl can be achieved, allowing several 96-well trays to be prepared from as little as 100 μl of protein.

Data Collection

The elements of data collection are an X-ray source, a detector, a method for controlling the crystal sample, and software for processing the data. The technologies have changed greatly over the years, so that methods for collecting and processing data are now rapid and, in some cases, have been automated for structural genomics projects that require high-throughput handling of ever-increasing numbers of crystal samples.

X-ray Sources

The first X-ray sources were evacuated sealed tubes in which electrons emitted from a tungsten filament were accelerated through ∼40 kV and focused onto a metal target. The absorbing target material re-emits energy as characteristic X-ray emission lines on top of bremsstrahlung radiation. Although convenient, the power dissipation of these devices, and hence their X-ray output, is low. Modern X-ray generators use rotating anode technology to dissipate heat more efficiently and so achieve higher X-ray output. Improvements have also come from tighter electron focusing coupled with confocal optics that collect and focus the X-ray beam into an intense micro-focus beam. For even more intense beams with better optical properties, synchrotron storage rings are needed. These X-ray sources use charged particles, accelerated to relativistic speeds, as the source of spontaneous X-radiation. The increase in intensity arises from relativistic effects that cause the emitted radiation to be observed within a narrow cone; the opening half-angle of the cone is 1/γ, the inverse of the relativistic Lorentz factor γ. Achieving sub-milliradian opening angles therefore requires high-energy storage rings operating at several GeV. Typically, the synchrotron source energies used by crystallographers are in the range 2.0–8.0 GeV, with output peaking at wavelengths in the range 4.0–0.5 Å. Additional increases in intensity can be attained with insertion (magnet) devices placed in the straight sections of the storage ring. Multipole wigglers multiply the radiation output by the number of magnetic poles of the wiggler, whereas undulators exploit interference effects from smaller magnetic deviations, producing X-radiation over a narrow spectral range at interference harmonics. Both multipole wigglers and undulators are sufficiently broad in bandwidth for anomalous dispersion measurements (see MAD phasing); however, undulators have reduced emission angles and provide beam brilliances several orders of magnitude higher than a bending magnet. A multipole wiggler is advantageous if the entire spectrum from 0.5 to 2.0 Å is required, as for Laue data measurements in fast time-resolved diffraction experiments. At its simplest, the layout of a synchrotron beamline consists of optics to condition the beam, a motorized rotation axis to change the orientation of the sample, and a detector to record images (Figure 1).
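As a rough numerical check on the statement about milliradian opening angles (my own worked example, not from the article), the Lorentz factor and the 1/γ half-angle can be computed for a typical storage ring energy:

```python
# Opening half-angle of synchrotron radiation, 1/gamma, for a 3 GeV storage ring.
ELECTRON_REST_ENERGY_GEV = 0.000511   # m_e c^2

def opening_half_angle_mrad(ring_energy_gev):
    gamma = ring_energy_gev / ELECTRON_REST_ENERGY_GEV   # relativistic Lorentz factor
    return 1.0e3 / gamma                                  # radians -> milliradians

print(opening_half_angle_mrad(3.0))   # ~0.17 mrad, well below a milliradian
```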
Detectors

The usual method for collecting macromolecular single-crystal data is rotation camera geometry: the crystal rotation axis is at right angles to the X-ray beam, and data are collected on an area detector normal to the beam. Macromolecular diffraction patterns are dense (∼3000 spots per image), so area detectors are the most efficient way to collect the data. Electronic area detectors made from doped phosphor are available; for example, the MAR345 image plate has a long-lived luminescent phosphor with a pixel size of 100–150 μm, an image diameter of 345 mm, and an image scanning time of 80 s. The RAXIS IV image plate has a faster effective readout time because of a dual-plate system. Faster detectors, with readout times as low as 1 s, are available as tiled CCD detectors, in which a phosphor coating emits visible photons that are collected and transported through tapered fiber optics to a cooled CCD chip. Tiled CCD detectors are made in arrays as large as 4×4 (325 mm). Although CCD detectors are fast, they suffer from distortions in the tiled chips and the fiber optic geometry that must be corrected for. Newer detectors that are faster and have fewer distortion problems include the flatbed MarResearch detector, coated with an Se layer that converts X-ray photons directly into charge collected by a TFT pixel electrode, and the PILATUS pixel detector (424×435 mm, pixel size 139 μm), with a readout time of a few milliseconds. This readout speed permits a different data collection philosophy of continuous data collection and full 3D profile fitting of diffraction spots.

Cryo-Freezing

Crystals are mounted so that they can be held in the X-ray beam and rotated. The modern approach is to scoop the crystal up in a tiny loop of nylon or plastic attached to a solid rod, which is then flash-frozen in liquid nitrogen. During data collection, crystals are bathed in a continuous flow of nitrogen gas, normally at a temperature of 100 K. Cryo-freezing reduces radiation damage from the X-rays as well as reducing data noise due to atomic thermal motion. Untreated crystals often crack if flash-frozen; they are therefore generally presoaked in a cryoprotectant solution before freezing. Soaking crystals is also necessary for making heavy atom derivatives to solve the phase problem, or for inserting molecular substrates in the case of an enzyme; each of these may require a different cryoprotectant, which can be time consuming to find.

Data Processing Software

Diffraction data are collected as a large number of contiguous rotation images (ΔΦ = 0.1–2.0°). Data reduction consists of three main steps. The first (autoindexing) is to determine the Bravais lattice and unit cell constants from the recorded diffraction pattern, allowing Miller indices to be associated with each diffraction spot. In the second stage (integration), the intensities of the reflections are measured by predicting the locations of the reflections, determining an average spot profile, and scaling this to the observed spots. Finally, all measurements are scaled and merged (a minimal merging sketch is given after the list below), which will typically include corrections for:

• variation in the beam intensity/illuminated volume
• crystal decay
• absorption of the diffracted photons in the sample and air
• geometric factors such as the Lorentz and beam polarization factors
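The scaling and merging step reduces repeated and symmetry-related measurements of each unique reflection to a single average intensity, and the internal agreement is commonly reported as R_merge. The following sketch is an illustration under simplified assumptions: per-image scale factors are ignored and each observation is assumed to be already indexed in the asymmetric unit.

```python
import numpy as np
from collections import defaultdict

def merge(observations):
    """observations: list of (hkl, intensity) pairs, with hkl already mapped to the
    asymmetric unit. Returns merged intensities and R_merge = sum|I_i - <I>| / sum I_i."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    merged = {hkl: float(np.mean(i_list)) for hkl, i_list in groups.items()}
    num = sum(abs(i - merged[hkl]) for hkl, i_list in groups.items() for i in i_list)
    den = sum(i for i_list in groups.values() for i in i_list)
    return merged, num / den

obs = [((1, 2, 3), 100.0), ((1, 2, 3), 92.0), ((1, 2, 3), 108.0),
       ((0, 0, 4), 55.0), ((0, 0, 4), 61.0)]
merged, r_merge = merge(obs)
print(merged, f"R_merge = {r_merge:.3f}")
```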
There is a wide range of data reduction packages available to the crystallographer. The most common are HKL (Otwinowski & Minor), Mosflm (Leslie) and Scala (Evans), XDS/XSCALE (Kabsch), and d*TREK (Pflugrath). This article covers the use of Mosflm/Scala and XDS/XSCALE, which are well suited to wide and fine phi-sliced data, respectively, and are free to academic users.

Mosflm and Scala

The autoindexing and integration program Mosflm is best accessed through the graphical user interface iMosflm, which allows a sweep of diffraction images to be loaded and guides the user through the necessary steps (Figure 2). The first task, autoindexing, can be performed automatically using images selected by the GUI and will in most cases give an adequate result. The most likely solution, that is, the highest symmetry solution with a good match between observed and predicted reflections, is selected automatically. The cell constants and diffraction parameters should be refined before integration, after which the final processing may take place. Once the data are integrated, the rest of the data reduction may be performed through CCP4i: the 'scale and merge intensities' task allows data from multiple runs of Mosflm to be combined, which is necessary for multipass and multi-wavelength data sets.

XDS/XSCALE

Unlike Mosflm and Scala, XDS and XSCALE do not provide graphical user interfaces; they are instead run via plain text input files that describe the crystallographic problem and the experimental geometry, so more understanding of data processing is required to use them. The XDS program (http://www.mpimf-heidelberg.mpg.de/∼kabsch/xds/) may be considered a sequence of tools, with autoindexing performed by determining the detector corrections and background (XYCORR and INIT) and then finding and indexing the spots (COLSPOT and IDXREF). Unlike other data reduction packages, XDS does not recommend a best solution; instead it performs all processing in the lowest symmetry space group, P1. After autoindexing, the integration (DEFPIX and INTEGRATE) and postrefinement (CORRECT) are fairly straightforward. Because the processing is performed in P1, the correct point group and lattice constraints must be imposed at the final step (CORRECT). Once the data from all sweeps are integrated, they may be scaled and merged with XSCALE.

Phase Problem

A problem arises in reconstructing the electron density map of the molecule using eqn [1]: the absence of experimental phases. This is the so-called phase problem. Techniques exist that estimate phase distributions by indirect means, either by exploiting special relationships between structure amplitudes, by utilizing a known homologous structure, or by determining a heavy atom substructure as a source of starting phase information. These methods constitute the most important group of macromolecular structure determination procedures. The software for doing this, although similar to that used for other diffraction techniques, has many elements that are specific to macromolecular structure determination and differs in its theoretical approaches as well as in the instrumentation used to achieve the tasks.

The Patterson Methods

Most methods for estimating phases use a special kind of Fourier transform called the Patterson function P(q), which uses only the coefficients |F(h)|². Its virtue is that phases are not required and only the measured quantities I(h) are used. The function is an autocorrelation of the electron density distribution ρ: the vector space q = (u,v,w) is related to the real space coordinates r by the convolution of the electron density such that

[3]  P(\mathbf{q}) = \int_V \rho(\mathbf{r})\, \rho(\mathbf{q} + \mathbf{r})\, dV

[4]  P(\mathbf{q}) = \sum_{\mathbf{h}} |F(\mathbf{h})|^2\, e^{-2\pi i\, (\mathbf{h} \cdot \mathbf{q})}

A real space distribution of N atoms creates N(N − 1) non-origin peaks in the vector space q, with coordinates q = r₁ − r₂ and peak heights proportional to ρ(r₁)ρ(r₂). The vector q connects pairs of atoms, and for N atoms there are N² ways to do this. For a macromolecule containing several thousand atoms, the vector space map corresponds to several million peaks, most of which overlap and are therefore not interpretable. However, as a description of the macromolecule it is rich in information and can be used to establish the orientation and translation of a molecule in the unit cell. For substructures of relatively small numbers of heavy atoms, the coordinates can be obtained by deconvoluting the vectors into real space.
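A one-dimensional toy calculation (my own illustration) shows the key property of the Patterson function: built only from intensities, it peaks at the interatomic vectors. Here two "atoms" at fractional positions 0.10 and 0.35 give Patterson peaks at u = 0 and at the interatomic vector u = 0.25.

```python
import numpy as np

x = np.array([0.10, 0.35])           # toy 1D "atom" positions (fractional)
h = np.arange(-30, 31)               # 1D Miller indices

F = np.array([np.sum(np.exp(2j * np.pi * hi * x)) for hi in h])
I = np.abs(F) ** 2                   # the Patterson needs only intensities

for u in (0.00, 0.25, 0.10):         # origin, interatomic vector, arbitrary point
    P = np.real(np.sum(I * np.exp(-2j * np.pi * h * u)))
    print(f"P({u:.2f}) = {P:.1f}")   # large at u = 0 and u = 0.35 - 0.10 = 0.25
```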
MIR Phasing

The original method of experimental phasing, multiple isomorphous replacement (MIR), relies on substituting a few heavy metal atoms, such as Hg, Au, or Pt, into the crystal unit cell to make a heavy atom derivative of the macromolecular structure. Differences in structure factor amplitudes between the native (unmodified) and derivative data sets provide information on the heavy atom positions, which can be located using Patterson or direct methods (discussed later in the text). Once the positions have been determined, the amplitudes and phases corresponding to the heavy atom sites can be calculated and used, together with the observed amplitudes of the native structure (F_P) and the derivative (F_PH), to estimate phases for the entire structure. With only one derivative there are two solutions to the equations, resulting in a phase ambiguity. This ambiguity can be resolved by adding further derivative data sets, and it is common to need as many as four derivatives to achieve this. Another way to resolve the phase ambiguity is to include anomalous dispersion data for each derivative (MIRAS), reducing the total number of derivatives required; in favorable circumstances a single derivative with its associated anomalous differences suffices (SIRAS). Although the overall isomorphous difference can be large (∼30%), much of the difference often arises from other effects, such as distortions in the macromolecular structure and changes in the unit cell parameters. These sources of amplitude difference cannot be adequately modeled and result in large phase errors. If the differences in amplitude originate only from the heavy atoms, the crystals are said to be isomorphous and can produce good estimates of phase. A difficulty of the method is that trial-and-error testing is required to find compounds that bind tightly to the protein while keeping the crystals isomorphous. Experimental methods that rely entirely on anomalous dispersion effects are inherently isomorphous and, despite providing a weaker signal, are intrinsically more accurate; they have become very popular with the advent of synchrotron radiation sources and are discussed in the next section.
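The two-fold phase ambiguity from a single derivative follows from the vector relation F_PH = F_P + F_H: given the two measured amplitudes and a calculated heavy atom contribution, the cosine rule yields two possible protein phases. The sketch below is illustrative, and the input values are invented for the example.

```python
import numpy as np

def sir_phase_choices(F_p, F_ph, F_h_amp, F_h_phase):
    """Two possible protein phases from single isomorphous replacement (SIR).

    F_p, F_ph : measured amplitudes of native and derivative reflections
    F_h_amp, F_h_phase : calculated heavy atom amplitude and phase (radians)
    Uses |F_PH|^2 = |F_P|^2 + |F_H|^2 + 2|F_P||F_H| cos(phi_P - phi_H).
    """
    cos_term = (F_ph**2 - F_p**2 - F_h_amp**2) / (2.0 * F_p * F_h_amp)
    delta = np.arccos(np.clip(cos_term, -1.0, 1.0))
    return F_h_phase + delta, F_h_phase - delta   # the two-fold ambiguity

# Invented example values
phi1, phi2 = sir_phase_choices(F_p=100.0, F_ph=112.0, F_h_amp=20.0, F_h_phase=0.8)
print(np.degrees(phi1), np.degrees(phi2))
```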
MAD Phasing

Experimental phasing with multi-wavelength anomalous dispersion (MAD) makes use of both anomalous differences (differences between the Friedel pair of reflections, Δano = |F(h)| − |F(−h)|) and dispersive differences (differences between F values for a given reflection recorded at different wavelengths). These differences occur only for inelastic scattering close to an atomic absorption edge, and the anomalous dispersion effects can be modulated by varying the X-ray wavelength around the edge. At the X-ray wavelengths normally used in diffraction experiments, most 'light' atoms (H, C, N, O, and S) scatter elastically; for most 'heavy' atoms the differences can be significant and are often sufficient to solve the phase problem. Selenium is particularly useful, as proteins can be expressed in the presence of selenomethionine, in which the sulfur atom of methionine is replaced by selenium. The usual frequency of methionine is of the order of 1%, so a protein molecule typically contains 2–20 methionine residues, depending on its size. For MAD phasing it is essential to perform local scaling, to ensure accurate differences between data measured at different wavelengths and an unbiased estimate of the heavy atom contribution F_A(h). Direct and Patterson methods use these differences to determine the positions of the heavy atom sites, which can then be refined and used for phase calculation. The phase calculation is performed in conjunction with refinement of the heavy atom substructure parameters (atom positions, occupancies, and temperature factors) to give the most likely model and phase set. The phases may then be improved by density modification and extended to resolutions beyond the initial heavy atom phasing. A useful data collection strategy for optimizing the anomalous differences is to measure data in small wedges, interleaved with measurements taken at a 180° rotation offset, which reduces the effects of absorption and crystal decay; a similar approach of measuring small wedges at the different wavelengths optimizes the dispersive differences. These strategies are intended to minimize differences due to time-dependent effects such as beam intensity fluctuations and radiation damage. Although many packages are available to assist MAD phasing, of particular note are the SHELX tools (SHELXC/D/E), which give good results and are particularly quick. The PHENIX AutoSol wizard provides more automated tools, including local scaling, substructure determination, phasing, and phase improvement, and requires little information other than the experimental data and a brief description of the experiment. Finally, autoSHARP is the tool of choice in difficult circumstances, when data and model errors creep in and a statistically unbiased approach to phase refinement is required; as with the PHENIX wizard, all steps are covered, in this case by incorporating the SHELX tools. An example of successful MAD phasing using Se anomalous dispersion is shown in Figure 3.

SAD Phasing

Although the underlying principles of single-wavelength anomalous dispersion (SAD) phasing are similar to those of MAD, only one set of anomalous differences is available, resulting in a phase ambiguity: two phases are possible for each reflection because the sign of the phase is unknown. Because of this, phase improvement plays an essential role in breaking the ambiguity, and either a high solvent fraction or high-resolution data are necessary for successful phasing. There are, however, many benefits to SAD phasing. First, it may be possible to determine the structure from a single data set, thus avoiding scaling problems; second, data collection and data reduction are more straightforward. It is remarkable that, in favorable circumstances, a data set can be phased accurately from anomalous differences as small as 5% of the mean amplitude. A diagrammatic scheme of SAD phasing is shown in Figure 4. Of particular interest is the use of the naturally occurring sulfur atoms of cysteine and methionine residues for phasing. At typical synchrotron wavelengths (∼1 Å) the sulfur anomalous difference is very small, requiring exceptionally high-quality data; at longer wavelengths (∼2 Å) the signal becomes more appreciable, making S-SAD phasing a reality, but at the cost of increased absorption and air scatter. As with any SAD phasing problem, a high solvent fraction, high-resolution data, and NCS averaging over a number of molecules are essential.
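The quantities that drive MAD and SAD phasing can be tabulated directly from merged amplitudes. The sketch below is illustrative only; the amplitudes are invented and the array layout is an assumption, not a format used by any particular program.

```python
import numpy as np

def anomalous_difference(F_plus, F_minus):
    """Delta_ano = |F(h)| - |F(-h)| for each Friedel pair."""
    return np.asarray(F_plus) - np.asarray(F_minus)

def dispersive_difference(F_wavelength_1, F_wavelength_2):
    """Dispersive difference between mean amplitudes at two wavelengths."""
    return np.asarray(F_wavelength_1) - np.asarray(F_wavelength_2)

# Invented amplitudes for three reflections measured at one wavelength
F_plus  = np.array([101.3,  54.2, 220.7])
F_minus = np.array([ 99.1,  55.0, 218.9])
d_ano = anomalous_difference(F_plus, F_minus)
signal = np.mean(np.abs(d_ano)) / np.mean((F_plus + F_minus) / 2)
print("anomalous differences:", d_ano)
print(f"anomalous signal ~ {100 * signal:.1f}% of the mean amplitude")
```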
Molecular Replacement

The principle behind molecular replacement is to use a known structure, believed to be similar to the unknown macromolecule, to calculate an initial set of phases. To do this, one or more copies of the model must be correctly rotated and translated, based on a comparison of the calculated and observed structure factor amplitudes. It is important to note that accurately measured low-resolution data (<4 Å) are critical to the success of molecular replacement searches. The first step in the molecular replacement process is the choice of a model structure. In some circumstances this is trivial, for example in ligand-binding studies or when investigating point mutations; in most cases, however, it is necessary to search the Protein Data Bank (PDB) for a suitable 3D model on the basis of the target sequence. The chosen model may be further modified, for example by removal of nonconserved residues with the program 'chainsaw'. Although the search is strictly six-dimensional (three rotations and three translations for each molecule), it is usually divided into separate rotation and translation searches, leading to a massive reduction in the search space. The result of the search is a model that may either be used to calculate an initial set of phases, to be improved with density modification, or be taken as the starting point for refinement. Although a number of molecular replacement programs are available, Phaser uses the most sophisticated scoring method, combining a high degree of automation with maximum likelihood-based scoring of potential solutions, and is particularly well suited to multiprotein complexes. For straightforward cases, programs such as Molrep may give a perfectly good result much more quickly. The whole process can be performed automatically with MrBUMP, which incorporates a number of MR programs: it searches the PDB on the basis of several sequence searches, generates a list of possible search models, runs molecular replacement with each model, and scores the results by refinement of the solutions. In challenging cases, the brute-force nature of this search process can be beneficial.
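At its core, a molecular replacement search scores trial orientations and positions of the model by comparing calculated amplitudes with the observed ones. The toy sketch below uses a simple R factor for scoring and assumes a cubic cell so that rotations can be applied directly to fractional coordinates; real programs such as Phaser use maximum likelihood targets and far more efficient search strategies. All data and names here are illustrative.

```python
import numpy as np

def rotation_z(angle_deg):
    a = np.radians(angle_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def calc_amplitudes(hkls, coords, f):
    return np.abs([np.sum(f * np.exp(2j * np.pi * (coords @ h))) for h in hkls])

def score_placement(hkls, F_obs, model_coords, f, rot, trans):
    """R factor between observed amplitudes and those of the rotated/translated model."""
    placed = model_coords @ rot.T + trans
    F_calc = calc_amplitudes(hkls, placed, f)
    return np.sum(np.abs(F_obs - F_calc)) / np.sum(F_obs)

# Toy search: recover the rotation about z and the shift along x that explain F_obs
hkls = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [2, 1, 0]], dtype=float)
model = np.array([[0.10, 0.20, 0.00], [0.30, 0.05, 0.00]])
f = np.array([6.0, 8.0])
F_obs = calc_amplitudes(hkls, model @ rotation_z(30).T + [0.15, 0.0, 0.0], f)

best = min((score_placement(hkls, F_obs, model, f, rotation_z(ang), [tx, 0.0, 0.0]), ang, tx)
           for ang in range(0, 360, 10) for tx in np.arange(0.0, 1.0, 0.05))
print("best (R, angle, tx):", best)
```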
Direct Methods

The use of direct methods for ab initio determination of macromolecular structures has been limited to molecules with ∼2000 nonhydrogen atoms and data at high resolution (<1.2 Å). These methods have, however, been useful for augmenting other phasing methods, as explored in this section. The foundation of direct methods is linear combinations of phases called structure invariants, of which the most useful are the triplet invariants relating three amplitudes and their phases:

[5]  \Phi_{HK} = \varphi_{H} + \varphi_{K} + \varphi_{-H-K}

The triplet phase obeys the conditional probability distribution

[6]  P(\Phi_{HK}) = \left[ 2\pi I_0(A_{HK}) \right]^{-1} \exp(A_{HK} \cos \Phi_{HK})

[7]  A_{HK} = (2 / N^{1/2})\, |E_H E_K E_{H+K}|

where E is a normalized structure factor, H and K are vector indices that satisfy the three-reflection relationship, I₀ is the zeroth-order modified Bessel function, and N is the number of identical atoms in the structure. These estimates are reliable when the structure factor amplitudes are large. The tangent formula is used when a sufficiently large number of phase pairs are known:

[8]  \tan(\varphi_H) = \frac{-\sum_{K} |E_K E_{-H-K}| \sin(\varphi_K + \varphi_{-H-K})}{\sum_{K} |E_K E_{-H-K}| \cos(\varphi_K + \varphi_{-H-K})}

The method, as implemented in SHELXD, is of most use for determining the substructure of heavy atoms in MAD or MIR data, using normalized isomorphous or anomalous differences as the structure amplitudes. An extension of the substructure approach is implemented in ACORN, which locates the positions of fragments or domains by molecular replacement methods; these are then used to calculate a starting phase set that can be expanded and improved by direct methods techniques. The robustness of the method depends on a powerful combined figure-of-merit scoring system that discriminates between correct and incorrect phase sets.

Electron Density Improvement

Experimentally determined phases are nearly always improved by density modification methods, which use various assumptions about the electron density map to place limits on the starting phases. A map is modified and new phases are generated by inverse Fourier transformation (eqn [2]); the process is iterated until convergence is achieved. The method can also be used to estimate phases and amplitudes for unmeasured reflections and to extend phases to higher resolution where only amplitudes exist. There are a number of algorithms for improving electron density, and they can be used separately or in combination. The most common are solvent flattening, histogram matching, and molecular averaging, although the last requires more than one copy of the molecule in the asymmetric unit; less common is the use of direct methods phasing in the form of Sayre's equation. Macromolecular crystals typically contain solvent in the region of 30–70%, and solvent flattening exploits the observation that the electron density in these regions is flat owing to the high thermal motion and disorder of the solvent molecules: peaks in these regions are assumed to originate from noise and are eliminated by setting the solvent regions to a constant mean value. The distribution of electron density values is characteristic of the atomic distances within a macromolecule, and histogram matching can be understood as a quantitative description of a map that has the appearance of protein electron density; the electron density can therefore be modified to follow the expected distribution. In many cases macromolecules crystallize with several molecules in the asymmetric unit. This is called noncrystallographic symmetry (NCS) and obeys local symmetry in addition to the rules of crystal symmetry; an example is an oligomeric complex with its NCS axis in some arbitrary orientation. The procedure is to average the identical copies of electron density, thereby increasing the signal-to-noise ratio of the map. Sayre's equation is very powerful for phase refinement at very high resolution and less useful at medium resolution, but it can still be effective provided the shape function θ(h) is modified to model the overlap of atoms at nonatomic resolution:

[9]  F(\mathbf{H}) = \left[ \theta(\mathbf{H}) / V \right] \sum_{\mathbf{K}} F(\mathbf{K})\, F(\mathbf{H} - \mathbf{K})

The estimated phases can be used to augment experimental phases with σA weighting and combined with other density modification procedures.
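A single cycle of the simplest density modification, solvent flattening, can be sketched in a few lines: voxels flagged as solvent are reset to the mean solvent density, and the modified map is inverted to give updated amplitudes and phases. This is my own schematic illustration; FFT sign and scale conventions relative to eqns [1] and [2], and the construction of the solvent mask, are glossed over.

```python
import numpy as np

def solvent_flatten(rho, solvent_mask):
    """One cycle of solvent flattening on a gridded map.

    rho          : 3D numpy array of electron density
    solvent_mask : boolean array of the same shape, True for solvent voxels
    Returns the flattened map and the amplitudes/phases of its Fourier transform.
    """
    rho_new = rho.copy()
    rho_new[solvent_mask] = rho[solvent_mask].mean()   # flatten the solvent region
    F_new = np.fft.fftn(rho_new)                        # grid analogue of eqn [2]
    return rho_new, np.abs(F_new), np.angle(F_new)

# In an iterative scheme the new phases would be recombined with the observed
# amplitudes, a new map calculated, and the cycle repeated until convergence.
```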
Structure Refinement

Refinement of a macromolecular model relies on the agreement between the calculated and observed diffraction pattern amplitudes. The process is complicated: a model containing thousands of atoms is refined against a data set comprising on the order of 100 000 measurements, and refinement is intimately connected to model building, during which new features are identified and added between refinement steps. At lower resolution, when the number of refined parameters may outnumber the measurements, it is essential to use chemical restraints to augment the measurements; alternatively, chemical information can be used as a constraint to reduce the number of parameters. Only at atomic resolution (<1.2 Å) is it feasible to perform unrestrained refinement, although in practice the weighting between the contributions is modified in favor of the amplitudes. A more robust method of refinement, and the one now mostly used, is to maximize the probability that the model is correct (maximum likelihood), expressed as a sum of logarithms, rather than least-squares refinement. The simplest parameterization is to refine the coordinates and a single thermal parameter for each atom; if the resolution is sufficiently high, six anisotropic thermal parameters per atom can be refined. Macromolecular atomic motions can, however, be highly correlated, and an alternative approach is grouped anisotropic TLS refinement, in which a 20-term tensor models translational, rotational, and screw motions; this has proven useful for refining large domain motions. If the atomic coordinates are significantly away from the refinement minimum, convergence can be improved by simulated annealing, which increases the number of sampling points with molecular dynamics algorithms that perturb the structure. With sufficiently high-resolution data, model building can be performed automatically, coupled to refinement cycles, by programs such as ARP/wARP; if this is not possible, a model-building program such as O or Coot must be used for manual building, followed by further rounds of structure refinement. Ultimately, validation of the structure is required: the R-factor between observed and calculated structure factor amplitudes should reach an acceptable value (15–25%), while the free R-factor, calculated with a small proportion of reflections excluded from refinement, should continue to fall. At the same time, the geometry of the structure must fall within accepted bounds with respect to the Ramachandran torsion angles, bond distances, and bond angles. The most commonly used refinement programs are Refmac5, CNS, and BUSTER-TNT; all use maximum likelihood refinement methods.
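The validation criteria mentioned above are simple to state numerically. The sketch below (illustrative only; the array contents are invented) computes the conventional R-factor over the working set and the free R-factor over a held-out test set of reflections:

```python
import numpy as np

def r_factor(F_obs, F_calc):
    """R = sum | |F_obs| - |F_calc| | / sum |F_obs| over the selected reflections."""
    F_obs, F_calc = np.abs(F_obs), np.abs(F_calc)
    return np.sum(np.abs(F_obs - F_calc)) / np.sum(F_obs)

# Invented amplitudes; 'free_flags' marks the reflections kept out of refinement
F_obs  = np.array([120.0,  80.0, 45.0, 200.0, 60.0, 95.0])
F_calc = np.array([112.0,  85.0, 50.0, 190.0, 52.0, 101.0])
free_flags = np.array([False, False, True, False, False, True])

print("R_work =", r_factor(F_obs[~free_flags], F_calc[~free_flags]))
print("R_free =", r_factor(F_obs[free_flags],  F_calc[free_flags]))
```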
See also: Fibres and Films Studied Using X-Ray Diffraction; Small Molecule Crystallography; X-Ray Crystallography of Macromolecules, Theory and Methods.

Further Reading

Drenth J (1999) Principles of Protein X-ray Crystallography. Heidelberg: Springer-Verlag. ISBN 0-387-98587-5.

Giacovazzo C, Monaco HL, and Artioli G (2002) Fundamentals of Crystallography, 2nd edn. Oxford: Oxford University Press. ISBN 978-0-19-850958-5.

McCoy A and McDonald N (eds.) (2003) Special issue: Experimental phasing: proceedings of the CCP4 study weekend. Acta Crystallographica Section D: Biological Crystallography 59(11).

Murshudov G, von Delft F, and Ballard C (eds.) (2008) Special issue: Molecular replacement: proceedings of the CCP4 study weekend. Acta Crystallographica Section D: Biological Crystallography 64(1).

Read RJ (1999) Macromolecular Crystallography Course Presented in the 1999–2000 Academic Year to Staff and Students of CIMR. University of Cambridge and the MRC-LMB. http://www.structmed.cimr.cam.ac.uk/course.html (accessed June 2009).

Read RJ and Sussman JL (eds.) (2007) Evolving Methods for Macromolecular Crystallographers (Erice, Italy, 19–28 May 2005). NATO Science Series II: Mathematics, Physics and Chemistry, vol. 245. Amsterdam: Springer. ISBN 978-1-4020-6314-5.

Rossmann MG and Arnold E (eds.) (2006) International Tables for Crystallography, Volume F: Crystallography of Biological Macromolecules. Amsterdam: Springer.

Turkenburg JP and Brady L (eds.) (1999) Special issue: Data collection and processing: proceedings of the CCP4 study weekend. Acta Crystallographica Section D: Biological Crystallography 55(10).