Structure and dynamics of the ‘protein folding code’ infe
Published: 2021-06-08
R. Wallace / BioSystems 103 (2011) 18–26
Fig. 4. From Li et al. (2010), Fig. S10. Schematic energy landscape for prion strains and substrains. The energy landscape diagram suggests that substrains are distinguishable collectives of prions that interconvert reproducibly and readily because they are separated by low activation energy barriers. The properties of a strain may vary depending on the environment in which it replicates, as the proportions of component substrains may change to favor the substrain replicating most rapidly, indicated by the … Comparison with Fig. 1, and the subsequent argument, suggests an underlying topological structure for a ‘prion reproduction code’.
is called PrPSc, a spectrum of β-sheet-rich conformers of the normal host protein PrPC, undergo Darwinian evolution in cell culture. In that work, prions show the evolutionary hallmarks: they are subject to mutation, as evidenced by heritable changes in their phenotypes, and to selective amplification, as shown by the emergence of distinct populations in different environments. Fig. 4, from Li et al. (2010), shows a prion energy landscape similar to that of Fig. 1. This suggests the possibility of characterizing the underlying topology of a ‘prion reproduction code’, in the sense of the sections above.
One might speculate that prions and prion diseases represent fossilized remains of Maury’s prebiotic amyloid world.
6. Topology and protein folding rate
Rate distortion arguments similar to those of Tlusty (2007a,b, 2010a), in conjunction with the topological approach, enable a direct analysis of protein folding rates, expanding on the treatment of Wallace (2010a).
Consider a generalized reaction pathway that takes an amino acid string $S_0$ at time 0 to a final folded conformation $S_f$ at time $t$ through a long series of distinct, sequential intermediate configurations $S_i$.
Let $N(n)$ be the number of possible paths having $n$ steps that lead from $S_0$ to $S_f$.
This is assumed to be a systematic process governed by a ‘grammar’ and ‘syntax’ driven by the underlying ‘protein folding funnel’, so that it is possible to divide all possible paths $x^n = \{S_0, S_1, \ldots, S_n\}$ into two sets: a small, high-probability subset that conforms to the demands of the folding funnel topology, and a much larger ‘nonsense’ subset having vanishingly small probability.
If $N(n)$ now counts only the high-probability paths of length $n$, then the ‘ergodic’ limit
$$H = \lim_{n \to \infty} \frac{\log N(n)}{n} \tag{3}$$
is assumed both to exist and to be independent of the path $x^n$, a restatement of the Shannon–McMillan Theorem (Khinchin, 1957).
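A minimal numerical illustration of equation (3), under assumptions that are not in the original: the sketch below counts the paths permitted by a small, invented transition ‘grammar’ (the `ALLOWED` table and both function names are hypothetical, not derived from any protein) and estimates $H$ by $\log N(n)/n$ in natural-log units. For simplicity it sums over all start and end states rather than fixing $S_0$ and $S_f$. For a finite, strongly connected grammar like this one, the limit equals the logarithm of the largest eigenvalue of the transition matrix, so the estimates should settle as $n$ grows.

```python
import math

# Hypothetical toy "folding grammar": states 0..3 and the transitions it allows.
# Any path using a transition not listed here belongs to the "nonsense" subset
# and is simply not counted.
ALLOWED = {
    0: [0, 1],
    1: [2],
    2: [0, 3],
    3: [1],
}

def count_paths(n):
    """Return N(n): the number of allowed n-step paths, summed over all start states."""
    # ways[s] = number of allowed paths of the current length that end in state s
    ways = {s: 1 for s in ALLOWED}
    for _ in range(n):
        nxt = {s: 0 for s in ALLOWED}
        for s, count in ways.items():
            for t in ALLOWED[s]:
                nxt[t] += count
        ways = nxt
    return sum(ways.values())

def entropy_estimate(n):
    """Finite-n approximation log N(n) / n to the limit H of equation (3), in nats."""
    # math.log accepts arbitrarily large Python ints, so N(n) never overflows here.
    return math.log(count_paths(n)) / n

for n in (10, 100, 1000):
    print(n, entropy_estimate(n))
```

Exact integer counting is used instead of a matrix library so the sketch stays dependency-free; for large grammars one would instead compute the leading eigenvalue directly.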
That is, the folding of a particular protein, from its amino acid string to its final form, is not a random event but represents a highly (evolutionarily) structured ‘statement’ by an information source having source uncertainty $H$. Details of this argument can be found in Wallace (2010a).
An equivalence class algebra can be constructed by choosing different origin and end points $S_0, S_f$ and defining two states as equivalent if a high-probability meaningful path with the same origin and end connects them. Disjoint partition by equivalence class, analogous to orbit equivalence classes for dynamical systems, defines the vertices of a network of developmental protein ‘languages’, a network of metanetworks. Each vertex then represents a different equivalence class of developmental information sources. This is an abstract set of metanetwork ‘languages’.
This structure generates a groupoid, in the sense of the Mathematical Appendix. States $a_j, a_k$ in a set $A$ are related by the groupoid morphism if and only if there exists a high-probability grammatical path connecting them to the same base and end points. Tuning across the various possible ways in which that can happen (the different developmental languages) parameterizes the set of equivalence relations and creates the (very large) groupoid.
There is an implicit hierarchy. First, there is structure within the system having the same base and end points. Second, there is a complicated groupoid defined by sets of dual information sources surrounding the variation of base and end points.
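The first level of this hierarchy, partitioning paths by their base and end points, can be made concrete with a toy example. The sketch below groups a handful of invented high-probability paths by their $(S_0, S_f)$ pair; each resulting class would correspond to one vertex of the network of ‘languages’. The state labels, `PATHS`, and `equivalence_classes` are hypothetical, and the sketch captures only the fixed-endpoint level, not the full groupoid.

```python
from collections import defaultdict

# Hypothetical high-probability paths; each tuple lists the states visited,
# from base point S_0 to end point S_f. Labels are invented for illustration.
PATHS = [
    ("U", "I1", "I2", "F"),
    ("U", "I3", "F"),
    ("U", "I1", "M"),
    ("U2", "I2", "F"),
    ("U2", "I4", "I2", "F"),
]

def equivalence_classes(paths):
    """Partition paths into disjoint classes keyed by their (origin, end) pair.

    Two paths are equivalent exactly when they share the same base point and
    the same end point; this relation is reflexive, symmetric, and transitive.
    """
    classes = defaultdict(list)
    for path in paths:
        classes[(path[0], path[-1])].append(path)
    return dict(classes)

for (origin, end), members in equivalence_classes(PATHS).items():
    print(f"{origin} -> {end}: {len(members)} path(s)")
```

Keying a dictionary on the endpoint pair gives the disjoint partition directly; varying that pair is what moves one up to the second, groupoid level of the hierarchy.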
Consider the simple case: the set of dual information sources associated with a fixed pair of beginning and end states.
High-probability meaningful paths from $S_0$ to $S_f$ are structured by the uncertainty of the associated dual information source, which, following standard arguments (e.g., Wallace, 2010a; Feynman, 2000), has a homological relation with free energy density.
Index the possible information sources connecting base and end points by some set $A = \bigcup \alpha$. According to the Rate Distortion Theorem, as discussed in Wallace (2010a), the minimum channel capacity needed to produce average distortion less than $D$ is the rate distortion function $R(D)$. We take the probability of an information source $H_\beta$, associated with a particular folding geometry, to be given by the standard expression
$$P[H_\beta] = \frac{\exp[-H_\beta / R]}{\sum_\alpha \exp[-H_\alpha / R]} \tag{4}$$
where the sum may be an abstract integral. Following Wallace (2010a,b), we have identified Tlusty’s rate distortion function as a kind of temperature equivalent affecting folding dynamics. In this information dynamics formulation, information source uncertainty is seen as simply another form of free energy, adopting the perspective of Feynman (2000).
It is necessary that the sum/integral always converge.
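Equation (4) has the form of a Boltzmann (softmax) distribution with $R$ in the role of temperature. The sketch below evaluates it for a finite, hypothetical list of source uncertainties, so the sum trivially converges. To give the ‘temperature’ a concrete value it borrows the standard textbook rate distortion function of a Bernoulli($p$) source under Hamming distortion, $R(D) = H_b(p) - H_b(D)$ for $0 \le D < \min(p, 1-p)$. That choice of source, the numbers, and all function names are assumptions for illustration, not the folding-specific $R(D)$ of Tlusty’s treatment. $H$ and $R$ must be in the same units (bits here).

```python
import math

def source_probabilities(H, R):
    """Evaluate equation (4) for a finite list H of source uncertainties H_alpha.

    P[H_beta] = exp(-H_beta / R) / sum_alpha exp(-H_alpha / R), with R > 0
    acting as a temperature. Because the list is finite, the sum converges.
    """
    if R <= 0:
        raise ValueError("R must be positive")
    h_min = min(H)
    # Shifting by the minimum keeps exp() from overflowing; the shift cancels
    # between numerator and denominator, so the probabilities are unchanged.
    weights = [math.exp(-(h - h_min) / R) for h in H]
    total = sum(weights)
    return [w / total for w in weights]

def binary_entropy(x):
    """Binary entropy H_b(x) in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binary_rate_distortion(p, D):
    """Textbook R(D) in bits for a Bernoulli(p) source under Hamming distortion.

    R(D) = H_b(p) - H_b(D) for 0 <= D < min(p, 1 - p), and 0 otherwise.
    Assumed stand-in only; not the folding-specific R(D).
    """
    if D < min(p, 1 - p):
        return binary_entropy(p) - binary_entropy(D)
    return 0.0

# Hypothetical source uncertainties (bits) for three dual information sources.
H = [1.0, 2.0, 3.0]
for D in (0.2, 0.05):
    R = binary_rate_distortion(0.3, D)
    probs = source_probabilities(H, R)
    print(f"D={D}: R(D)={R:.3f} bits, P={[round(q, 3) for q in probs]}")
```

Lowering the tolerated distortion $D$ raises $R(D)$, which flattens $P$ and shifts weight toward the richer, higher-$H$ sources, consistent with the view that richer sources demand more channel capacity.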
There is, then, structure within a (cross-sectional) connected component in the base configuration space, determined by $R$. Some dual information sources will be ‘richer’ or ‘smarter’ than others but, conversely, must use more of the available channel capacity for their completion.
This leads to a direct analysis of protein folding speeds, adapting the results of Wallace (2010a,b).
Dill et al. (2007) describe protein folding speeds as follows: