To avoid iterative calculation of the Henderson-Hasselbalch equation at different pH values, you can use
. This should speed up the algorithm roughly 100-fold.
Old, but still useful, information about the implementation can be found
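One standard way to avoid scanning the pH scale step by step is bisection on the net-charge curve, and the sketch below assumes that approach. The function names (`net_charge`, `isoelectric_point`) and the pKa values are placeholders chosen for illustration, not a recommended pKa scale. The net charge decreases monotonically with pH, so the sign of the charge at the midpoint tells which half of the interval contains the isoelectric point.

```python
# Illustrative pKa values only (hypothetical; substitute a published scale).
POSITIVE = {"K": 10.0, "R": 12.0, "H": 6.0, "Nterm": 9.0}
NEGATIVE = {"D": 4.0, "E": 4.4, "C": 8.5, "Y": 10.0, "Cterm": 2.4}


def net_charge(seq, ph, pos=POSITIVE, neg=NEGATIVE):
    """Net charge of a sequence at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups contribute once per chain.
    charge = 1.0 / (1.0 + 10 ** (ph - pos["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (neg["Cterm"] - ph))
    for aa in seq:
        if aa in pos:
            charge += 1.0 / (1.0 + 10 ** (ph - pos[aa]))
        elif aa in neg:
            charge -= 1.0 / (1.0 + 10 ** (neg[aa] - ph))
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=0.001):
    """Halve [lo, hi] until the zero-charge pH is bracketed within tol."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid  # zero or negative charge: pI lies at lower pH
    return (lo + hi) / 2.0


print(round(isoelectric_point("ACDEFGHIKLMNPQRSTVWY"), 2))
```

With a tolerance of 0.001 over the pH range 0 to 14, the loop needs 14 halvings, whereas a grid scan in steps of 0.01 over the same range would evaluate the charge 1,400 times.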
As mentioned in the "Theory" section, there are many pKa estimates based on different experiments. Alternatively, one can try to obtain pKa values computationally. Here I present an example of how to optimize pKa values in order to obtain more accurate isoelectric point predictions. This requires protein dataset(s) with experimentally determined isoelectric points. For proteins there are at least two such datasets: PIP-DB and SWISS-2DPAGE (for more details see "Datasets").
Brute force attack:
Checking all possible combinations is not tractable: even for 9 variables (the pKa values of the charged amino acids), a range of 3 pH units (±1.5 pH units around the average pKa of a given amino acid) sampled with 0.01 precision gives 1.9683 × 10^22 possibilities, far too many to compute.
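The count quoted above follows directly from the grid size: 300 candidate values per pKa, raised to the power of 9 variables. A short check, with variable names chosen here for illustration:

```python
n_variables = 9   # pKa values being optimized
ph_range = 3.0    # +/- 1.5 pH units around the average pKa
step = 0.01       # grid precision

values_per_variable = round(ph_range / step)          # 300 grid points per pKa
combinations = values_per_variable ** n_variables     # exact integer arithmetic

print(f"{combinations:.4e}")  # 1.9683e+22
```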
Basinhopping optimization using truncated Newton algorithm:
This produces suboptimal results in a far more reasonable time: fewer than a few dozen iterations are enough to optimize the pKa values with high precision.
In a nutshell, the basinhopping algorithm is an iterative search procedure, with each cycle composed of the following features:
As an initial seed, previously published pKa values were used. To limit the search space, the truncated Newton algorithm was used with bounds of 2 pH units for the pKa variables (e.g. if the starting point for the Cys pKa was 8.5, the solution was allowed in the interval [6.5, 10.5]).
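The bounded search can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used here: the cycle follows the usual description of basin hopping (random perturbation, local minimization, Metropolis accept or reject), a simple bounded coordinate search stands in for the truncated Newton minimizer, and `toy_loss` is a made-up rugged function standing in for the prediction error against experimental isoelectric points. Only the Cys seed of 8.5 comes from the text; the other seed values and all function names are hypothetical. In SciPy, the corresponding route would be `scipy.optimize.basinhopping` with `minimizer_kwargs={"method": "TNC", "bounds": ...}`.

```python
import math
import random


def clip(x, bounds):
    """Force every coordinate back into its [low, high] interval."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]


def local_minimize(f, x, bounds, step=0.5, tol=1e-4):
    """Bounded coordinate search (stand-in for truncated Newton)."""
    x = list(x)
    fx = f(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                cand = clip(cand, bounds)
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0  # refine the step once no move helps
    return x, fx


def basin_hopping(f, x0, bounds, n_iter=30, hop=0.5, temperature=1.0, seed=0):
    rng = random.Random(seed)
    x, fx = local_minimize(f, clip(x0, bounds), bounds)
    best_x, best_f = x, fx
    for _ in range(n_iter):
        # 1. random perturbation of the coordinates
        trial = clip([v + rng.uniform(-hop, hop) for v in x], bounds)
        # 2. local minimization from the perturbed point
        trial, f_trial = local_minimize(f, trial, bounds)
        # 3. accept or reject based on the minimized function value
        if f_trial < fx or rng.random() < math.exp(-(f_trial - fx) / temperature):
            x, fx = trial, f_trial
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Seed pKa values: Cys 8.5 is the example from the text; Asp and Lys are hypothetical.
seed_pkas = {"Cys": 8.5, "Asp": 3.5, "Lys": 10.5}
x0 = list(seed_pkas.values())
bounds = [(v - 2.0, v + 2.0) for v in x0]  # e.g. Cys: [6.5, 10.5]

# Hypothetical optimum, used only to build a rugged toy loss with local minima.
TARGET = [8.0, 3.9, 10.2]


def toy_loss(pkas):
    return sum((p - t) ** 2 + 0.5 * (1 - math.cos(6 * (p - t)))
               for p, t in zip(pkas, TARGET))


best_x, best_f = basin_hopping(toy_loss, x0, bounds)
print(dict(zip(seed_pkas, (round(v, 2) for v in best_x))), round(best_f, 4))
```

In a real run, `toy_loss` would be replaced by a function that predicts the isoelectric point of every protein in the dataset with the candidate pKa values and returns the deviation from the experimental values.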
For more details on how those algorithms work, go
here.