First of all, apologies for not keeping everyone up to date recently. I've been cleaning up code and deciding what is in a good enough state to include in my final GSoC evaluation, so there wasn't much new of interest to report. That is all now done, and my GitHub repository contains only code that appears to be working well.

The main project goals, implementing fast methods to compute Hermite and Smith normal forms, have been completed. Below are a table and graph comparing the timings to those obtained with Sage (I spent about half a week being very sad about my work until I realised I needed to clear the cache to get the right timings for Sage!). The matrices used were random square matrices, with dimensions and entry bit counts as given in the table.

Flint timings (s):

dimension \ bits | 4 | 8 | 16 | 32 | 64 | 128
---|---|---|---|---|---|---
10 | 0.0001498 | 0.000181 | 0.0003962 | 0.0003608 | 0.0011182 | 0.0011266
20 | 0.0007708 | 0.0010738 | 0.0014056 | 0.0023238 | 0.0040728 | 0.009321
30 | 0.0021142 | 0.003127 | 0.0042294 | 0.007065 | 0.015792 | 0.0427272
40 | 0.0050182 | 0.0066494 | 0.0101572 | 0.0169392 | 0.038086 | 0.0985452
50 | 0.0105184 | 0.0138692 | 0.0183774 | 0.0319924 | 0.0812636 | 0.2245758
60 | 0.0181876 | 0.0243814 | 0.0342512 | 0.0625658 | 0.1578844 | 0.4360994
70 | 0.028565 | 0.0373348 | 0.0546572 | 0.1110402 | 0.290543 | 0.8147328
80 | 0.0417594 | 0.0595228 | 0.0882594 | 0.1842448 | 0.4881932 | 1.3889464
90 | 0.0694218 | 0.08668 | 0.1405782 | 0.2854802 | 0.7817248 | 2.2501918
100 | 0.0880376 | 0.1192832 | 0.2052142 | 0.4448414 | 1.245277 | 3.5487596

Flint to Sage timing ratios (< 1 means Flint is faster):

dimension \ bits | 4 | 8 | 16 | 32 | 64 | 128
---|---|---|---|---|---|---
10 | 0.6965 | 1.0258 | 1.9950 | 1.5602 | 3.8941 | 2.8422
20 | 0.7261 | 0.8606 | 0.9396 | 1.1124 | 1.0772 | 0.9704
30 | 0.6531 | 0.7794 | 0.8381 | 0.8015 | 0.7669 | 0.7449
40 | 0.6492 | 0.7048 | 0.7380 | 0.5891 | 0.4896 | 0.4245
50 | 0.6595 | 0.6637 | 0.5511 | 0.3997 | 0.3666 | 0.3543
60 | 0.6354 | 0.6325 | 0.4950 | 0.3586 | 0.3128 | 0.2958
70 | 0.6016 | 0.5616 | 0.3836 | 0.3126 | 0.2764 | 0.2768
80 | 0.1957 | 0.2762 | 0.3801 | 0.6697 | 1.2174 | 1.6715
90 | 0.2509 | 0.3145 | 0.4730 | 0.8187 | 1.5402 | 2.1104
100 | 0.2527 | 0.3257 | 0.5340 | 0.9906 | 1.8450 | 2.5461

For simplicity's sake I compared fmpz_mat_hnf_pernet_stein to Sage's hermite_form, so it is worth noting that Flint is faster still for small matrices when another method is used (the fmpz_mat_hnf function should choose one for you in practice). Although Flint does well for many matrices in this range, it gets worse as the dimension and bit size grow; indeed this trend continues, and my functions are far worse for very big matrices (Denis Kryzkov's benchmark here gives a good indication of the scale of the problem).

The run time of the asymptotically most efficient HNF method I have implemented (Pernet-Stein) depends heavily on the computation of nullspaces, so this is definitely an area that can be improved. Both approaches I have tried for speeding up nullspace computation (multimodular and p-adic lifting) ended up no better, asymptotically, than the pre-existing code based on the row echelon form. The remaining barrier seems to be modular rref, which I have been looking at improving over the past week. This is certainly possible and I plan to keep working on it: I have a complete but buggy implementation of a method described in Storjohann's thesis, and a working implementation of classical Gaussian elimination which is fast for small matrices at the moment. Finishing this will, I hope, bring the timings for Pernet-Stein down to something more like those in Sage. Sage uses IML for these computations, but even without using BLAS I think we could get a fair amount closer to IML's performance for large matrices.
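To give an idea of the baseline being improved upon, here is a minimal Python sketch of classical Gaussian elimination modulo a prime, the same idea as the small-matrix rref code mentioned above (purely illustrative; the real implementation is C inside Flint):

```python
def rref_mod_p(rows, p):
    """Reduced row echelon form over Z/pZ for prime p.
    Returns (rref matrix, list of pivot columns)."""
    A = [[x % p for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n):
        piv = next((i for i in range(r, m) if A[i][c]), None)
        if piv is None:
            continue  # no pivot in this column
        A[r], A[piv] = A[piv], A[r]
        inv = pow(A[r][c], -1, p)  # modular inverse (p is prime)
        A[r] = [(x * inv) % p for x in A[r]]
        for i in range(m):  # clear every other entry in the column
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [(x - f * y) % p for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return A, pivots
```

This does O(n^3) field operations; the faster methods (Storjohann's among them) improve on this by reducing the problem to matrix multiplication.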

As for Smith forms, the randomised algorithm I worked on remains problematic, so I have left it out for now and stuck with the two slightly older approaches: a general one due to Kannan and Bachem, and a modular one due to Iliopoulos for full rank square matrices. Below are some timings, again with comparisons to Sage. The class of matrices used is the same as above, which means Iliopoulos' algorithm was highly likely to be appropriate (though it may not have been the one chosen by fmpz_mat_snf). Sage uses Pari for this computation, but it was easier for me to adapt my existing timing code than to set up a direct comparison with Pari.

Flint timings (s):

dimension \ bits | 4 | 8 | 16 | 32 | 64 | 128
---|---|---|---|---|---|---
10 | 0.0001008 | 0.0001776 | 0.0002186 | 0.0005208 | 0.0016076 | 0.0042602
20 | 0.001181 | 0.0017224 | 0.0025602 | 0.0050306 | 0.0131318 | 0.038855
30 | 0.0039768 | 0.006887 | 0.013429 | 0.031941 | 0.0904368 | 0.2860226
40 | 0.0138918 | 0.0201976 | 0.0408486 | 0.1258706 | 0.386331 | 1.1849852
50 | 0.0312358 | 0.0544678 | 0.1120322 | 0.370241 | 1.155253 | 3.6941254
60 | 0.061067 | 0.116257 | 0.2779696 | 0.8377208 | 2.5126946 | 7.9345656

Flint to Sage timing ratios (< 1 means Flint is faster):

dimension \ bits | 2 | 4 | 8 | 16 | 32 | 64 | 128
---|---|---|---|---|---|---|---
10 | 0.01630 | 0.02870 | 0.04939 | 0.05714 | 0.10546 | 0.22552 | 0.36044
20 | 0.05063 | 0.06777 | 0.08559 | 0.08868 | 0.10665 | 0.14573 | 0.19519
30 | 0.08086 | 0.07276 | 0.09437 | 0.10418 | 0.13330 | 0.17716 | 0.23537
40 | 0.08905 | 0.10716 | 0.09976 | 0.10617 | 0.16009 | 0.21715 | 0.27530
50 | 0.10550 | 0.11037 | 0.11270 | 0.11762 | 0.18405 | 0.24098 | 0.30266
60 | 0.12292 | 0.10537 | 0.11135 | 0.12858 | 0.17996 | 0.23113 | 0.29524

(I will update this table on my blog when more timings finish; I wanted to get this post out and the remaining runs are taking a while.)

These timings look good for Flint, but I'm not yet completely sure what the large-scale behaviour is.

I still have a number of more experimental pieces of code around that I will continue working on getting into a more stable and usable state, along with some other little things that I never managed to get to during the official GSoC period but hope to at some point.

Finally I want to say a huge thanks to everyone who commented on what I’ve been doing, and especially to Fredrik for his excellent advice over the course of the project. All the comments were very much appreciated.

---

First I found some bugs in my Saunders-Wan Smith normal form code. Most of them turned out to be fixable, but one still eludes me. The bug is fairly rare (it occurs for roughly 1 in 10000 test cases) and is certainly related to the random nature of the algorithm, but my current thinking is that this behaviour should not be happening even when we get unlucky with the randomness. I've left the issue alone for now after spending a couple of days making no progress on it.

I then moved on to writing some general HNF/SNF methods that pick between the available implementations based on matrix size, norm and anything else I could work out as being relevant. While doing this I found that the Kannan-Bachem Hermite form method worked better than the modular method some of the time, so I decided to try to combine them into a modular method that takes minors into HNF rather than rows. I had some problems making this work correctly when the input was not of full rank, and all the fixes I tried ended up having some corner case that made them fail. It occurred to me, however, that when the modular method is used as part of the Pernet-Stein algorithm the matrix given is always square, so this was perhaps not a completely wasted effort.

I then moved back to working on Pernet-Stein; specifically, I added support for matrices without full row rank, and fully general matrices should be done very soon.

This week I'm going to do some profiling and comparisons between my implementations and existing ones, and try to work out why certain results are as they are and how the HNF/SNF methods I now have can be made faster in the future. It should be helpful to have a note of the barriers to faster HNF/SNF computation in Flint and how they could be overcome.

Of course I'll also be tidying up and documenting several things as I go along, to fill in any gaps in functionality I have left along the way.

For the next couple of days I'm going to work a little more on Smith forms, hopefully implementing Storjohann's banded method for finding Smith forms of upper triangular matrices; coupled with an HNF algorithm, this would give another (deterministic!) algorithm for the Smith form of a general matrix. I also need to clean up the Saunders-Wan code a bit, as there are still a number of inefficiencies. I have not included a valence method in this algorithm, as that would require implementing a few other things (such as minimal polynomials), but the option is certainly there and it would slot easily into the current code.

---

The algorithm is actually randomised of Monte Carlo type, meaning there is a controllable probability of error in the results. In the algorithm of Eberly, Giesbrecht and Villard (on which Saunders and Wan's algorithm is based), it becomes harder to be sure that the exponents of the primes occurring in the Smith form are correct as the primes get smaller. So the computation is split into two parts: a smooth part concerning only primes less than a given limit, and a rough part concerning all the others. Currently I have an Eberly-Giesbrecht-Villard based routine working and finding rough parts fairly well, so I am now working on a modular SNF algorithm to fill in the gap.
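The smooth/rough split of an integer itself is straightforward. As a hypothetical illustration in Python (not the code from my branch), splitting off the prime factors below a limit by trial division:

```python
def smooth_rough_split(n, bound):
    """Split n (> 0) as smooth * rough, where `smooth` carries exactly
    the prime factors of n below `bound` and `rough` all the others."""
    smooth, m, p = 1, n, 2
    while p < bound and p * p <= m:
        while m % p == 0:  # strip out every copy of the small prime p
            smooth *= p
            m //= p
        p += 1
    # if what is left is itself a prime below the bound, it is smooth too
    if 1 < m < bound:
        smooth *= m
        m = 1
    return smooth, m
```

For the Smith form one applies the same idea prime by prime: exponents of the large ("rough") primes come from the randomised routine, and the small ("smooth") primes are handled by the modular algorithm.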

Saunders and Wan don't stop there, however: they also incorporate ideas described by Dumas, Saunders and Villard which use a so-called valence based method, which I would also like to add at some point. It seems this requires efficient computation of the minimal polynomial, though, so I'm not sure how far I will get with it.

---

I was originally thinking of implementing Pauderis and Storjohann's algorithm if I had time this week, but I have spent much of the past week looking at efficient nullspace computation (and being plagued a little by unexpected power cuts!), so I haven't got much further than a skeleton implementation. Maybe I can flesh this out at some point, but for the next week I'm going to move on to algorithms for the Smith normal form, another important normal form for matrices over the integers, in which both column and row operations are allowed.

Hopefully, building on what I've done so far, I should be able to get some fast Smith normal form algorithms implemented shortly. Indeed, one simple approach is to repeatedly transpose the matrix and compute Hermite normal forms until the result is diagonal; this won't necessarily be the Smith form, but the Smith form can then be computed from it with relative ease. Something more sophisticated than this will undoubtedly be best, but many SNF algorithms involve computing an HNF at some point and working from there, so the current HNF algorithms provide a good basis for those computations.
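A Python sketch of that simple approach, assuming only a naive row-HNF routine (both functions below are illustrative toys, not the Flint implementations):

```python
from math import gcd

def hnf(rows):
    """Row Hermite normal form of an integer matrix (naive version,
    using repeated Euclidean reduction; fine for small examples)."""
    A = [list(r) for r in rows]
    m, n = len(A), len(A[0])
    r = 0
    for c in range(n):
        for i in range(r + 1, m):
            # Euclidean steps: zero A[i][c], leaving the gcd in A[r][c]
            while A[i][c]:
                if A[r][c] == 0 or abs(A[i][c]) < abs(A[r][c]):
                    A[r], A[i] = A[i], A[r]
                q = A[i][c] // A[r][c]
                A[i] = [x - q * y for x, y in zip(A[i], A[r])]
        if A[r][c] == 0:
            continue
        if A[r][c] < 0:
            A[r] = [-x for x in A[r]]
        for i in range(r):  # reduce the entries above the pivot
            q = A[i][c] // A[r][c]
            A[i] = [x - q * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == m:
            break
    return A

def snf_via_hnf(rows):
    """Smith normal form sketch: alternate HNF and transpose until the
    matrix is diagonal, then repair the divisibility condition."""
    A = [list(r) for r in rows]
    k = min(len(A), len(A[0]))
    def is_diagonal(M):
        return all(M[i][j] == 0
                   for i in range(len(M)) for j in range(len(M[0]))
                   if i != j)
    while not is_diagonal(A):
        A = hnf(A)
        A = [list(col) for col in zip(*A)]  # transpose
    for i in range(k):
        A[i][i] = abs(A[i][i])
    # make each diagonal entry divide the next: replace pairs (a, b)
    # by (gcd(a, b), lcm(a, b)), which preserves their product
    for i in range(k):
        for j in range(i + 1, k):
            a, b = A[i][i], A[j][j]
            g = gcd(a, b)
            A[i][i] = g
            A[j][j] = a * b // g if g else 0
    return A
```

The final gcd/lcm pass is the "relative ease" step: once the matrix is diagonal, elementary operations can always rearrange a pair of diagonal entries (a, b) into (gcd(a, b), lcm(a, b)).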

---

Flint uses the reduced row echelon form of a matrix to compute nullspaces (the nullspace of a matrix in this form can easily be read off, and the transformation does not alter it), so a natural place to improve nullspace computation is to improve the row echelon form computation. We can use a multimodular approach for this problem (described in Stein's book on computing with modular forms), and I've achieved some nice speed-ups with this method in the past couple of days; for example, the multimodular method is around 200x faster for 125×125 matrices with 32-bit entries. While this has made Hermite form computations a lot faster (nullspace computation is over 90% of the work for large matrices), I still want to see whether it can be improved further: after all, in this case we don't need the whole row echelon form, just a vector in the kernel and the knowledge that the nullspace is 1-dimensional. So I plan to work on this further in the next couple of days, and depending on how I feel about this approach I will either spend the rest of the week making improvements to Pernet-Stein or possibly work on implementing the algorithm of Pauderis and Storjohann.
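To illustrate how a nullspace is read off from the reduced row echelon form, here is a toy Python version using exact rational arithmetic (this just shows the idea; it is not Flint's code):

```python
from fractions import Fraction

def nullspace(rows):
    """Basis for the rational nullspace of an integer matrix,
    read off from its reduced row echelon form."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n):  # exact rref over the rationals
        piv = next((i for i in range(r, m) if A[i][c]), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        A[r] = [x / A[r][c] for x in A[r]]
        for i in range(m):
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [x - f * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    # one basis vector per free (non-pivot) column
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            v[p] = -A[row_idx][free]
        basis.append(v)
    return basis
```

The multimodular trick computes this same rref modulo many word-sized primes and reconstructs the rational answer, which avoids the coefficient growth that makes exact rational elimination slow.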

---

This is definitely the most complex algorithm I've worked on so far, but when completed it should really blow the others away for large enough random matrices.

---

Next week I'll begin working on the algorithm given by Pernet and Stein, which will take some time to do right, so I expect to be working on it in some way or another for the whole week (at least!). Initially I'll get a simplified version working (which will likely end up looking like the algorithm of Micciancio and Warinschi) and then start on the improvements suggested in Pernet and Stein's paper. Hopefully, when this is done, it will be extremely fast compared to the algorithms I have worked on so far; however, its run time will depend on what I can squeeze out of the modular approach I worked on a couple of weeks ago.

---

I've also implemented the algorithm described by Kannan and Bachem, which takes the principal minors to Hermite form one at a time, as opposed to the other methods I've implemented so far, which work column by column. This also provides a significant improvement over the classical method, and is indeed performing better than the modular approach in many cases.

For the next week I'll be trying to optimise the algorithms implemented so far as much as possible, especially the modular approach of Domich, Kannan and Trotter, as this plays a key role in the algorithm of Pernet and Stein. One thing I would like to try is combining the two approaches I worked on this week to make a modulo-determinant version of the Kannan-Bachem algorithm. There is also the possibility of factoring the determinant into relatively prime parts, working modulo each of them, and recombining at the end with the Chinese remainder theorem; of course this involves factoring the determinant, which could make the whole procedure less worthwhile. I also need to make the modifications necessary to ensure that all the algorithms work in the non-square case as efficiently as possible.

---

I finished off what I was working on at the time of the last update: a more efficient algorithm that still computes the Hermite normal form in a straightforward way, but takes less time by using the extended Euclidean algorithm to reduce rows rather than simply reducing one row repeatedly using another. The coefficients can still grow very fast in this algorithm, which leaves it highly non-optimal in practice, but small changes can be made to reduce the problem, such as changing the order in which rows are reduced (this is described by Bradley here).
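The extended Euclidean trick replaces a whole sequence of single-row reductions by one unimodular 2x2 transformation. A Python sketch (the function names are mine, purely illustrative):

```python
def extgcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g == x*a + y*b."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, x, y = extgcd(b, a % b)
    return (g, y, x - (a // b) * y)

def combine_rows(r1, r2, c):
    """Return new rows so that the first has gcd(r1[c], r2[c]) in
    column c and the second has 0 there.  The 2x2 transformation
    [[x, y], [-b//g, a//g]] has determinant (x*a + y*b)/g == 1,
    so it is unimodular and preserves the row lattice."""
    a, b = r1[c], r2[c]
    g, x, y = extgcd(a, b)
    if g == 0:  # both entries are already zero
        return list(r1), list(r2)
    new1 = [x * u + y * v for u, v in zip(r1, r2)]
    new2 = [(-b // g) * u + (a // g) * v for u, v in zip(r1, r2)]
    return new1, new2
```

One such combination does the work of an entire Euclidean chain of row subtractions, though as noted above the entries of the transformed rows can still grow quickly.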

One way to mitigate the coefficient explosion problem is to perform the computations modulo some number and then use the result to find the actual HNF. If we work modulo a multiple of the determinant of the HNF matrix we are trying to find (which we can in fact determine from the original matrix), the real HNF can be recovered fairly easily, and this is what I have been working on for the past couple of days. This method of finding the HNF is due to Domich, Kannan and Trotter, and the paper describing it can be found here. The function I have implemented seems to work fine on the test cases it is given, but I am getting some memory leaks that I'm having real trouble getting to the bottom of, which is my main problem currently.

Once this is done I'll start working on a multimodular algorithm, which will involve computing the HNF modulo several primes and combining the resulting matrices via the Chinese remainder theorem.
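The recombination step is just the Chinese remainder theorem applied entrywise. A scalar Python sketch (assuming pairwise coprime moduli, and glossing over the sign handling needed for actual matrix entries):

```python
def crt_pair(r1, m1, r2, m2):
    """Solve x = r1 (mod m1), x = r2 (mod m2) for coprime m1, m2."""
    inv = pow(m1, -1, m2)  # exists because gcd(m1, m2) == 1
    t = ((r2 - r1) * inv) % m2
    return (r1 + m1 * t) % (m1 * m2)

def crt(residues, moduli):
    """Fold crt_pair over lists of residues and pairwise coprime
    moduli, as one would to glue together per-prime HNF results."""
    r, m = residues[0], moduli[0]
    for r2, m2 in zip(residues[1:], moduli[1:]):
        r, m = crt_pair(r, m, r2, m2), m * m2
    return r
```

Once the product of the primes used exceeds a bound on the entries of the true HNF, the combined residues determine those entries exactly.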
