(Update: after some more work on the matrix kernels I managed to improve on what is given below. I don't think it needs another post, but see the long-running thread on the flint-devel mailing list if you are interested in the current performance of the code.)
First of all, apologies for not keeping everyone up to date recently. I've just been cleaning up code and deciding what is in a good enough state to include in my final evaluation for GSoC, so there wasn't much new to report. That is all done now, and my GitHub repository contains only code that appears to be working well.
The main project goals of implementing quick methods to compute Hermite and Smith normal forms have been completed. Below are tables comparing my timings to those obtained with Sage (I spent about half a week being very sad about my work until I realised I needed to clear the cache to get the right timings for Sage!). The matrices used were random square matrices, with dimensions and entry bit counts as given in the tables.
Flint timings (s):
| dim \ bits | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|
| 10 | 0.0001498 | 0.000181 | 0.0003962 | 0.0003608 | 0.0011182 | 0.0011266 |
| 20 | 0.0007708 | 0.0010738 | 0.0014056 | 0.0023238 | 0.0040728 | 0.009321 |
| 30 | 0.0021142 | 0.003127 | 0.0042294 | 0.007065 | 0.015792 | 0.0427272 |
| 40 | 0.0050182 | 0.0066494 | 0.0101572 | 0.0169392 | 0.038086 | 0.0985452 |
| 50 | 0.0105184 | 0.0138692 | 0.0183774 | 0.0319924 | 0.0812636 | 0.2245758 |
| 60 | 0.0181876 | 0.0243814 | 0.0342512 | 0.0625658 | 0.1578844 | 0.4360994 |
| 70 | 0.028565 | 0.0373348 | 0.0546572 | 0.1110402 | 0.290543 | 0.8147328 |
| 80 | 0.0417594 | 0.0595228 | 0.0882594 | 0.1842448 | 0.4881932 | 1.3889464 |
| 90 | 0.0694218 | 0.08668 | 0.1405782 | 0.2854802 | 0.7817248 | 2.2501918 |
| 100 | 0.0880376 | 0.1192832 | 0.2052142 | 0.4448414 | 1.245277 | 3.5487596 |
Flint to Sage timing ratios (< 1 is best for us):

| dim \ bits | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|
| 10 | 0.6965 | 1.0258 | 1.9950 | 1.5602 | 3.8941 | 2.8422 |
| 20 | 0.7261 | 0.8606 | 0.9396 | 1.1124 | 1.0772 | 0.9704 |
| 30 | 0.6531 | 0.7794 | 0.8381 | 0.8015 | 0.7669 | 0.7449 |
| 40 | 0.6492 | 0.7048 | 0.7380 | 0.5891 | 0.4896 | 0.4245 |
| 50 | 0.6595 | 0.6637 | 0.5511 | 0.3997 | 0.3666 | 0.3543 |
| 60 | 0.6354 | 0.6325 | 0.4950 | 0.3586 | 0.3128 | 0.2958 |
| 70 | 0.6016 | 0.5616 | 0.3836 | 0.3126 | 0.2764 | 0.2768 |
| 80 | 0.1957 | 0.2762 | 0.3801 | 0.6697 | 1.2174 | 1.6715 |
| 90 | 0.2509 | 0.3145 | 0.4730 | 0.8187 | 1.5402 | 2.1104 |
| 100 | 0.2527 | 0.3257 | 0.5340 | 0.9906 | 1.8450 | 2.5461 |
For simplicity's sake I simply compared fmpz_mat_hnf_pernet_stein to Sage's hermite_form, so it's worth noting that Flint is faster still for small matrices if another method is used (which the fmpz_mat_hnf function should choose for you in practice). We can see here that although Flint does well for many matrices in this range, it gets worse as the dimension and bit size grow; indeed this trend continues, and my functions are far worse for very big matrices (Denis Kryzkov's benchmark here gives a good indication of the scale of the problem). The run time of the asymptotically most efficient HNF method I have implemented (Pernet-Stein) depends heavily on the computation of nullspaces, so this is definitely an area that can be improved. Both approaches I've tried for speeding up nullspace computation (multimodular and p-adic lifting) haven't worked out any better (asymptotically) than the pre-existing code based on the row echelon form. The remaining barrier here seems to be modular rref, which I've looked at improving over the past week; this is certainly possible and I plan to carry on working on it (I have a complete but buggy implementation of a method described in Storjohann's thesis, and a working implementation of classical Gaussian elimination which is fast for small matrices at the moment). Finishing this will, I hope, bring the timings for Pernet-Stein down to something more like the ones in Sage. Sage uses IML for these computations, but even without using BLAS I think we could get a fair amount closer to IML's performance for large matrices.
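If you want to play with numbers like these yourself, here is a minimal sketch of a timing loop, assuming a FLINT build that has the fmpz_mat_hnf entry point; fmpz_mat_randtest stands in for the random matrix class described above, and a single clock() measurement is much cruder than the averaged timings in the tables:

```c
#include <stdio.h>
#include <time.h>
#include "flint/fmpz_mat.h"

int main(void)
{
    slong n = 100;           /* dimension */
    slong bits = 32;         /* entry size in bits */
    fmpz_mat_t A, H;
    flint_rand_t state;
    clock_t t0, t1;

    flint_randinit(state);
    fmpz_mat_init(A, n, n);
    fmpz_mat_init(H, n, n);
    fmpz_mat_randtest(A, state, bits);  /* random test matrix */

    t0 = clock();
    fmpz_mat_hnf(H, A);      /* chooses a suitable method internally */
    t1 = clock();

    printf("hnf: %.4fs\n", (double) (t1 - t0) / CLOCKS_PER_SEC);

    fmpz_mat_clear(A);
    fmpz_mat_clear(H);
    flint_randclear(state);
    return 0;
}
```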
As for Smith forms, the randomised algorithm I worked on remains problematic, so I have left it out for now and stuck with the two slightly older approaches: a general one due to Kannan and Bachem, and a modular one due to Iliopoulos for full rank square matrices. Below are some timings, again with comparisons to Sage. The class of matrices used is the same as above, which means it is highly likely that Iliopoulos' algorithm is appropriate (though it may not have been the one chosen by fmpz_mat_snf). Sage uses Pari for this computation, but it was easier for me to change my existing timing code than to set up a direct comparison with Pari.
Flint timings (s):
| dim \ bits | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|
| 10 | 0.0001008 | 0.0001776 | 0.0002186 | 0.0005208 | 0.0016076 | 0.0042602 |
| 20 | 0.001181 | 0.0017224 | 0.0025602 | 0.0050306 | 0.0131318 | 0.038855 |
| 30 | 0.0039768 | 0.006887 | 0.013429 | 0.031941 | 0.0904368 | 0.2860226 |
| 40 | 0.0138918 | 0.0201976 | 0.0408486 | 0.1258706 | 0.386331 | 1.1849852 |
| 50 | 0.0312358 | 0.0544678 | 0.1120322 | 0.370241 | 1.155253 | 3.6941254 |
| 60 | 0.061067 | 0.116257 | 0.2779696 | 0.8377208 | 2.5126946 | 7.9345656 |
Flint to Sage timing ratios (< 1 is best for us):
| dim \ bits | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|
| 10 | 0.01630 | 0.02870 | 0.04939 | 0.05714 | 0.10546 | 0.22552 | 0.36044 |
| 20 | 0.05063 | 0.06777 | 0.08559 | 0.08868 | 0.10665 | 0.14573 | 0.19519 |
| 30 | 0.08086 | 0.07276 | 0.09437 | 0.10418 | 0.13330 | 0.17716 | 0.23537 |
| 40 | 0.08905 | 0.10716 | 0.09976 | 0.10617 | 0.16009 | 0.21715 | 0.27530 |
| 50 | 0.10550 | 0.11037 | 0.11270 | 0.11762 | 0.18405 | 0.24098 | 0.30266 |
| 60 | 0.12292 | 0.10537 | 0.11135 | 0.12858 | 0.17996 | 0.23113 | 0.29524 |
(I will update this table on my blog when more timings finish; I wanted to get this post out and the remaining timings are taking a while.)
These timings look good for Flint, but I'm not completely sure yet what the large-scale behaviour is.
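For reference, calling the new Smith form code looks something like the sketch below, assuming a FLINT build with these functions; fmpz_mat_snf dispatches between the implementations, and fmpz_mat_snf_iliopoulos is called directly with |det A| as its modulus:

```c
#include "flint/fmpz_mat.h"

/* Sketch: compute the Smith form of a random square matrix, once via
   the general entry point and once via Iliopoulos' modular algorithm,
   using |det(A)| as the modulus. */
int main(void)
{
    slong n = 50;
    fmpz_mat_t A, S1, S2;
    fmpz_t det;
    flint_rand_t state;

    flint_randinit(state);
    fmpz_mat_init(A, n, n);
    fmpz_mat_init(S1, n, n);
    fmpz_mat_init(S2, n, n);
    fmpz_init(det);

    fmpz_mat_randtest(A, state, 32);
    fmpz_mat_det(det, A);
    fmpz_abs(det, det);

    fmpz_mat_snf(S1, A);            /* picks a method for you */
    if (!fmpz_is_zero(det))         /* Iliopoulos needs full rank */
    {
        fmpz_mat_snf_iliopoulos(S2, A, det);
        flint_printf("methods agree: %d\n", fmpz_mat_equal(S1, S2));
    }

    fmpz_clear(det);
    fmpz_mat_clear(A);
    fmpz_mat_clear(S1);
    fmpz_mat_clear(S2);
    flint_randclear(state);
    return 0;
}
```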
I still have a number of more experimental bits of code around that I will continue working on getting into a more stable and usable state, along with some other little things that I never managed to get to during the official GSoC period but hope to get around to at some point.
Finally I want to say a huge thanks to everyone who commented on what I’ve been doing, and especially to Fredrik for his excellent advice over the course of the project. All the comments were very much appreciated.
This week I’ve worked on a variety of different things, unfortunately getting hung up on most of them!
First I found some bugs in my Saunders-Wan Smith normal form code; most of them ended up being fixable, but one still eludes me. The bug is fairly rare (it occurs for roughly 1 in 10000 test cases) and is certainly related to the random nature of the algorithm, but my current thinking is that this behaviour should not happen even when we get unlucky with the randomness. I've left the issue alone for now after spending a couple of days making no progress on it.
I then moved on to writing some general HNF/SNF methods that pick between the available implementations based on matrix size, norm and anything else I could work out as being relevant (a sketch of the idea is below). While doing this I found that the Kannan-Bachem Hermite form method worked better than the modular method some of the time, so I decided to try combining them into a modular method that takes minors into HNF rather than rows. I had some problems making this work correctly when the input was not of full rank, and all the fixes I tried ended up having some corner case that made them fail. It has just occurred to me, however, that when the modular method is used as part of the Pernet-Stein algorithm the matrix given is always square, so this was perhaps not a completely wasted effort.
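A simplified illustration of the kind of dispatch logic I mean; the real cutoffs were determined by profiling and are nothing like these round numbers, and hnf_select is a hypothetical wrapper rather than a FLINT function:

```c
#include "flint/fmpz_mat.h"

/* Hypothetical method selection by dimension and entry size --
   the actual cutoffs in fmpz_mat_hnf were tuned empirically. */
void hnf_select(fmpz_mat_t H, const fmpz_mat_t A)
{
    slong n = fmpz_mat_nrows(A);
    slong bits = FLINT_ABS(fmpz_mat_max_bits(A));

    if (n < 25 && bits < 32)
    {
        fmpz_mat_hnf_classical(H, A);  /* tiny inputs: overheads dominate */
    }
    else if (n < 75)
    {
        fmpz_mat_hnf_minors(H, A);     /* Kannan-Bachem, minor by minor */
    }
    else
    {
        /* large inputs: an asymptotically faster method (in my code the
           Pernet-Stein algorithm) is worthwhile; defer to the dispatcher */
        fmpz_mat_hnf(H, A);
    }
}
```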
I then moved back to working on Pernet-Stein; specifically, I added support for matrices without full row rank, and fully general matrices should be handled very soon.
This week I'm going to be doing some profiling and comparisons between my implementations and existing ones, trying to work out why certain results are the way they are and how the HNF/SNF methods I now have could be made faster in the future. It should be helpful to have a note of the barriers to faster HNF/SNF computation in Flint and how they could be overcome.
Of course I’ll also be tidying up and documenting several bits as I go along to fill in any gaps in functionality I have left along the way.
This week I worked more on the Smith normal form algorithm I talked about last week. I implemented Iliopoulos' algorithm for computing the Smith form modulo an arbitrary integer; this procedure is used in a couple of places in Saunders and Wan's "engineered" algorithm. Firstly, a prime power modulus is used to find the Smith normal form locally for small primes (i.e. those less than 100); the modular approach is also used for the rough part (concerning all the other primes) when the largest non-zero invariant factor is small compared to the size of the matrix.

This algorithm is now working correctly, though the question of how to test it properly, given its Monte Carlo nature, is one that will require some thought. So far, whenever I have encountered a matrix for which the output of Saunders and Wan doesn't match the deterministic algorithms' outputs, it has turned out to be a code bug rather than a side effect of the algorithm's randomness. I suppose allowing a reasonable number of failures beyond the expected number in the test code would be one approach, but of course there will still be occasions when the number of failures exceeds this, and allowing a large number of extra failures could let real bugs creep in.
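To make the testing idea concrete, here is a sketch of the kind of harness I have in mind; snf_saunders_wan stands in for my experimental routine and is not a FLINT function, and the tolerated failure count of 3 is an arbitrary placeholder rather than a computed bound:

```c
#include "flint/fmpz_mat.h"

/* prototype for my experimental randomised routine (not in FLINT) */
void snf_saunders_wan(fmpz_mat_t S, const fmpz_mat_t A, flint_rand_t state);

/* Run the Monte Carlo SNF many times and tolerate a small number of
   mismatches against a deterministic algorithm. */
int test_snf_monte_carlo(flint_rand_t state)
{
    slong trials = 10000, failures = 0, i;
    fmpz_mat_t A, S_det, S_mc;

    fmpz_mat_init(A, 20, 20);
    fmpz_mat_init(S_det, 20, 20);
    fmpz_mat_init(S_mc, 20, 20);

    for (i = 0; i < trials; i++)
    {
        fmpz_mat_randtest(A, state, 16);
        fmpz_mat_snf_kannan_bachem(S_det, A);  /* deterministic reference */
        snf_saunders_wan(S_mc, A, state);      /* randomised, may err */
        if (!fmpz_mat_equal(S_det, S_mc))
            failures++;
    }

    fmpz_mat_clear(A);
    fmpz_mat_clear(S_det);
    fmpz_mat_clear(S_mc);
    return failures <= 3;  /* fail only on an excess of mismatches */
}
```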
For the next couple of days I'm going to work a little more on Smith forms, hopefully implementing Storjohann's banded method for finding Smith forms of upper triangular matrices; this could be coupled with an HNF algorithm to give another (deterministic!) algorithm for the Smith form of a general matrix. I also need to clean up the Saunders and Wan code a bit, as there are still a number of inefficiencies. I have not included a valence method in this algorithm, as it would require implementing a few other things (such as minimal polynomials), but the option is certainly there and it would slot easily into the current code.
So this week I've been working on implementing methods to find the Smith normal form of an integer matrix, rather than the Hermite normal form which I've worked on up until now. I'm mostly working off the excellent paper of Saunders and Wan, which describes an "engineered" algorithm to compute the Smith form that is highly practical. By engineered they mean that the algorithm switches between different methods based on practical observations about which is likely to be most efficient for a given type of matrix, while being structured in such a way that the asymptotic complexity remains as small as possible.
The algorithm is randomised, of Monte Carlo type, meaning there is a controllable possibility of error in the results. In the algorithm of Eberly, Giesbrecht and Villard (on which Saunders and Wan's algorithm is based) it becomes harder to be sure that the exponents of primes occurring in the Smith form are correct as the primes get smaller. So the computation is split into two parts: a smooth part concerning only primes less than a given limit, and a rough part concerning all the others. Currently I have an Eberly-Giesbrecht-Villard based routine working and finding rough parts fairly well, and so I am now working on a modular SNF algorithm to fill in the gap.
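To make the smooth/rough split concrete, here is a sketch of peeling the smooth part off a nonzero invariant factor using fmpz_remove; the prime bound 100 follows the description above, and everything left over is the rough part:

```c
#include "flint/fmpz.h"
#include "flint/ulong_extras.h"

/* Split a nonzero s as s = smooth * rough, where smooth collects all
   prime factors below 100 and rough is coprime to every such prime. */
void smooth_rough_split(fmpz_t smooth, fmpz_t rough, const fmpz_t s)
{
    ulong p;
    slong e;
    fmpz_t f, q;

    fmpz_init(f);
    fmpz_init(q);
    fmpz_one(smooth);
    fmpz_set(rough, s);

    for (p = 2; p < 100; p = n_nextprime(p, 0))
    {
        fmpz_set_ui(f, p);
        e = fmpz_remove(q, rough, f);   /* strip every factor of p */
        fmpz_set(rough, q);
        if (e > 0)
        {
            fmpz_pow_ui(f, f, e);       /* p^e joins the smooth part */
            fmpz_mul(smooth, smooth, f);
        }
    }

    fmpz_clear(f);
    fmpz_clear(q);
}
```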
Saunders and Wan don't stop there, however: they also incorporate ideas described by Dumas, Saunders and Villard which use a so-called valence-based method, which I would also like to add at some point. It seems this requires efficient computation of the minimal polynomial, however, so I'm not sure how far I will get with it.
This week I've been working on improving the computation of the nullspace of an integer matrix by implementing a p-adic lifting based method. The algorithm is described in Stein's book on computing with modular forms and is closely related to Dixon's p-adic method for linear system solving. This is pretty much finished now (modulo some weird memory leaks and determining the best values for parameters such as the p-adic precision and the range of primes used) and does appear to be asymptotically faster than the multimodular algorithm I worked on before, though experiments suggest that the matrix currently needs to be fairly large before the p-adic algorithm wins.
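In outline, the p-adic lifting looks like the sketch below: invert A modulo a prime once, then lift a solution of Ax = b digit by digit. A rational reconstruction step (not shown) then recovers the solution over Q, and nullspace vectors are obtained by solving against suitable right-hand sides. This is only a sketch of the idea, with error handling and the choice of p and k omitted:

```c
#include "flint/fmpz_mat.h"
#include "flint/nmod_mat.h"

/* Dixon lifting for A x = b with A an n x n integer matrix invertible
   mod the prime p (real code must check this).  On return x satisfies
   A x = b (mod p^k). */
void dixon_solve_mod_pk(fmpz_mat_t x, const fmpz_mat_t A,
                        const fmpz_mat_t b, ulong p, slong k)
{
    slong i, n = fmpz_mat_nrows(A);
    nmod_mat_t Amod, Ainv, rmod, xmod;
    fmpz_mat_t r, xi, t;
    fmpz_t pk;

    nmod_mat_init(Amod, n, n, p);
    nmod_mat_init(Ainv, n, n, p);
    nmod_mat_init(rmod, n, 1, p);
    nmod_mat_init(xmod, n, 1, p);
    fmpz_mat_init(r, n, 1);
    fmpz_mat_init(xi, n, 1);
    fmpz_mat_init(t, n, 1);
    fmpz_init_set_ui(pk, 1);

    fmpz_mat_get_nmod_mat(Amod, A);
    nmod_mat_inv(Ainv, Amod);        /* A^{-1} mod p, computed only once */
    fmpz_mat_set(r, b);
    fmpz_mat_zero(x);

    for (i = 0; i < k; i++)
    {
        fmpz_mat_get_nmod_mat(rmod, r);
        nmod_mat_mul(xmod, Ainv, rmod);           /* next p-adic digit */
        fmpz_mat_set_nmod_mat(xi, xmod);          /* lift it to Z */
        fmpz_mat_scalar_addmul_fmpz(x, xi, pk);   /* x += x_i * p^i */
        fmpz_mat_mul(t, A, xi);
        fmpz_mat_sub(r, r, t);
        fmpz_mat_scalar_divexact_ui(r, r, p);     /* r = (r - A x_i)/p */
        fmpz_mul_ui(pk, pk, p);
    }

    nmod_mat_clear(Amod); nmod_mat_clear(Ainv);
    nmod_mat_clear(rmod); nmod_mat_clear(xmod);
    fmpz_mat_clear(r); fmpz_mat_clear(xi); fmpz_mat_clear(t);
    fmpz_clear(pk);
}
```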
I was originally thinking of implementing Pauderis and Storjohann's algorithm if I had time this week, but I spent much of the past week looking at efficient nullspace computation (and being plagued a little by unexpected power cuts!) so haven't got much further than a skeleton implementation. Maybe I can flesh this out at some point, but for the next week I'm going to move on to algorithms for the Smith normal form, another important normal form for matrices over the integers, in which both column and row operations are allowed.
Hopefully, building on what I've done so far, I should be able to get some fast Smith normal form algorithms implemented shortly. Indeed, one simple approach is to repeatedly transpose and compute Hermite normal forms until the resulting matrix is diagonal; this won't necessarily be the Smith form, but the Smith form can then be computed from it with relative ease. Something more sophisticated than this will undoubtedly be best, but many SNF algorithms involve computing the HNF at some point and working from there, so the current HNF algorithms provide a good basis for those computations.
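A sketch of that simple approach for a square matrix, alternating HNF and transpose and then fixing up the diagonal; fmpz_mat_snf_diagonal here names the diagonal-to-Smith step (the divisibility fix-up could equally be written out with pairwise gcd/lcm operations):

```c
#include "flint/fmpz_mat.h"

/* helper: is the matrix diagonal? */
static int mat_is_diagonal(const fmpz_mat_t A)
{
    slong i, j;
    for (i = 0; i < fmpz_mat_nrows(A); i++)
        for (j = 0; j < fmpz_mat_ncols(A); j++)
            if (i != j && !fmpz_is_zero(fmpz_mat_entry(A, i, j)))
                return 0;
    return 1;
}

/* SNF of a square A by alternating HNF and transpose until diagonal,
   then enforcing the divisibility chain d_1 | d_2 | ...
   Simple, but not the fastest approach. */
void snf_by_hnf(fmpz_mat_t S, const fmpz_mat_t A)
{
    fmpz_mat_t T, H;
    fmpz_mat_init_set(T, A);
    fmpz_mat_init(H, fmpz_mat_nrows(A), fmpz_mat_ncols(A));

    while (!mat_is_diagonal(T))
    {
        fmpz_mat_hnf(H, T);          /* row HNF */
        fmpz_mat_transpose(T, H);    /* swap roles of rows and columns */
    }
    fmpz_mat_snf_diagonal(S, T);     /* diagonal-to-Smith fix-up */

    fmpz_mat_clear(T);
    fmpz_mat_clear(H);
}
```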
This week I've been working on improving my implementation of the algorithm of Pernet and Stein as much as possible. After introducing as many of the ideas given in the original paper as I could, I found the main bottleneck to be the computation of the nullspace of a certain submatrix of the given matrix; this is needed in order to efficiently solve a linear system whose final row is (likely) nasty. If we know a basis for the nullspace of the first n-1 rows of the system, we can replace the final row with a random nice row (one having small entries) and then recover a solution of the original system by adding on a suitable multiple of the nullspace basis vector (the nullspace should be 1-dimensional for random input).
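To spell the trick out (notation mine, not the paper's): let $a_n$ be the nasty final row, $b_n$ the final entry of the right hand side, $y$ a solution of the system with the final row replaced by a random small row, and $v$ a vector spanning the nullspace of the first $n-1$ rows. We look for a solution of the form $x = y + tv$: the first $n-1$ equations hold automatically, since $y$ satisfies them and $v$ lies in their nullspace, and the final equation forces

$$a_n \cdot (y + tv) = b_n \quad\Longrightarrow\quad t = \frac{b_n - a_n \cdot y}{a_n \cdot v},$$

which is well defined exactly when $a_n \cdot v \neq 0$, i.e. when the full matrix is nonsingular.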
Flint uses the reduced row echelon form of a matrix to compute nullspaces (the nullspace of a matrix in this form can be read off easily, and the transformation does not alter it), so a natural place to improve nullspace computation is the row echelon form computation itself. We can use a multimodular approach for this problem (described in Stein's book on computing with modular forms), and I've achieved some nice speed-ups with this method in the past couple of days; for example, the multimodular method is around 200x faster for 125×125 matrices with 32-bit entries. While this has made Hermite form computations a lot faster (nullspace computation is over 90% of the work for large matrices), I still want to see whether it can be improved further: after all, in this case we don't need the whole row echelon form, just a vector in the kernel and the knowledge that the nullspace is 1-dimensional. So I plan to work on this further in the next couple of days; depending on how I feel about this approach I will either spend the rest of this week making improvements to Pernet-Stein or possibly work on implementing the algorithm of Pauderis and Storjohann.
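A sketch of the multimodular idea, under the simplifying assumption that every prime is "good" (same rank and pivot columns as over Q): compute the rref modulo several word-sized primes, CRT-combine the images, and rationally reconstruct each entry. Real code must detect and discard bad primes, check the return value of fmpq_reconstruct_fmpz, and verify the final result:

```c
#include "flint/fmpz_mat.h"
#include "flint/fmpq_mat.h"
#include "flint/nmod_mat.h"
#include "flint/ulong_extras.h"

/* Multimodular rref over Q.  R must be initialised to the same shape
   as A by the caller. */
void rref_multimodular(fmpq_mat_t R, const fmpz_mat_t A, slong nprimes)
{
    slong i, j, k, m = fmpz_mat_nrows(A), n = fmpz_mat_ncols(A);
    ulong p = UWORD(1) << 40;   /* primes are drawn above this bound */
    fmpz_mat_t C;               /* CRT images of the rref entries */
    fmpz_t M;                   /* product of the primes used so far */
    nmod_mat_t Ap;

    fmpz_mat_init(C, m, n);     /* starts out zero */
    fmpz_init_set_ui(M, 1);

    for (k = 0; k < nprimes; k++)
    {
        p = n_nextprime(p, 0);
        nmod_mat_init(Ap, m, n, p);
        fmpz_mat_get_nmod_mat(Ap, A);
        nmod_mat_rref(Ap);      /* rref mod p, in place */

        for (i = 0; i < m; i++)
            for (j = 0; j < n; j++)
                fmpz_CRT_ui(fmpz_mat_entry(C, i, j),
                            fmpz_mat_entry(C, i, j), M,
                            nmod_mat_entry(Ap, i, j), p, 0);

        fmpz_mul_ui(M, M, p);
        nmod_mat_clear(Ap);
    }

    /* lift each residue to a fraction with small numerator/denominator */
    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++)
            fmpq_reconstruct_fmpz(fmpq_mat_entry(R, i, j),
                                  fmpz_mat_entry(C, i, j), M);

    fmpz_mat_clear(C);
    fmpz_clear(M);
}
```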
This week I've been working on implementing Pernet and Stein's algorithm to compute the HNF. The basic idea of this method (which is in turn due to Micciancio and Warinschi) is to use the given matrix to construct an auxiliary matrix with small determinant (so that the HNF of the new matrix can be computed without coefficient explosion) from which the original HNF can be reconstructed. The algorithm splits into a few separate stages: the computation of two determinants, the modular HNF of the smaller matrix, the addition of a column to the HNF, and then the addition of two rows. Each of these steps can be optimised quite a bit using some clever tricks, which I'm currently working on. The adding of rows and columns to the HNF now seems to work well, and exploiting the similarities of the two submatrices whose determinants are needed, in order to compute both determinants faster, should be done tomorrow. At that point it will be interesting to take stock and see how the Pernet-Stein approach compares to the previous methods I've worked on. I'll likely spend a couple more days after that improving it further.
This is definitely the most complex algorithm I’ve worked on so far, but when completed it should really blow the others away for large enough random matrices.
Week 5 of my GSoC project is now over. This week I've been trying some variations of the older approaches to computing the Hermite normal form that I implemented in the first few weeks; I've had some success extending the Kannan-Bachem approach to non-square matrices and implementing other practical speed-ups. However, my main project for this week, a variant of the modular approach for use when factors of the determinant are known, hasn't worked out so well yet. I've been really struggling to make it work as intended and have decided to set it aside for the moment and come back to it later, as it is not vital to the rest of the project. The main use case would be a determinant that is larger than word sized but for which a factorisation into relatively prime factors, each fitting in a word, is known; here there would probably be an advantage to working with multiple matrices modulo these factors. If, however, one of the factors is larger than word size, or the determinant itself fits in a single word, I can't see how using such a factorisation would help much (if at all). There is also the problem of obtaining the factors in the first place: certain methods of computing the determinant (such as that described by Abbott, Bronstein and Mulders) can give some factors of the determinant for free, but this is certainly not guaranteed to give us useful information in light of the above. Nevertheless I would like to finish implementing this at some point, but I don't want to slow the whole project down working on it.
Next week I'll begin working on the algorithm given by Pernet and Stein, which will take some time to do right, so I expect to be working on it in some way or another for the whole week (at least!). Initially I'll get a simplified version working (which will likely end up looking like the algorithm of Micciancio and Warinschi) and then I'll start on the improvements suggested in Pernet and Stein's paper. Hopefully when this is done it will be extremely fast compared to the algorithms I have worked on so far; however, its run time will depend on what I can squeeze out of the modular approach I worked on a couple of weeks ago.
This week I've been working more on some of the older approaches to computing the HNF. First I finished off the modulo-determinant method, which works well for square matrices of full rank. One issue with this method, however, is that the determinant needs to be calculated first and can be quite large, in which case there may be little to gain by working modulo it.
I've also implemented the algorithm described by Kannan and Bachem, which takes the principal minors to Hermite form one at a time, as opposed to the other methods I've implemented so far, which work column by column. This also provides a significant improvement on the classical method, and indeed performs better than the modular approach in many cases.
For the next week I'll be trying to optimise the algorithms implemented so far as much as possible, especially the modular approach of Domich, Kannan and Trotter, as this plays a key role in the algorithm of Pernet and Stein. One thing I would like to try is combining the two approaches I worked on this week to make a modulo-determinant version of the Kannan-Bachem algorithm. There is also the possibility of factoring the determinant into relatively prime parts and working modulo each of those, recombining at the end with the Chinese remainder theorem; of course this requires factoring the determinant, which could make the whole procedure less worthwhile. I also need to make the necessary modifications to ensure that all the algorithms work in the non-square case as efficiently as possible.
This week I’ve been able to get started working on my GSoC project full time after finishing my university exams on Tuesday (hooray!).
I finished off what I was working on at the time of the last update: a more efficient algorithm that still computes the Hermite normal form in a straightforward way, but takes less time by using the extended Euclidean algorithm to reduce rows rather than simply reducing one row with another repeatedly. The coefficients can still grow very fast in this algorithm, which leaves it highly non-optimal in practice; however, small changes can be made to reduce this problem, such as changing the order in which rows are reduced (this is described by Bradley here).
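The key row operation, sketched below: if rows r1 and r2 have entries a and b in the pivot column, the extended Euclidean algorithm gives g = ua + vb, and applying the determinant-1 transformation ((u, v), (-b/g, a/g)) to the pair of rows leaves g in one pivot position and 0 in the other. This is only a sketch and assumes a and b are not both zero:

```c
#include "flint/fmpz.h"
#include "flint/fmpz_mat.h"

/* xgcd row step on rows r1, r2 of A with pivot column col:
       row r1 <- u*row r1 + v*row r2              (leading entry g)
       row r2 <- (a/g)*row r2 - (b/g)*row r1_old  (leading entry 0)
   The 2x2 transform has determinant (u*a + v*b)/g = 1, so it is
   unimodular and the row span is unchanged. */
void xgcd_row_reduce(fmpz_mat_t A, slong r1, slong r2, slong col)
{
    slong j, n = fmpz_mat_ncols(A);
    fmpz_t g, u, v, a_g, b_g, t1, t2;

    fmpz_init(g); fmpz_init(u); fmpz_init(v);
    fmpz_init(a_g); fmpz_init(b_g); fmpz_init(t1); fmpz_init(t2);

    fmpz_xgcd(g, u, v, fmpz_mat_entry(A, r1, col),
                       fmpz_mat_entry(A, r2, col));
    fmpz_divexact(a_g, fmpz_mat_entry(A, r1, col), g);
    fmpz_divexact(b_g, fmpz_mat_entry(A, r2, col), g);

    for (j = 0; j < n; j++)
    {
        /* compute both new entries from the old ones before storing */
        fmpz_mul(t1, u, fmpz_mat_entry(A, r1, j));
        fmpz_addmul(t1, v, fmpz_mat_entry(A, r2, j));
        fmpz_mul(t2, a_g, fmpz_mat_entry(A, r2, j));
        fmpz_submul(t2, b_g, fmpz_mat_entry(A, r1, j));
        fmpz_set(fmpz_mat_entry(A, r1, j), t1);
        fmpz_set(fmpz_mat_entry(A, r2, j), t2);
    }

    fmpz_clear(g); fmpz_clear(u); fmpz_clear(v);
    fmpz_clear(a_g); fmpz_clear(b_g); fmpz_clear(t1); fmpz_clear(t2);
}
```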
One way to mitigate the coefficient explosion problem is to perform the computations modulo some number and then use the result to find the actual HNF. If we work modulo a multiple of the determinant of the HNF matrix we are trying to find (which we can in fact determine from the original matrix), the real HNF can be recovered fairly easily, and this is what I have been working on for the past couple of days. This method of finding the HNF is due to Domich, Kannan and Trotter and the paper describing it can be found here. The function I have implemented seems to work fine on the test cases it is given, but I am getting some memory leaks that I'm having real trouble getting to the bottom of; this is my main problem at the moment.
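In current FLINT this functionality lives in fmpz_mat_hnf_modular; a minimal usage sketch for a square nonsingular matrix (any positive multiple of the determinant works as the modulus D):

```c
#include "flint/fmpz_mat.h"

/* Usage sketch for the Domich-Kannan-Trotter modular HNF: D must be
   a positive multiple of the determinant of the (square, nonsingular)
   input matrix. */
void hnf_via_modular(fmpz_mat_t H, const fmpz_mat_t A)
{
    fmpz_t D;
    fmpz_init(D);
    fmpz_mat_det(D, A);      /* |det(A)| is the natural choice for D */
    fmpz_abs(D, D);
    fmpz_mat_hnf_modular(H, A, D);
    fmpz_clear(D);
}
```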
Once this is done I’ll start working on a multimodular algorithm which will involve computing the HNF modulo several primes and combining the resulting matrices via the Chinese remainder theorem.