# GSoC with Flint – Finishing up

(Update: after some more work on matrix kernels I managed to improve upon what is given below, I don’t think it needs another post but see the long running thread on the flint-devel mailing list if you are interested the current performance of the code.)

First of all apologies for not keeping everyone up to date recently. I’ve just been cleaning up code and deciding what is in a good enough state to include in my final evaluation for GSoC so there wasn’t that much new of interest to report. This is all now done and my GitHub repository now contains only code that appears to be working well.

The main project goals of implementing quick methods to compute Hermite and Smith normal forms have been completed. Below is table and graph comparing the timings to those obtained with Sage (I spent about half a week being very sad about my work until I realised I needed to clear the cache to get the right timings for Sage!). The class of matrices used were random square with dimension and entry bits as in the table.

Flint timings (s):

4 | 8 | 16 | 32 | 64 | 128 | |
---|---|---|---|---|---|---|

10 | 0.0001498 | 0.000181 | 0.0003962 | 0.0003608 | 0.0011182 | 0.0011266 |

20 | 0.0007708 | 0.0010738 | 0.0014056 | 0.0023238 | 0.0040728 | 0.009321 |

30 | 0.0021142 | 0.003127 | 0.0042294 | 0.007065 | 0.015792 | 0.0427272 |

40 | 0.0050182 | 0.0066494 | 0.0101572 | 0.0169392 | 0.038086 | 0.0985452 |

50 | 0.0105184 | 0.0138692 | 0.0183774 | 0.0319924 | 0.0812636 | 0.2245758 |

60 | 0.0181876 | 0.0243814 | 0.0342512 | 0.0625658 | 0.1578844 | 0.4360994 |

70 | 0.028565 | 0.0373348 | 0.0546572 | 0.1110402 | 0.290543 | 0.8147328 |

80 | 0.0417594 | 0.0595228 | 0.0882594 | 0.1842448 | 0.4881932 | 1.3889464 |

90 | 0.0694218 | 0.08668 | 0.1405782 | 0.2854802 | 0.7817248 | 2.2501918 |

100 | 0.0880376 | 0.1192832 | 0.2052142 | 0.4448414 | 1.245277 | 3.5487596 |

Flint to Sage timing ratios (< 1 is best for us):

4 | 8 | 16 | 32 | 64 | 128 | |
---|---|---|---|---|---|---|

10 | 0.6965 | 1.0258 | 1.9950 | 1.5602 | 3.8941 | 2.8422 |

20 | 0.7261 | 0.8606 | 0.9396 | 1.1124 | 1.0772 | 0.9704 |

30 | 0.6531 | 0.7794 | 0.8381 | 0.8015 | 0.7669 | 0.7449 |

40 | 0.6492 | 0.7048 | 0.7380 | 0.5891 | 0.4896 | 0.4245 |

50 | 0.6595 | 0.6637 | 0.5511 | 0.3997 | 0.3666 | 0.3543 |

60 | 0.6354 | 0.6325 | 0.4950 | 0.3586 | 0.3128 | 0.2958 |

70 | 0.6016 | 0.5616 | 0.3836 | 0.3126 | 0.2764 | 0.2768 |

80 | 0.1957 | 0.2762 | 0.3801 | 0.6697 | 1.2174 | 1.6715 |

90 | 0.2509 | 0.3145 | 0.4730 | 0.8187 | 1.5402 | 2.1104 |

100 | 0.2527 | 0.3257 | 0.5340 | 0.9906 | 1.8450 | 2.5461 |

For simplicity’s sake I simply compared fmpz_mat_hnf_pernet_stein to Sage’s hermite_form, so it’s worth noting that Flint is faster still for small matrices if another method is used (which the fmpz_mat_hnf function should choose for you in practice). We can see here that although Flint does well for many matrices in this range it gets worse as the matrix and bit size gets larger, indeed this trend continues and my functions are far worse for very big matrices (Denis Kryzkov’s benchmark here gives a good indication of the scale of the problem). The run time of the asymptotically most efficient HNF method I have implemented (Pernet-Stein) depends heavily on the computation of nullspaces and so this is definitely an area that can be improved. Both approaches I’ve tried to speed up nullspace computation (multimodular and p-adic lifting) haven’t worked out being any better (asymptotically) than the pre-existing code based on the row echelon form. The remaining barrier here seems to be modular rref which I’ve looked at improving over the past week, this is certainly possible and I plan to carry on working on it (I have a complete but buggy implementation of a method described in Storjohann’s thesis and a working implementation of classical Gaussian elimination which is fast for small matrices at the moment). Finishing this will I hope bring the timings for Pernet-Stein down to something more like the ones in Sage. Sage uses IML for these computations, but even without using BLAS I think we could get a fair amount closer to IML’s performance for large matrices.

As for Smith forms, the randomised algorithm I worked on remains problematic, so I have left that out for now and stuck with the two slightly older approaches, a general one due to Kannan and Bachem and a modular one due to Iliopoulos for full rank square matrices. Below are some timings, again with comparisons to Sage. The class of matrices used is the same as that above, which means that it is highly likely that Iliopoulos’ would be appropriate (though it may not have been chosen by fmpz_mat_snf). Sage uses Pari for this, but it was easier for me to change my existing timing code than set up a direct comparison to Pari.

Flint timings (s):

4 | 8 | 16 | 32 | 64 | 128 | |
---|---|---|---|---|---|---|

10 | 0.0001008 | 0.0001776 | 0.0002186 | 0.0005208 | 0.0016076 | 0.0042602 |

20 | 0.001181 | 0.0017224 | 0.0025602 | 0.0050306 | 0.0131318 | 0.038855 |

30 | 0.0039768 | 0.006887 | 0.013429 | 0.031941 | 0.0904368 | 0.2860226 |

40 | 0.0138918 | 0.0201976 | 0.0408486 | 0.1258706 | 0.386331 | 1.1849852 |

50 | 0.0312358 | 0.0544678 | 0.1120322 | 0.370241 | 1.155253 | 3.6941254 |

60 | 0.061067 | 0.116257 | 0.2779696 | 0.8377208 | 2.5126946 | 7.9345656 |

Flint to Sage timing ratios (< 1 is best for us):

2 | 4 | 8 | 16 | 32 | 64 | 128 | |
---|---|---|---|---|---|---|---|

10 | 0.01630 | 0.02870 | 0.04939 | 0.05714 | 0.10546 | 0.22552 | 0.36044 |

20 | 0.05063 | 0.06777 | 0.08559 | 0.08868 | 0.10665 | 0.14573 | 0.19519 |

30 | 0.08086 | 0.07276 | 0.09437 | 0.10418 | 0.13330 | 0.17716 | 0.23537 |

40 | 0.08905 | 0.10716 | 0.09976 | 0.10617 | 0.16009 | 0.21715 | 0.27530 |

50 | 0.10550 | 0.11037 | 0.11270 | 0.11762 | 0.18405 | 0.24098 | 0.30266 |

60 | 0.12292 | 0.10537 | 0.11135 | 0.12858 | 0.17996 | 0.23113 | 0.29524 |

(I will update this table on my blog when more timings finish, I wanted to post this post and it is taking a while)

These timings look good for Flint but I’m not completely sure yet what the large scale behaviour is.

I still have a number of more experimental bits of code around that I will continue to work on getting into a more stable and usable state. Along with some other little bits that I never managed to get around to during the official GSoC period that I hope to get around to at some point.

Finally I want to say a huge thanks to everyone who commented on what I’ve been doing, and especially to Fredrik for his excellent advice over the course of the project. All the comments were very much appreciated.