Your algorithm works, essentially, by mapping each document to a point
in a high-dimensional space, and looking for nearby points.
There is an extensive literature on clustering algorithms, and on
data structures for speeding them up. I'm not current enough to
recommend any recent survey paper, but you could easily make the thing
asymptotically O(n log n) by building tree structures in document
space and searching the tree nodes near a document's own node for its
near neighbors.
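To make the tree idea concrete, here is a minimal sketch that uses a k-d tree, which is one possible tree structure and not necessarily the one intended above. It assumes each document has already been turned into a vector. The helper `term_vector` (raw word counts over a fixed vocabulary) and the use of Euclidean distance are hypothetical choices for illustration only. `nearest_brute` is the O(n^2) all-pairs baseline for comparison. Two caveats: this simple build sorts at every level, so construction is O(n log^2 n) rather than O(n log n); and the pruning that makes queries cheap works well in low dimensions but weakens as the number of dimensions grows, so for real document vectors the speedup may be modest.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def term_vector(doc: str, vocab: Sequence[str]) -> Point:
    """Hypothetical vectorizer: count occurrences of each vocabulary word."""
    words = doc.lower().split()
    return tuple(float(words.count(w)) for w in vocab)


class KDNode:
    def __init__(self, point: Point, index: int, axis: int,
                 left: "Optional[KDNode]", right: "Optional[KDNode]"):
        self.point = point  # this document's vector
        self.index = index  # position of the document in the input list
        self.axis = axis    # coordinate this node splits on
        self.left = left    # points with value <= pivot on `axis`
        self.right = right  # points with value >= pivot on `axis`


def build_kdtree(items: List[Tuple[Point, int]], depth: int = 0) -> Optional[KDNode]:
    """Split on the median of one coordinate, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return KDNode(point, index, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(node: Optional[KDNode], query: Point, skip: int,
            best: Tuple[float, int] = (math.inf, -1)) -> Tuple[float, int]:
    """Return (distance, index) of the closest point to `query`, ignoring index `skip`."""
    if node is None:
        return best
    if node.index != skip:
        d = math.dist(node.point, query)
        if d < best[0]:
            best = (d, node.index)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, skip, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, skip, best)
    return best


def all_nearest_neighbors(points: List[Point]) -> List[int]:
    """For each point, the index of its nearest other point, via the k-d tree."""
    tree = build_kdtree([(p, i) for i, p in enumerate(points)])
    return [nearest(tree, p, skip=i)[1] for i, p in enumerate(points)]


def nearest_brute(points: List[Point], i: int) -> Tuple[float, int]:
    """O(n) scan per document, O(n^2) overall: the baseline the tree improves on."""
    return min((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)


if __name__ == "__main__":
    docs = ["cat sat mat", "cat sat", "dog ran far", "dog ran"]
    vocab = ["cat", "sat", "mat", "dog", "ran", "far"]
    vectors = [term_vector(d, vocab) for d in docs]
    print(all_nearest_neighbors(vectors))  # [1, 0, 3, 2]
```

Each document is paired with the one that shares the most words, and the result agrees with what `nearest_brute` finds. The difference is that the tree search can skip whole subtrees instead of comparing against every document.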
rst