pretty geeky indeed
Let me turn it up a notch to uber-geeky though...
jfulcer... who said post IDs are sequential?
What if the current post ID is 20594802 but the next one is actually 20594902? Should the crawler try to read roughly 100 nonexistent posts?
Also, what about boards that use non-numerical post IDs? Or ones like the Flickr forums, which mix IDs from different ranges, so that some new posts have IDs like 1003232 and other new ones have IDs like 700000323412? Certainly the system cannot read everything from 1003232 to 700000323412 when we know for a fact that almost nothing exists in the middle.
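To put the ID problem in code terms, here is a minimal Python sketch of the alternative: collect only the post IDs a page actually links to, instead of counting upward and hoping. The `showpost.php?p=` link format and both function names are made up for illustration; real boards use all sorts of URL schemes, and this is not a description of how any particular crawler does it.

```python
import re

# Assumption: posts are linked as "showpost.php?p=<id>". This URL shape is
# hypothetical; adjust the pattern for the board being crawled.
POST_LINK = re.compile(r"showpost\.php\?p=([A-Za-z0-9_-]+)")

def discover_post_ids(listing_html):
    """Return the post IDs linked from a page, in order, without duplicates."""
    seen = set()
    ids = []
    for post_id in POST_LINK.findall(listing_html):
        if post_id not in seen:
            seen.add(post_id)
            ids.append(post_id)
    return ids

def wasted_requests_if_enumerating(known_ids):
    """Count the IDs a 'count upward' crawler would probe that are not known to exist.

    Non-numeric IDs are skipped, since they cannot be enumerated at all.
    """
    numeric = sorted({int(i) for i in known_ids if i.isdigit()})
    if not numeric:
        return 0
    return (numeric[-1] - numeric[0] + 1) - len(numeric)

page = (
    '<a href="showpost.php?p=1003232">first</a> '
    '<a href="showpost.php?p=700000323412">second</a> '
    '<a href="showpost.php?p=1003232">same post again</a>'
)
ids = discover_post_ids(page)
print(ids)
print(wasted_requests_if_enumerating(ids))
```

Following links costs one request for the page. Enumerating the gap between those two IDs would cost hundreds of billions of requests for nothing.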
Furthermore, do you read one post at a time? Or rather full threads and full pages? Why should the crawler do it any differently? Instead of reading one post at a time, it can read 25 at a time.
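The arithmetic behind reading a page at a time is simple enough to sketch in Python. The 25 posts per page is the figure from above; actual page sizes vary by board, and the function names here are only illustrative.

```python
import math

POSTS_PER_PAGE = 25  # figure used above; real boards vary

def requests_per_post(post_count):
    """Requests needed when fetching every post individually."""
    return post_count

def requests_per_page(post_count, per_page=POSTS_PER_PAGE):
    """Requests needed when fetching whole thread pages."""
    return math.ceil(post_count / per_page)

def page_of(post_index, per_page=POSTS_PER_PAGE):
    """1-based page number holding the post at 0-based position post_index."""
    return post_index // per_page + 1

thread_length = 1000
print(requests_per_post(thread_length), "requests one post at a time")
print(requests_per_page(thread_length), "requests one page at a time")
```

For a 1000-post thread that is 40 page requests instead of 1000 single-post requests.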
The crawler indeed needs to "hammer" the board as little as possible. A board that doesn't respond to its members just because the crawler is inefficient, now that's the sign of a very bad crawler. So the crawler not only needs to run as a secondary priority (which naturally is one of the reasons it takes a while to get to all the data on DisBoards) but also must not needlessly "read" things that either don't exist or aren't urgent. BoardTracker, for example, doesn't read a thread again and again in the hope that there will be new data in it. It "knows" when there are new posts in a thread.
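For the curious, here is one way a crawler could behave like that, as a rough Python sketch. It assumes the board index shows a post count per thread, so only threads whose count changed get fetched again, and it enforces a minimum gap between requests. This is a guess at the general idea, not BoardTracker's actual mechanism; `threads_needing_fetch` and `Throttle` are hypothetical names.

```python
import time

def threads_needing_fetch(index_counts, stored_counts):
    """Thread IDs whose post count on the board index differs from the stored one.

    index_counts:  {thread_id: post count currently shown on the index}
    stored_counts: {thread_id: post count seen on the last crawl}
    Threads never seen before are included, since they have no stored count.
    """
    return [tid for tid, count in index_counts.items()
            if stored_counts.get(tid) != count]

class Throttle:
    """Enforce a minimum gap between requests so members get served first."""

    def __init__(self, min_gap_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap_seconds
        self.clock = clock  # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.last = None    # time of the previous request, if any

    def wait(self):
        """Block until at least min_gap seconds have passed since the last call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_gap - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now

on_index = {"t1": 40, "t2": 12, "t3": 5}
last_crawl = {"t1": 40, "t2": 10}
print(threads_needing_fetch(on_index, last_crawl))
```

Here `t1` is left alone because nothing changed, and a crawler would call `Throttle.wait()` before each of the remaining fetches.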
Part of the reason you don't see all the posts in BoardTracker (the newer version) yet is technical issues on our side. It takes time to prepare all the data we have for searching. But the main reason is DisBoards related... we can't get to all the data as fast as we would like. If we did that, the DisBoards servers would be brought to their knees and the board would slow to a crawl. So the koalas at BoardTracker have to be patient... and so should we.
We are working hard to get the system out by September 5th. That is very soon (in dog years). Even then, when we launch the new version for public beta, not all the data in DisBoards will be there. But it will eventually get to all of it.
"So say we all" [now, that's an UberGeek ending to a post]