Mercator, the "AltaVista" robot:
- http://mercator.comm.nsdlib.org/
- authors working for Microsoft now :-)

Some Java robot frameworks:
- heritrix
- crawler4j
- mainly dead or unusable: jspider, websphinx

A C++ web robot:
- http://code.google.com/p/whalebot/

JavaScript support:
- phantomjs: http://code.google.com/p/phantomjs/
- https://github.com/mikeal/spider
- https://github.com/joshfire/node-crawler

PHP:
- http://www.makeuseof.com/tag/build-basic-web-crawler-pull-information-website/

Streams:
- http://www.mr-edd.co.uk/blog/beginners_guide_streambuf
- http://www.codeproject.com/Articles/4457/zipstream-bzip2stream-iostream-wrappers-for-the-zl

Lua embedding:
- http://www.ibm.com/developerworks/linux/library/l-embed-lua/

Loadable modules in C++:
- http://www.isotton.com/devel/docs/C++-dlopen-mini-HOWTO/C++-dlopen-mini-HOWTO.html
- http://www.linuxjournal.com/article/3687?page=0,1
- http://www.artima.com/cppsource/subscription_problem.html
- http://kristiannielsen.livejournal.com/11783.html
- http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html: for singleton issues on Windows and how to solve them in an elegant way

Meta programming in C++:
- Modern C++ Design (Alexandrescu)
- The Loki template library
- http://www.codeproject.com/Articles/5629/Tiny-Template-Library-implementing-typelist
- http://www.drdobbs.com/cpp/extracting-function-parameter-and-return/240000586?pgno=2
- http://sourceforge.net/projects/toast/: portable type_info.name()
- http://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
- http://tombarta.wordpress.com/category/gcc/
- ?? name of module or typeid of derived class in module?
- http://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Name-lookup.html#Name-lookup

File type detection:
- http://sourceforge.net/projects/libmagic/: classic for Unix, not for Windows

Network access libraries:
- libcurl
- libfetch
- WinHTTP
- http://www.pcs.cnu.edu/~dgame/sockets/socketsC++/sockets.html

Windows postmortem debugger (AeDebug registry key):
- HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
- "C:\debuggers\windbg.exe" -p %ld -e %ld -g

DLL and shared globals (for Singleton):
- exported data segments
  #pragma data_seg("SHARED")
  #pragma data_seg()
  #pragma comment(linker, "/section:SHARED,RWS")
- http://support.microsoft.com/kb/168958/en-us
- http://www.xing.com/net/moderncpp/allgemeines-zu-c-101269/singletons-in-dll-s-32375749
- http://stackoverflow.com/questions/1880052/c-duplicated-static-member
- http://stackoverflow.com/questions/4911994/sharing-a-global-static-variable-between-a-process-and-dll
- http://www.ogre3d.org/forums/viewtopic.php?p=75622&sid=ce193664e1d3d7c4af509e6f4e2718c6
- http://www.lurklurk.org/linkers/linkers.html#wincircular

Singleton design:
- http://www.oop-trainer.de/Themen/Singleton.html

Linking with gcc and visibility:
- http://gcc.gnu.org/wiki/Visibility

Robots.txt:
- http://www.nextthing.org/archives/2007/03/12/robotstxt-adventure
- https://github.com/seomoz/reppy: in Python, but quite nice as a source of inspiration

Service for crawling:
- http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/
- http://commoncrawl.org/

Simplicity of a Java crawler:
- http://www.javacodegeeks.com/2013/07/mini-search-engine-just-the-basics-using-neo4j-crawler4j-graphstream-and-encog.html

Lua C++ binding by hand:
- http://loadcode.blogspot.ch/2007/02/wrapping-c-classes-in-lua.html
- http://rubenlaguna.com/wp/2012/11/26/first-steps-lua-cplusplus-integration/

Example of game programming in Lua, which may give a high-level insight into how to write crawler.conf:

Shared pointer: no circular references!

Module create functions
Virtual constructor
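
Sketch for the last three notes (shared pointer without circular references, module create function, virtual constructor). This is only an illustration under assumptions: the names Module, Registry, HttpFetcher, create_module, destroy_module and load_module are hypothetical and not taken from any of the linked pages or from the crawler itself. In a real loadable module, create_module/destroy_module would be resolved with dlsym() or GetProcAddress(); here they are called directly so the example stays self-contained.

```cpp
#include <memory>
#include <string>
#include <vector>

class Registry;  // owner of all modules, defined below

// Hypothetical module interface.
class Module {
public:
    virtual ~Module() = default;
    virtual std::string name() const = 0;
    // "Virtual constructor": copy the object through its dynamic type.
    virtual std::shared_ptr<Module> clone() const = 0;
    // Back-reference to the owner. It is weak on purpose: a shared_ptr
    // here would form a cycle Registry -> Module -> Registry and nothing
    // would ever be freed.
    std::weak_ptr<Registry> owner;
};

class Registry {
public:
    std::vector<std::shared_ptr<Module>> modules;  // owning references
};

class HttpFetcher : public Module {
public:
    static int alive;  // instance counter, only used to observe cleanup
    HttpFetcher() { ++alive; }
    HttpFetcher(const HttpFetcher& other) : Module(other) { ++alive; }
    ~HttpFetcher() override { --alive; }
    std::string name() const override { return "http"; }
    std::shared_ptr<Module> clone() const override {
        return std::make_shared<HttpFetcher>(*this);
    }
};
int HttpFetcher::alive = 0;

// Module create/destroy functions with C linkage, so the symbol names are
// not mangled and could be looked up by name in a shared library.
extern "C" Module* create_module() { return new HttpFetcher; }
extern "C" void destroy_module(Module* m) { delete m; }

// Wrap the raw pointer so the object is destroyed by the module's own
// destroy function instead of a plain delete on the caller's side.
std::shared_ptr<Module> load_module() {
    return std::shared_ptr<Module>(create_module(), destroy_module);
}

// Builds a registry with one loaded module and one clone, drops it again,
// and returns how many HttpFetcher objects are still alive afterwards.
int live_modules_after_teardown() {
    {
        auto registry = std::make_shared<Registry>();
        auto m = load_module();
        m->owner = registry;
        registry->modules.push_back(m);
        registry->modules.push_back(m->clone());
    }
    return HttpFetcher::alive;
}
```

With the weak back-reference, live_modules_after_teardown() returns 0. Changing owner to a std::shared_ptr<Registry> would keep the registry and both modules alive after the scope ends, which is the circular reference problem the note warns about.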