Mercator, the "AltaVista" robot

http://mercator.comm.nsdlib.org/

The authors are working for Microsoft now :-)

Some Java robot frameworks:

heritrix
crawler4j

mostly dead or unusable:

jspider
websphinx

A C++ web robot

http://code.google.com/p/whalebot/

JavaScript support

phantomjs http://code.google.com/p/phantomjs/
https://github.com/mikeal/spider
https://github.com/joshfire/node-crawler

PHP

http://www.makeuseof.com/tag/build-basic-web-crawler-pull-information-website/

Streams

http://www.mr-edd.co.uk/blog/beginners_guide_streambuf
http://www.codeproject.com/Articles/4457/zipstream-bzip2stream-iostream-wrappers-for-the-zl
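Both stream articles build on deriving from std::streambuf. As a minimal sketch (the class name UpperCaseBuf is made up for illustration), an unbuffered filtering output buffer only has to override overflow(), which receives every character because no put area is set up, and sync(), which forwards flushes. A compressing wrapper in the spirit of zipstream would, roughly, collect the characters in a buffer and hand them to zlib instead of calling std::toupper.

```cpp
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>

// Unbuffered filtering output buffer: each character is upper-cased and
// forwarded to a wrapped destination buffer (which it does not own).
class UpperCaseBuf : public std::streambuf {
public:
    explicit UpperCaseBuf(std::streambuf* dest) : dest_(dest) {}

protected:
    // No put area is set up, so the base class calls overflow() for every
    // single character written to the stream.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);  // nothing to write: report success
        const unsigned char c =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        return dest_->sputc(static_cast<char>(std::toupper(c)));  // eof() on failure
    }

    // Forward flush requests (std::flush, std::endl) to the destination.
    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;
};
```

Usage: with std::ostringstream sink; UpperCaseBuf buf(sink.rdbuf()); std::ostream out(&buf); writing out << "hello" leaves "HELLO" in sink.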

Lua embedding

http://www.ibm.com/developerworks/linux/library/l-embed-lua/

Loadable modules in C++

http://www.isotton.com/devel/docs/C++-dlopen-mini-HOWTO/C++-dlopen-mini-HOWTO.html
http://www.linuxjournal.com/article/3687?page=0,1
http://www.artima.com/cppsource/subscription_problem.html
http://kristiannielsen.livejournal.com/11783.html
http://www.yolinux.com/TUTORIALS/LibraryArchives-StaticAndDynamic.html: singleton issues on Windows and how to solve them elegantly
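The core idea of the dlopen mini-HOWTO is to keep C++ name mangling out of the module boundary: the module exports plain extern "C" factory functions, and the host only talks to the object through a virtual interface. Below is a single-file sketch of the module side; Module, HtmlParserModule, create_module and destroy_module are hypothetical names. On the host side the two symbols would be looked up with dlopen()/dlsym() (LoadLibrary()/GetProcAddress() on Windows) and cast to the matching function-pointer types; that part is omitted so the sketch builds as one translation unit.

```cpp
#include <string>

// Interface known to both the host program and the module. The host only
// calls virtual functions, so it never needs the module's mangled symbols.
class Module {
public:
    virtual ~Module() = default;
    virtual std::string name() const = 0;
};

// ---- the following would be compiled into the shared object ----

class HtmlParserModule : public Module {
public:
    std::string name() const override { return "html-parser"; }
};

// extern "C" disables name mangling, so the host can find these entry points
// by their plain names (e.g. with dlsym or GetProcAddress).
extern "C" Module* create_module() { return new HtmlParserModule(); }

// Let the module free what it allocated: host and module may use different
// heaps, for example when linked against different C runtimes on Windows.
extern "C" void destroy_module(Module* m) { delete m; }
```

Host and module still have to agree on the exact Module definition and on a compatible compiler ABI, since the vtable layout crosses the boundary.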

Meta Programming in C++

Modern C++ Design (Alexandrescu)
The Loki template library
http://www.codeproject.com/Articles/5629/Tiny-Template-Library-implementing-typelist
http://www.drdobbs.com/cpp/extracting-function-parameter-and-return/240000586?pgno=2
http://sourceforge.net/projects/toast/: portable type_info.name()
http://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html
http://tombarta.wordpress.com/category/gcc/
Open question: use the name of the module, or the typeid of the derived class in the module?
http://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Name-lookup.html#Name-lookup
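Two of the topics above in one small sketch. TypeList, Length and IndexOf are hypothetical names for a typelist written with C++11 variadic templates, rather than the recursive Typelist<Head, Tail> of Modern C++ Design and Loki. demangle() wraps abi::__cxa_demangle from <cxxabi.h>, as described on the libstdc++ demangling page, so it only works with GCC and Clang (Itanium C++ ABI), not with MSVC.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang only (Itanium C++ ABI)
#include <memory>
#include <string>
#include <typeinfo>

// __cxa_demangle returns a malloc'ed buffer, so release it with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Demangle a name as returned by std::type_info::name(); on failure the raw
// mangled name is returned unchanged.
inline std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> readable(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    return (status == 0 && readable) ? std::string(readable.get())
                                     : std::string(mangled);
}

// A minimal typelist: the types are carried only in the template arguments.
template <typename... Ts> struct TypeList {};

// Number of types in the list.
template <typename List> struct Length;
template <typename... Ts>
struct Length<TypeList<Ts...>> {
    static constexpr std::size_t value = sizeof...(Ts);
};

// Position of the first T in the list; fails to compile if T is absent.
template <typename T, typename List> struct IndexOf;
template <typename T, typename... Rest>
struct IndexOf<T, TypeList<T, Rest...>> {
    static constexpr std::size_t value = 0;
};
template <typename T, typename Head, typename... Rest>
struct IndexOf<T, TypeList<Head, Rest...>> {
    static constexpr std::size_t value = 1 + IndexOf<T, TypeList<Rest...>>::value;
};
```

With GCC, typeid(int).name() yields "i", which demangle() turns back into "int".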

file type detection

http://sourceforge.net/projects/libmagic/: classic for Unix, not for Windows
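libmagic identifies files from a large database of byte signatures. The same basic idea, reduced to a handful of hard-coded magic numbers, looks roughly like this; sniff_type() is a hypothetical helper and not part of the libmagic API. Something like it could serve as a portable fallback where libmagic is not available, at the cost of recognizing only these few formats.

```cpp
#include <cstddef>
#include <string>

// Identify a few formats by their leading "magic" bytes and return a MIME
// type; anything unrecognized is reported as application/octet-stream.
inline std::string sniff_type(const std::string& data) {
    struct Signature { const char* magic; std::size_t len; const char* mime; };
    static const Signature signatures[] = {
        {"\x89PNG\r\n\x1a\n", 8, "image/png"},
        {"GIF87a", 6, "image/gif"},
        {"GIF89a", 6, "image/gif"},
        {"\xff\xd8\xff", 3, "image/jpeg"},
        {"%PDF-", 5, "application/pdf"},
        {"\x1f\x8b", 2, "application/gzip"},
        {"PK\x03\x04", 4, "application/zip"},  // also docx, jar, ...
    };
    for (const auto& s : signatures) {
        // Compare only the first s.len bytes of the document.
        if (data.size() >= s.len && data.compare(0, s.len, s.magic, s.len) == 0)
            return s.mime;
    }
    return "application/octet-stream";
}
```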

network access libraries

libcurl
libfetch
WinHTTP
http://www.pcs.cnu.edu/~dgame/sockets/socketsC++/sockets.html
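For comparison with the libraries above, a bare-bones fetch over POSIX sockets could look like the following sketch. It is deliberately limited: plain HTTP on port 80 only, no redirects, no chunked decoding, no timeouts, no SIGPIPE handling. The function names and the user agent string "ExampleBot/0.1" are made up. On Windows the same calls exist in Winsock, but WSAStartup() must be called first and closesocket() replaces close().

```cpp
#include <cstddef>
#include <netdb.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// HTTP/1.0 with "Connection: close" lets the client simply read until the
// server closes the connection.
inline std::string build_get_request(const std::string& host, const std::string& path,
                                     const std::string& user_agent) {
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "User-Agent: " + user_agent + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Returns the raw response (status line, headers, body) or "" on error.
inline std::string fetch_raw(const std::string& host, const std::string& path) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP
    addrinfo* res = nullptr;
    if (getaddrinfo(host.c_str(), "80", &hints, &res) != 0) return {};

    int fd = -1;
    for (addrinfo* ai = res; ai != nullptr; ai = ai->ai_next) {  // try each address
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0) return {};

    const std::string req = build_get_request(host, path, "ExampleBot/0.1");
    std::size_t sent = 0;
    while (sent < req.size()) {  // send() may write only part of the buffer
        ssize_t n = send(fd, req.data() + sent, req.size() - sent, 0);
        if (n <= 0) { close(fd); return {}; }
        sent += static_cast<std::size_t>(n);
    }

    std::string response;
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0)
        response.append(buf, static_cast<std::size_t>(n));
    close(fd);
    return response;
}
```

A call such as fetch_raw("example.com", "/robots.txt") returns headers and body in one string, which a later step would split at the first empty line.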

Registering WinDbg as the just-in-time debugger (set the "Debugger" value of this key):
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug
"C:\debuggers\windbg.exe" -p %ld -e %ld -g

DLL and shared globals (for Singleton):
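- shared data segments (MSVC): the section is shared by all processes that
  load the DLL, and only initialized variables are placed in it
  #pragma data_seg("SHARED")
  int g_shared_example = 0;  // example variable, must be initialized
  #pragma data_seg()
  #pragma comment(linker, "/section:SHARED,RWS")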
- http://support.microsoft.com/kb/168958/en-us
- http://www.xing.com/net/moderncpp/allgemeines-zu-c-101269/singletons-in-dll-s-32375749
- http://stackoverflow.com/questions/1880052/c-duplicated-static-member
- http://stackoverflow.com/questions/4911994/sharing-a-global-static-variable-between-a-process-and-dll
- http://www.ogre3d.org/forums/viewtopic.php?p=75622&sid=ce193664e1d3d7c4af509e6f4e2718c6
- http://www.lurklurk.org/linkers/linkers.html#wincircular

Singleton design:
- http://www.oop-trainer.de/Themen/Singleton.html
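A common C++11 baseline for the singleton discussion is the function-local static ("Meyers singleton"), whose initialization the language guarantees to be thread-safe. Config and its user_agent member are hypothetical. This alone does not address the DLL issue listed above: if instance() is defined inline in a header used by several modules, each module may end up with its own copy of the static (typically on Windows, and on ELF systems when the symbol is hidden), so in a plugin setting instance() should be defined out-of-line in exactly one module and exported from it.

```cpp
#include <string>

class Config {
public:
    // The instance is constructed on first use; since C++11 this
    // initialization is thread-safe. It is destroyed at program exit.
    static Config& instance() {
        static Config inst;
        return inst;
    }

    Config(const Config&) = delete;             // no copies
    Config& operator=(const Config&) = delete;  // no assignment

    std::string user_agent = "ExampleBot/0.1";  // hypothetical setting

private:
    Config() = default;  // only instance() may construct
};
```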

Linking with gcc and visibility:
- http://gcc.gnu.org/wiki/Visibility
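The GCC wiki recommends compiling with -fvisibility=hidden and marking the public API explicitly, with one macro that also maps to __declspec(dllexport)/__declspec(dllimport) on Windows. A condensed version of that macro block follows; DLL_PUBLIC, DLL_LOCAL and BUILDING_DLL follow the wiki's naming, while normalize_port() and effective_port() are hypothetical example functions. On Windows the library itself must be compiled with BUILDING_DLL defined.

```cpp
#if defined _WIN32 || defined __CYGWIN__
  #ifdef BUILDING_DLL
    #define DLL_PUBLIC __declspec(dllexport)
  #else
    #define DLL_PUBLIC __declspec(dllimport)
  #endif
  #define DLL_LOCAL
#else
  #if __GNUC__ >= 4
    #define DLL_PUBLIC __attribute__((visibility("default")))
    #define DLL_LOCAL  __attribute__((visibility("hidden")))
  #else
    #define DLL_PUBLIC
    #define DLL_LOCAL
  #endif
#endif

// Internal helper: hidden, so other modules cannot bind to it.
DLL_LOCAL int normalize_port(int port) {
    return (port > 0 && port < 65536) ? port : 80;  // fall back to HTTP default
}

// Exported entry point: part of the library's ABI.
DLL_PUBLIC int effective_port(int port) { return normalize_port(port); }
```

Hiding everything but the marked API also keeps module-internal statics from being merged across modules, which is exactly the singleton concern above.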

Robots.txt:
- http://www.nextthing.org/archives/2007/03/12/robotstxt-adventure
- https://github.com/seomoz/reppy: in Python, but quite nice as a source of inspiration
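A minimal robots.txt check, assuming only the two directives of the original 1994 convention (User-agent and Disallow) with plain prefix matching; Allow lines, wildcards and Crawl-delay are ignored. RobotsTxt and allowed() are hypothetical names. A group naming the crawler takes precedence over the * group, and names are compared case-insensitively as substrings, roughly as the original convention suggested.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

class RobotsTxt {
public:
    // content: the robots.txt body; agent: the crawler's name, e.g. "ExampleBot".
    RobotsTxt(const std::string& content, const std::string& agent) {
        std::vector<std::string> specific_rules, generic_rules;
        bool in_agent_lines = false;  // inside a run of User-agent lines
        bool group_specific = false;  // current group names our agent
        bool group_generic = false;   // current group is "User-agent: *"
        bool has_specific_group = false;
        const std::string me = lower(agent);

        std::istringstream in(content);
        std::string line;
        while (std::getline(in, line)) {
            line = line.substr(0, line.find('#'));  // drop comments
            const auto colon = line.find(':');
            if (colon == std::string::npos) continue;
            const std::string key = lower(trim(line.substr(0, colon)));
            const std::string value = trim(line.substr(colon + 1));

            if (key == "user-agent") {
                if (!in_agent_lines) {  // first User-agent line starts a new group
                    group_specific = group_generic = false;
                    in_agent_lines = true;
                }
                if (value == "*") {
                    group_generic = true;
                } else if (!value.empty() && me.find(lower(value)) != std::string::npos) {
                    group_specific = has_specific_group = true;
                }
            } else {
                in_agent_lines = false;
                if (key == "disallow" && !value.empty()) {  // empty Disallow allows all
                    if (group_specific) specific_rules.push_back(value);
                    if (group_generic) generic_rules.push_back(value);
                }
            }
        }
        // A group naming this crawler replaces the "*" group entirely.
        rules_ = has_specific_group ? specific_rules : generic_rules;
    }

    // True unless some Disallow rule is a prefix of the URL path.
    bool allowed(const std::string& path) const {
        for (const auto& rule : rules_)
            if (path.compare(0, rule.size(), rule) == 0) return false;
        return true;
    }

private:
    static std::string trim(const std::string& s) {
        const auto b = s.find_first_not_of(" \t\r\n");
        if (b == std::string::npos) return "";
        const auto e = s.find_last_not_of(" \t\r\n");
        return s.substr(b, e - b + 1);
    }
    static std::string lower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    std::vector<std::string> rules_;
};
```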

Service for Crawling:
- http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/
- http://commoncrawl.org/