[1]<- Previous [2]Home [3]Next ->
PF Limits in OpenBSD
This article documents one of several insidious little gotchas I've
encountered using OpenBSD systems in a core-router/firewall capacity in
lieu of Cisco 2851 or Juniper J4350 class hardware. Specifically, it
covers the various hard memory limits built into PF, which, when hit,
cause PF to stop accepting new connections.
Incidentally, [4]here is the story of how I wound up replacing the
preponderant quantity of my networking gear with OpenBSD and saved
metric-oodles of coinage.
Anyway, the upshot is that if you use OpenBSD with PF in a production
environment and you aren't aware of PF's memory limits (especially the
state-related memory limits), you have a ticking time-bomb on your
network. Just FYI.
I'd been playing with OpenBSD for fun, in low-budget side projects, and
non-prod environments for years before that fateful day that I ran into
the state-table limit like a brick wall.
It was shortly after I'd replaced the Cisco-based core routing
infrastructure of our headquarters building with OpenBSD. It presented
as a sort of network "glitch". You know, the unexplainable little
connectivity loss that only affects one user. Probably his cable, or
wall socket. But then it was two or three users, and then it was a user
whose connectivity was working fine, except that he suddenly couldn't
create new ssh connections (wha?). It was gone as quickly as it appeared,
and never seemed to adhere to any sort of consistent set of symptoms.
It was quite maddening.
At some point I noticed that if I was quick enough, I could catch a "no
route to host" error message from PF on the console of the core
routers, and that's when I really started looking at them in earnest.
It turns out, as I've already said, that the kernel keeps memory set aside
for PF to do things like create state tables and state table entries.
In my case I was hitting the limit on the total number of states PF was
allowed to track at once. This meant that new connections would fail
with "no route to host" until some other state expired and made room for
the new one. This looked downright weird when troubleshooting from the
outside, because protocols like HTTP (whose connections are short-lived
and simply get retried) would still work pretty well, while others like
SSH (which requires a long-lived connection) were more likely to have
problems.
You can see the default sizes of these limits using pfctl -sm:
# pfctl -sm
states hard limit 10000
src-nodes hard limit 10000
frags hard limit 5000
tables hard limit 1000
table-entries hard limit 200000
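If you want one of these numbers in a script rather than on your screen, something like the following should do. It is only a sketch: pf_limit is a helper name I made up (it is not part of pfctl), and it assumes the output keeps the "name hard limit value" layout shown above.

```sh
#!/bin/sh
# Sketch: pull a single hard limit out of `pfctl -sm` output read on stdin.
# pf_limit is a hypothetical helper name, not something pfctl provides.
pf_limit() {
    # Match lines of the form "<name> hard limit <value>" and print the value.
    awk -v name="$1" '$1 == name && $2 == "hard" && $3 == "limit" { print $4 }'
}

# On a live system (as root) you would call it like this:
#   pfctl -sm | pf_limit states
```

Because the helper only reads stdin, you can also feed it a saved copy of the output when poking at a box you can't log in to.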
These are pretty sane defaults for most people who are running OpenBSD
routers, which is to say, nerds who have wedged it on to their
[5]soekris board or the wrt54 they found at the second-hand store, or
the 8086 they found under the sink in their dad's house.
If you're running production routers on real hardware you're going to
want to raise those a bit. And by `bit' I mean like two orders of
magnitude. Do this with a line in your pf.conf that looks something
like this:
set limit { states 1000000, frags 1000000, src-nodes 100000, tables 1000000, table-entries 1000000 }
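After editing pf.conf you still have to load it. A minimal sketch of a cautious reload follows, assuming the config lives at /etc/pf.conf; reload_pf and the PFCTL/PF_CONF variables are names I picked for illustration. It parses the file with -n first so a typo doesn't get loaded onto a production router, then prints the limits so you can confirm the new values took.

```sh
#!/bin/sh
# Sketch: check pf.conf for syntax errors before loading it.
# PFCTL and PF_CONF are illustrative variables; adjust to your system.
PFCTL=${PFCTL:-/sbin/pfctl}
PF_CONF=${PF_CONF:-/etc/pf.conf}

reload_pf() {
    # -n parses the file without loading anything into the kernel.
    if ! "$PFCTL" -nf "$PF_CONF"; then
        echo "syntax error in $PF_CONF; not loading" >&2
        return 1
    fi
    # Load the ruleset for real, then show the limits now in effect.
    "$PFCTL" -f "$PF_CONF" && "$PFCTL" -sm
}

# Run as root:
#   reload_pf
```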
You can check to see if you've ever hit one of these limits with pfctl
-si, which displays the values for a whole bunch of counters tracked by
PF:
[dave@a][~]--> sudo pfctl -si
Status: Enabled for 686 days 01:20:03 Debug: err
State Table Total Rate
current entries 39401
searches 587674569722 9914.3/s
inserts 23981800145 404.6/s
removals 23981760744 404.6/s
Counters
match 24166482278 407.7/s
bad-offset 0 0.0/s
fragment 0 0.0/s
short 0 0.0/s
normalize 1282 0.0/s
memory 0 0.0/s
bad-timestamp 0 0.0/s
congestion 204 0.0/s
ip-option 433656 0.0/s
proto-cksum 0 0.0/s
state-mismatch 135709 0.0/s
state-insert 0 0.0/s
state-limit 0 0.0/s
src-limit 0 0.0/s
synproxy 0 0.0/s
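Rather than eyeballing that output, you can have a script flag the interesting counters. The sketch below reads `pfctl -si` output on stdin and reports any non-zero memory, state-limit or src-limit counter. My understanding is that running into the global states limit shows up under memory, while state-limit and src-limit track the per-rule limits, but treat that as an assumption and watch all three. pf_limit_hits is a made-up name.

```sh
#!/bin/sh
# Sketch: report limit-related counters from `pfctl -si` output on stdin.
# pf_limit_hits is a hypothetical helper name.
# Exit status: 0 if all three counters are zero, 1 if any is non-zero.
pf_limit_hits() {
    awk '$1 == "memory" || $1 == "state-limit" || $1 == "src-limit" {
             # $2 is the running total since PF was enabled.
             if ($2 + 0 > 0) { print $1 "=" $2; hit = 1 }
         }
         END { exit hit ? 1 : 0 }'
}

# On a live system (as root):
#   pfctl -si | pf_limit_hits || echo "PF has dropped packets at a limit"
```

The counters are totals since PF was last enabled, so a non-zero value only tells you that a limit was hit at some point, not when.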
If you have RRDTool installed, you can use this shell script to push
some of these values into an RRD (or repurpose it to feed collectd or
gmond or whatever):
#!/usr/local/bin/bash
gawk="/usr/local/bin/gawk"
pfctl="/sbin/pfctl"
rrdtool="/usr/local/bin/rrdtool"
RRDHOME='/home/pcap/rrd'
pfctl_info() {
    # Capture the full status report (stderr included) in one pfctl call.
    local output=$($pfctl -si 2>&1)
    local temp=$(echo "$output" | $gawk '
        BEGIN {BytesIn=0; BytesOut=0; PktsInPass=0; PktsInBlock=0; \
               PktsOutPass=0; PktsOutBlock=0; States=0; StateSearchs=0; \
               StateInserts=0; StateRemovals=0}
        # Byte and packet counters only appear when pf.conf sets a loginterface.
        /Bytes In/ { BytesIn = $3 }
        /Bytes Out/ { BytesOut = $3 }
        # Each getline steps to the next line: Passed first, then Blocked.
        /Packets In/ { getline;PktsInPass = $2 }
        /Passed/ { getline;PktsInBlock = $2 }
        /Packets Out/ { getline;PktsOutPass = $2 }
        /Passed/ { getline;PktsOutBlock = $2 }
        /current entries/ { States = $3 }
        /searches/ { StateSearchs = $2 }
        /inserts/ { StateInserts = $2 }
        /removals/ { StateRemovals = $2 }
        END {print BytesIn ":" BytesOut ":" PktsInPass ":" \
             PktsInBlock ":" PktsOutPass ":" PktsOutBlock ":" \
             States ":" StateSearchs ":" StateInserts ":" StateRemovals}
    ')
    # Hand the colon-separated values back through a global variable.
    RETURN_VALUE=$temp
}
### collect the data
pfctl_info
### update the database
$rrdtool update ${RRDHOME}/pf_stats_db.rrd \
    --template BytesIn:BytesOut:PktsInPass:PktsInBlock:PktsOutPass:PktsOutBlock:States:StateSearchs:StateInserts:StateRemovals \
    N:$RETURN_VALUE
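The update only works if pf_stats_db.rrd already exists, and creating it isn't shown here. The sketch below prints an rrdtool create command with matching data source names. Everything else in it is my assumption rather than the original setup: a 60-second step (so the collector would run from cron every minute), COUNTER for the cumulative totals, GAUGE for the current state count, a 120-second heartbeat, and a single one-day MAX archive. rrd_create_cmd is a made-up name.

```sh
#!/bin/sh
# Sketch: print an `rrdtool create` command whose data sources match the
# names used by the update script. Step, heartbeat, DS types and archive
# size are assumptions, not taken from the original setup.
rrd_create_cmd() {
    db=${1:-pf_stats_db.rrd}
    printf 'rrdtool create %s --step 60' "$db"
    # Cumulative totals: COUNTER makes RRDtool store per-second rates.
    for ds in BytesIn BytesOut PktsInPass PktsInBlock PktsOutPass \
              PktsOutBlock StateSearchs StateInserts StateRemovals; do
        printf ' DS:%s:COUNTER:120:0:U' "$ds"
    done
    # The number of current states is a point-in-time value.
    printf ' DS:States:GAUGE:120:0:U'
    # 1440 one-minute rows is one day; the graph script reads the MAX CF.
    printf ' RRA:MAX:0.5:1:1440\n'
}

# Review the printed command, then run it:
#   rrd_create_cmd /home/pcap/rrd/pf_stats_db.rrd | sh
```

Printing the command instead of running it lets you look it over (and add longer archives) before committing to a layout, since changing an RRD's data sources after the fact is a hassle.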
And then use the following to draw graphs from it:
#!/bin/sh
RRDHOME='/home/pcap/rrd'
cd ${RRDHOME}
#####
######## pf state rate graph
/usr/local/bin/rrdtool graph pf_stats_states.png \
-w 785 -h 151 -a PNG \
--slope-mode \
--start -86400 --end now \
--font DEFAULT:7: \
--title "pf state rate" \
--watermark "`date`" \
--vertical-label "states/sec" \
--right-axis-label "searches/sec" \
--right-axis 100:0 \
--x-grid MINUTE:10:HOUR:1:MINUTE:120:0:%R \
--alt-y-grid --rigid \
DEF:StateInserts=pf_stats_db.rrd:StateInserts:MAX \
DEF:StateRemovals=pf_stats_db.rrd:StateRemovals:MAX \
DEF:StateSearchs=pf_stats_db.rrd:StateSearchs:MAX \
CDEF:scaled_StateSearchs=StateSearchs,0.01,* \
DEF:States=pf_stats_db.rrd:States:MAX \
CDEF:scaled_States=States,0.01,* \
AREA:StateInserts#33CC33:"inserts" \
GPRINT:StateInserts:LAST:"Cur\: %5.2lf" \
GPRINT:StateInserts:AVERAGE:"Avg\: %5.2lf" \
GPRINT:StateInserts:MAX:"Max\: %5.2lf" \
GPRINT:StateInserts:MIN:"Min\: %5.2lf\t\t" \
LINE1:scaled_StateSearchs#FF0000:"searches" \
GPRINT:StateSearchs:LAST:"Cur\: %5.2lf" \
GPRINT:StateSearchs:AVERAGE:"Avg\: %5.2lf" \
GPRINT:StateSearchs:MAX:"Max\: %5.2lf" \
GPRINT:StateSearchs:MIN:"Min\: %5.2lf\n" \
LINE1:StateRemovals#0000CC:"removal" \
GPRINT:StateRemovals:LAST:"Cur\: %5.2lf" \
GPRINT:StateRemovals:AVERAGE:"Avg\: %5.2lf" \
GPRINT:StateRemovals:MAX:"Max\: %5.2lf" \
GPRINT:StateRemovals:MIN:"Min\: %5.2lf\t\t" \
LINE1:scaled_States#00F0F0:"States" \
GPRINT:States:LAST:"Cur\: %5.2lf" \
GPRINT:States:AVERAGE:"Avg\: %5.2lf" \
GPRINT:States:MAX:"Max\: %5.2lf" \
GPRINT:States:MIN:"Min\: %5.2lf\n"
This yields a pretty two-axis graph like the one below, which should
help you avoid hitting the limits in the future.
PF State Graph
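A graph only helps if somebody looks at it, so you may also want a number you can alert on. Here is a minimal sketch that works out how full the state table is from saved `pfctl -si` and `pfctl -sm` output. pf_state_pct is a made-up name, and the 80 percent threshold in the usage comment is arbitrary.

```sh
#!/bin/sh
# Sketch: integer percentage of the state table currently in use.
# $1 = file holding `pfctl -si` output, $2 = file holding `pfctl -sm` output.
# pf_state_pct is a hypothetical helper name.
pf_state_pct() {
    cur=$(awk '/current entries/ { print $3; exit }' "$1")
    max=$(awk '$1 == "states" && $2 == "hard" { print $4; exit }' "$2")
    # Bail out if either number is missing or the limit is not positive.
    [ -n "$cur" ] && [ -n "$max" ] && [ "$max" -gt 0 ] || return 2
    echo $(( cur * 100 / max ))
}

# On a live system (as root), for example from cron:
#   pfctl -si > /tmp/pf.si; pfctl -sm > /tmp/pf.sm
#   pct=$(pf_state_pct /tmp/pf.si /tmp/pf.sm)
#   [ "$pct" -ge 80 ] && logger "PF state table at ${pct}% of hard limit"
```

With the 39401 current entries shown earlier, the default limit of 10000 would already be blown through, while the raised limit of 1000000 leaves the table about 3 percent full.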
References
1. http://www.skeptech.org/blog/2013/01/13/unscrewed-a-story-about-openbsd/
2. http://www.skeptech.org/
3. http://www.skeptech.org/blog/2013/03/28/unhappy-bean-factory/
4. http://www.skeptech.org/blog/2013/01/13/unscrewed-a-story-about-openbsd/
5. http://soekris.com/
6. http://www.skeptech.org/blog/2013/01/13/unscrewed-a-story-about-openbsd/
7. http://www.skeptech.org/
8. http://www.skeptech.org/blog/2013/03/28/unhappy-bean-factory/