+++
title = "Lucene Index Dumper"
description = "a mini-contribution to [Lucene](http://lucene.apache.org/)"
+++

LuceneAnalyzer is a quick hack for dumping and inspecting a Lucene index. Something for the 'sort-uniq-cut-awk' guys out there. :-)

*   release 0.0.4 (for Lucene 3.1)
    *   [binaries, version 0.0.4](/luceneanalyzer/luceneanalyzer-0.0.4.tgz)
    *   [sources, version 0.0.4](/luceneanalyzer/luceneanalyzer-0.0.4-src.tgz)
*   release 0.0.3 (for Lucene 2.x)
    *   [binaries, version 0.0.3](/luceneanalyzer/luceneanalyzer-0.0.3.tgz)
    *   [sources, version 0.0.3](/luceneanalyzer/luceneanalyzer-0.0.3-src.tgz)

Show global statistics of the index:

```
shell> ./luceneanalyzer -g /dir_to_some_lucene_index

Global Information:
===================
        number of documents: 17
        total number of features: 955
        total number of tokens: 1442
        version: 1328361447856
        still current: true
        maximal document number: 17
        has deletions: false
```
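Since the report is plain `key: value` text, a single number can be picked out with standard tools. The following is only a sketch: `global_stat` is a hypothetical helper, not part of LuceneAnalyzer, and it assumes the report keeps the layout shown above.

```shell
# Hypothetical helper (not part of LuceneAnalyzer): print the value of one
# "key: value" line from the -g report read on stdin.
# Assumes the layout shown above: indented key, a colon, a space, the value.
global_stat() {
    awk -F': ' -v key="$1" '
        { sub(/^[ \t]+/, "", $1) }   # strip the indentation in front of the key
        $1 == key { print $2 }       # print the value of the matching line
    '
}

# Usage:
# ./luceneanalyzer -g /dir_to_some_lucene_index | global_stat 'number of documents'
```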

Show field information:

```
shell> ./luceneanalyzer -f /dir_to_some_lucene_index

Field Information:
==================
Fields of type 'ALL':
        store_0_coordinate
        text
...
Fields of type 'INDEXED_WITH_TERMVECTOR':
        includes
Fields of type 'TERMVECTOR':
Fields of type 'TERMVECTOR_WITH_OFFSET':
Fields of type 'TERMVECTOR_WITH_POSITION':
Fields of type 'TERMVECTOR_WITH_POSITION_OFFSET':
        includes
Fields of type 'UNINDEXED':
        store
```

Show information about terms, statistics and positions:

```
shell> ./luceneanalyzer -t -vv /dir_to_some_lucene_index

Terms:
======
cat     camera  12[0]
cat     connector       3[0],4[0]
cat     copier  11[0]
cat     electronics     1[0],2[0],3[0],4[0],5[0],6[0],7[0],8[0],9[0],10[0],11[0],12[0],15[0],16[0]
...
text    using   13[415]
text    utf     14[3]
text    v       8[2]
text    va902b  9[1]
text    valueselect     7[1]
```
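The term dump is meant to be piped into the usual text tools. As a rough sketch, the hypothetical helper `count_postings` below ranks terms by how many postings they list. It assumes three whitespace-separated columns (field, term, comma-separated postings) as in the sample output, so terms that themselves contain whitespace would be skipped.

```shell
# Hypothetical helper (not part of LuceneAnalyzer): read the -t dump on stdin
# and print "count<TAB>field<TAB>term", highest count first.
# Assumes three whitespace-separated columns per term line; header lines
# and blank lines have fewer columns and are ignored.
count_postings() {
    awk 'NF == 3 {
        n = split($3, postings, ",")   # one entry per listed posting
        print n "\t" $1 "\t" $2
    }' | sort -rn
}

# Usage:
# ./luceneanalyzer -t -vv /dir_to_some_lucene_index | count_postings | head
```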

A Git repository is available at **git://github.com/andreasbaumann/LuceneAnalyzer.git** (or at [https://github.com/andreasbaumann/LuceneAnalyzer/](https://github.com/andreasbaumann/LuceneAnalyzer/)).

In case of questions, contact me at <mail at andreasbaumann dot cc>.