-rw-r--r-- content/blog/web-search-homepage.md | 39
-rw-r--r-- static/images/blog/web-search-homepage/search.png | Bin 0 -> 83416 bytes
-rw-r--r-- static/images/blog/web-search-homepage/strus.jpg | Bin 36406 -> 0 bytes
3 files changed, 19 insertions, 20 deletions
diff --git a/content/blog/web-search-homepage.md b/content/blog/web-search-homepage.md
index e7083bb..3a24ce5 100644
--- a/content/blog/web-search-homepage.md
+++ b/content/blog/web-search-homepage.md
@@ -1,14 +1,13 @@
+++
-draft = true
title = "Web search for my homepage"
date = "2017-04-12T15:49:11+01:00"
-categories = [ "Strus", "Search", "Information Retrieval" ]
-thumbnail = "/images/blog/web-search-homepage/strus.jpg"
+categories = [ "Strus", "Search", "Information Retrieval", "Web" ]
+thumbnail = "/images/blog/web-search-homepage/search.png"
+++
## Intro
I wanted to add a search function to my web page.
-As the website is build with Hugo as a set of static
+As the website is built with Hugo as a set of static
HTML pages on a read-only web server, the standard
approach many CMSs use for search, a LIKE query in
MySQL, didn't work.
@@ -19,16 +18,16 @@ project.
The basic idea is that the author of the web pages
can build a search index locally from the Markdown version
-of his content and then push it to a webservice dedicated
+of his content and then push it to a web service dedicated
to search only. Again, the files making up the search index
can be set to read-only after an update, leaving the system
open only to DoS or DDoS attacks (but which public system isn't?).
## Installing strus for content indexing
-So, I installed the packages 'strusutilties' for ArchLinux
+So, I installed the 'strusutilities' package for ArchLinux
on my local machine from the
-[OpenBuildService](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=strusutilities)
+[Open Build Service](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=strusutilities)
with:
```
@@ -43,8 +42,9 @@ The command line tools consist of tools to analyze the document,
apply some basic parsing and normalization of search terms.
The tools currently accept XML, JSON or TSV (tab-separated values).
-My Hugo documents have their metadata in TOML and the content in
-Markdown:
+My Hugo documents have their metadata in
+[TOML](https://en.wikipedia.org/wiki/TOML) and the content in
+[Markdown](https://de.wikipedia.org/wiki/Markdown):
```
+++
@@ -55,7 +55,7 @@ thumbnail = "/images/blog/web-search-homepage/strus.jpg"
+++
I wanted to add a search function to my web page.
-As the website is build with Hugo as a set of static
+As the website is built with Hugo as a set of static
...
```
@@ -67,9 +67,9 @@ file using:
* [pandoc](http://pandoc.org/): convert markdown to
tons of formats
-I choose to convert to a DocBook style of XML and put all
-the posts into one big file called `posts.xml`. The metadata is
-embedded as a JSON value into the XML file in a tag `<meta>`.
+I chose to convert to a [DocBook](http://docbook.org/whatis) style
+of XML and put all the posts into one big file called `posts.xml`.
+The metadata is embedded as a JSON value in a `<meta>` tag of the XML file.
The final XML file looks like:
@@ -90,7 +90,7 @@ The final XML file looks like:
<body>
<para>
I wanted to add a search function to my
- web page. As the website is build with
+ web page. As the website is built with
Hugo as a set of static
...
```
@@ -108,10 +108,10 @@ I packaged this whole ugly conversion step into a script like that:
## Configuring the document analysis and indexing process
-Now we define the configuration for the text analysis. Basically
+Now we define the configuration for the text analysis. Basically,
we tell the system where to split the document into retrievable
items, which features we want to be able to search for and what
-attributes and text we want to show in the ranklist.
+attributes and text we want to show in the rank list.
The file `document.ana` contains a configuration which describes
how Strus should analyze and index the documents:
@@ -170,7 +170,7 @@ when presenting the hit in the rank list.
The forward index stores the document almost verbatim as a sequence
of title and text tokens. So when we get a hit in a search result
we can present a selection of them (usually a sentence containing
-the matches) in the ranklist.
+the matches) in the rank list.
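To illustrate picking a sentence for the rank list from a forward index, here is a plain-Python sketch. It does not use the Strus API; the sentence splitting, the whitespace tokenization and the "most matching terms wins" rule are assumptions for illustration only.

```python
import re


def build_forward_index(text: str) -> list[list[str]]:
    """Hypothetical forward index: the document kept in order as sentences of tokens."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.split() for s in sentences if s]


def best_sentence(forward: list[list[str]], query: set[str]) -> str:
    """Return the sentence with the most query-term matches (case-insensitive).

    Ties and the no-match case fall back to the earliest sentence, since max() keeps the first maximum.
    """
    terms = {t.lower() for t in query}

    def hits(tokens: list[str]) -> int:
        return sum(1 for tok in tokens if tok.lower().strip(".,;:!?") in terms)

    return " ".join(max(forward, key=hits))


doc = ("I wanted to add a search function to my web page. "
       "As the website is built with Hugo as a set of static HTML pages, "
       "standard approaches didn't work.")
fwd = build_forward_index(doc)
print(best_sentence(fwd, {"hugo", "static"}))
```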
Finally, we need to count the number of words per document,
which the retrieval function needs:
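The post does not name the retrieval function. Assuming a BM25-style weighting (a common choice, shown here with the typical defaults `k1 = 1.2` and `b = 0.75`), the sketch below shows where the per-document word count enters: the term frequency is normalized by the document length relative to the average length. This is a standalone Python illustration, not the Strus configuration for this step.

```python
import math


def bm25_term_weight(tf: int, df: int, n_docs: int, doclen: int, avg_doclen: float,
                     k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 contribution of one query term to one document's score.

    tf: occurrences of the term in the document, df: documents containing the term,
    doclen: the per-document word count, avg_doclen: mean word count over the collection.
    """
    # log(1 + ...) variant of the idf, which stays positive even for very common terms
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization: longer-than-average documents get their tf damped
    norm = k1 * (1 - b + b * doclen / avg_doclen)
    return idf * tf * (k1 + 1) / (tf + norm)


# Same term frequency, but in a short and in a long document
short_doc = bm25_term_weight(tf=2, df=10, n_docs=100, doclen=50, avg_doclen=200)
long_doc = bm25_term_weight(tf=2, df=10, n_docs=100, doclen=800, avg_doclen=200)
print(short_doc, long_doc)
```

With `b = 0` the document length has no effect at all; with `b = 1` the term frequency is fully normalized by the relative length, which is why the word count has to be stored at indexing time.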
@@ -192,9 +192,8 @@ can copy to the server running the `strusWebService`.
## Installing the strusWebService for querying
-On a publicly available server I installed the 'strusWebService':
-
-[OpenBuildService](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=struswebservice)
+On a publicly available server I installed the 'strusWebService' package from the
+[Open Build Service](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=struswebservice)
with:
```
diff --git a/static/images/blog/web-search-homepage/search.png b/static/images/blog/web-search-homepage/search.png
new file mode 100644
index 0000000..77db107
--- /dev/null
+++ b/static/images/blog/web-search-homepage/search.png
Binary files differ
diff --git a/static/images/blog/web-search-homepage/strus.jpg b/static/images/blog/web-search-homepage/strus.jpg
deleted file mode 100644
index 70c0776..0000000
--- a/static/images/blog/web-search-homepage/strus.jpg
+++ /dev/null
Binary files differ