some fixes and published strus web search article

author: Andreas Baumann <mail@andreasbaumann.cc> 2017-04-12 20:50:46 +0200
committer: Andreas Baumann <mail@andreasbaumann.cc> 2017-04-12 20:50:46 +0200
commit: 07a3e29f397b4a8bd5ca124a1d020e050d20e131 (patch)
tree: 349e0f0e71382168e1f9ac2d7bc4ff0f61ba348a
parent: 5ced26f343a43cb485c0f9cf76e64a5a76466fa7 (diff)
3 files changed, 19 insertions, 20 deletions
diff --git a/content/blog/web-search-homepage.md b/content/blog/web-search-homepage.md
index e7083bb..3a24ce5 100644
--- a/content/blog/web-search-homepage.md
+++ b/content/blog/web-search-homepage.md
@@ -1,14 +1,13 @@
 +++
-draft = true
 title = "Web search for my homepage"
 date = "2017-04-12T15:49:11+01:00"
-categories = [ "Strus", "Search", "Information Retrieval" ]
-thumbnail = "/images/blog/web-search-homepage/strus.jpg"
+categories = [ "Strus", "Search", "Information Retrieval", "Web" ]
+thumbnail = "/images/blog/web-search-homepage/search.png"
 +++
 
 ## Intro
 I wanted to add a search function to my web page.
-As the website is build with Hugo as a set of static
+As the website is built with Hugo as a set of static
 HTML pages onto a read-only web server, standard
 approaches didn't work like a LIKE-query in Mysql
 as many CMS are implementing search.
@@ -19,16 +18,16 @@ project.
 
 The basic idea is that the author of the web pages
 can build a search index locally with the markdown version
-of his content and then push it to a webservice dedicated
+of his content and then push it to a web service dedicated
 to search only. Again, the files making up the search index 
 can be set to read-only after an update, leaving the system
 open to only DOSA or DDOSA (but which public system isn't).
 
 ## Installing strus for content indexing
 
-So, I installed the packages 'strusutilties' for ArchLinux
+So, I installed the 'strusutilities' package for ArchLinux
 on my local machine from the
-[OpenBuildService](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=strusutilities)
+[Open Build Service](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=strusutilities)
 with:
 
 ```
@@ -43,8 +42,9 @@ The command line tools consist of tools to analyze the document,
 apply some basic parsing and normalization of search terms.
 
 The tools take XML, JSON or TSV (tab-separated-values) currently.
-My Hugo documents have their metadata in TOML and the content in
-Markdown:
+My Hugo documents have their metadata in
+[TOML](https://en.wikipedia.org/wiki/TOML) and the content in
+[Markdown](https://de.wikipedia.org/wiki/Markdown):
 
 ```
 +++
@@ -55,7 +55,7 @@ thumbnail = "/images/blog/web-search-homepage/strus.jpg"
 +++
 
 I wanted to add a search function to my web page.
-As the website is build with Hugo as a set of static
+As the website is built with Hugo as a set of static
 ...
 ```
 
@@ -67,9 +67,9 @@ file using:
  * [pandoc](http://pandoc.org/): convert markdown to
    tons of formats
 
-I choose to convert to a DocBook style of XML and put all
-the posts into one big file called `posts.xml`. The metadata is
-embedded as a JSON value into the XML file in a tag `<meta>`.
+I choose to convert to a [DocBook](http://docbook.org/whatis) style
+of XML and put all the posts into one big file called `posts.xml`.
+The metadata is embedded as a JSON value into the XML file in a tag `<meta>`.
 
 The final XML file looks like:
 
@@ -90,7 +90,7 @@ The final XML file looks like:
     <body>
       <para>
         I wanted to add a search function to my
-        web page. As the website is build with
+        web page. As the website is built with
         Hugo as a set of static
 ...
 ```
@@ -108,10 +108,10 @@ I packaged this whole ugly conversion step into a script like that:
 
 ## Configuring the document analysis and indexing process
 
-Now we define the configuration for the text analysis. Basically
+Now we define the configuration for the text analysis. Basically,
 we tell the system where to split the document into retrievable
 items, which features we want to be able to search for and what
-attributes and text we want to show in the ranklist.
+attributes and text we want to show in the rank list.
 
 The file `document.ana` contains a configuration which describes
 how Strus should analyze and index the documents:
@@ -170,7 +170,7 @@ when presenting the hit in the ranlist.
 The forward index stores the document almost verbatim as a sequence
 of title and text tokens. So when we get a hit in a search result
 we can present a selection of them (usually a sentence containing
-the matches) in the ranklist.
+the matches) in the rank list.
 
 Finally, we need to count the number of words per document,
 this is needed or the retrieval function:
@@ -192,9 +192,8 @@ can copy to the server running the `strusWebService`.
 
 ## Installing the strusWebService for querying
 
-On a publicly available server I installed the 'strusWebService':
-
-[OpenBuildService](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=struswebservice)
+On a publicly available server I installed the 'strusWebService' package from the
+[Open Build Service](https://software.opensuse.org/download.html?project=home:andreas_baumann&package=struswebservice)
 with:
 
 ```
diff --git a/static/images/blog/web-search-homepage/search.png b/static/images/blog/web-search-homepage/search.png
new file mode 100644
index 0000000..77db107
--- /dev/null
+++ b/static/images/blog/web-search-homepage/search.png
diff --git a/static/images/blog/web-search-homepage/strus.jpg b/static/images/blog/web-search-homepage/strus.jpg
deleted file mode 100644
index 70c0776..0000000
--- a/static/images/blog/web-search-homepage/strus.jpg
+++ /dev/null