Solr ElasticSearch
how-to ETL with MongoDB and PostgreSQL part-1 part-1-building-a-centralized-logging-application
collective.solr may not handle two-phase commits properly.
collective.solr provides a batteries included solution for integrating Plone and Solr. And React.js is the way to go for client side rendering. PLIP: Merge collective.solr into Core Documentation
Terminology
Node: A single server that is part of a Cluster. By default, a node joins a cluster named "elasticsearch".
Index: Collection of Documents (Types), corresponds to a Database.
Type: Class/Category of similar Documents, consisting of a Name and a Mapping. Stored within a metadata field named _type because Lucene has no concept of document types.
Mapping: Schema
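To make these terms concrete, here is a minimal sketch of how an index, a type, and a mapping fit together in an ElasticSearch 1.x style request. The index name `blog`, the type name `post`, and the field names are hypothetical, chosen only for illustration. The code builds plain dictionaries and a document path; it does not contact a server.

```python
import json


def build_index_body(type_name, properties):
    """Return an index-creation body that declares one type and its mapping."""
    return {"mappings": {type_name: {"properties": properties}}}


def document_path(index, type_name, doc_id):
    """Return the REST path that addresses one document: /index/type/id."""
    return "/{0}/{1}/{2}".format(index, type_name, doc_id)


# Hypothetical index, type, and field names.
body = build_index_body("post", {
    "title": {"type": "string"},
    "created": {"type": "date"},
})
print(json.dumps(body, indent=2, sort_keys=True))
print(document_path("blog", "post", 1))
```

Read this way, the index plays the role of the database, the type names a class of documents inside it, and the mapping is the schema attached to that type.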
Solr
Plone 4.3.7 installs cleanly with the solr.cfg and develop.cfg below. The related packages are:
- collective.indexing = 2.0b1
- collective.recipe.solrinstance = 3
# solr.cfg
[buildout]
parts +=
    solr-download
    solr-instance

[settings]
solr-host = 127.0.0.1
solr-port = 8983
solr-min-ram = 128M
solr-max-ram = 256M

[solr-download]
recipe = hexagonit.recipe.download
strip-top-level-dir = true
url = http://ftp.tc.edu.tw/pub/Apache/lucene/solr/4.10.4/solr-4.10.4.zip

[solr-instance]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
host = ${settings:solr-host}
port = ${settings:solr-port}
basepath = /solr
# autoCommitMaxTime = 900000
max-num-results = 500
section-name = SOLR
unique-key = UID
logdir = ${buildout:directory}/var/solr
default-search-field = default
unique-key = UID
solr-version = 4
default-operator = AND
updateLog = true
java_opts =
    -Dcom.sun.management.jmxremote
    -Djava.rmi.server.hostname=127.0.0.1
    -Dcom.sun.management.jmxremote.port=8984
    -Dcom.sun.management.jmxremote.ssl=false
    -Dcom.sun.management.jmxremote.authenticate=false
    -server
    -Xms${settings:solr-min-ram}
    -Xmx${settings:solr-max-ram}
index =
    name:allowedRolesAndUsers type:string stored:false multivalued:true
    name:created type:date stored:true
    name:Creator type:string stored:true
    name:Date type:date stored:true
    name:default type:text indexed:true stored:false multivalued:true
    name:Description type:text copyfield:default stored:true
    name:effective type:date stored:true
    name:exclude_from_nav type:boolean indexed:false stored:true
    name:expires type:date stored:true
    name:getIcon type:string indexed:false stored:true
    name:getId type:string indexed:false stored:true
    name:getRemoteUrl type:string indexed:false stored:true
    name:is_folderish type:boolean stored:true
    name:Language type:string stored:true
    name:modified type:date stored:true
    name:object_provides type:string stored:false multivalued:true
    name:path_depth type:integer indexed:true stored:false
    name:path_parents type:string indexed:true stored:false multivalued:true
    name:path_string type:string indexed:false stored:true
    name:portal_type type:string stored:true
    name:review_state type:string stored:true
    name:SearchableText type:text copyfield:default stored:false
    name:searchwords type:string stored:false multivalued:true
    name:showinsearch type:boolean stored:false
    name:Subject type:string copyfield:default stored:true multivalued:true
    name:Title type:text copyfield:default stored:true
    name:Type type:string stored:true
    name:UID type:string stored:true required:true
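Each entry under index = is a whitespace-separated run of key:value pairs describing one Solr field. The sketch below only makes that format explicit; `parse_index_line` is a hypothetical helper written for this note, not the parser that collective.recipe.solrinstance actually uses.

```python
def parse_index_line(line):
    """Split one 'key:value key:value ...' entry into a dictionary."""
    field = {}
    for token in line.split():
        # partition() splits on the first colon only, so the key stays intact.
        key, _, value = token.partition(":")
        field[key] = value
    return field


uid = parse_index_line("name:UID type:string stored:true required:true")
print(uid["name"], uid["type"], uid["required"])
```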
# develop.cfg
extends =
    buildout.cfg
    solr.cfg
eggs +=
    Products.DocFinderTab
    plone.reload
    collective.solr
$ cd parts/solr-instance
$ java -jar start.jar
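collective.solr/configlet.py tries to run from plone.app.controlpanel.form import ControlPanelForm, but in Plone 5 (which uses plone.app.controlpanel 3.0.3) form.py no longer exists. The last plone.app.controlpanel release that shipped it is 3.0.2. The fix is to rewrite the control panel with z3c.form. Newer versions also attempt Plone 5 compatibility [1] [2], but many test issues stand in the way.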
If you run into the 'collection1' not available and '_version_' field must exist in schema errors, fixing the latter appears to be enough. Add one line to parts/solr-instance/solr/collection1/conf/schema.xml:
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
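A quick way to confirm whether a schema already declares the field is to parse it and look for the name. The sketch below runs against a tiny inline schema that is hypothetical and far shorter than a real schema.xml; `has_version_field` is a helper written for this note.

```python
import xml.etree.ElementTree as ET

VERSION_FIELD = ('<field name="_version_" type="long" indexed="true" '
                 'stored="true" multiValued="false"/>')


def has_version_field(schema_xml):
    """Return True if the schema text declares a field named _version_."""
    root = ET.fromstring(schema_xml)
    return any(f.get("name") == "_version_" for f in root.iter("field"))


# Hypothetical, heavily shortened schema used only for this demonstration.
sample = """<schema name="plone" version="1.5">
  <fields>
    <field name="UID" type="string" indexed="true" stored="true" required="true"/>
  </fields>
</schema>"""

# Insert the extra field just before the closing </fields> tag.
patched = sample.replace("</fields>", "  " + VERSION_FIELD + "\n  </fields>")

print(has_version_field(sample))
print(has_version_field(patched))
```

To check a real installation, read the contents of the schema.xml file mentioned above and pass them to the same function.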
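LiveSearch is affected and stops working. Issue: fails to run properly. Compatibility issue: Plone 5 compatibility. Discussion of the package's scope and positioning.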
Autocomplete uses GET rather than POST. Add Context to queryUtility
For portal_type, watch out for type names that contain a space.
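One place where a space in a type name can bite is query building: an unquoted value such as News Item is read by the Solr query parser as two separate terms. The sketch below assumes that this is the concern and shows one way to keep the value together; `solr_term` is a hypothetical helper, not part of collective.solr.

```python
def solr_term(field, value):
    """Build field:"value" so that a value with spaces stays one phrase."""
    # Escape backslashes first, then double quotes, before wrapping in quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '{0}:"{1}"'.format(field, escaped)


print(solr_term("portal_type", "News Item"))
print(solr_term("portal_type", "Document"))
```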
Excessive complexity led PloneIntranet to switch to collective.indexing.
alm.solrindex GeoSpatial Search Filter lazycat patch
ElasticSearch
Applications may require different ElasticSearch versions. For example, Wagtail 1.6.3 should be paired with a 1.x release (1.7.5).
Ubuntu 14.04 installation notes: for Java, install either OpenJDK or Oracle JDK - step-by-step example
Tokenizing Stemming Filtering Scoring
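As a rough illustration of how these four stages chain together, here is a toy analysis pipeline in plain Python. It is a deliberately naive sketch: the stop-word list, the suffix-stripping stemmer, and the term-count score are all simplifications invented for this note and are not what Lucene, Solr, or ElasticSearch actually implement.

```python
import re
from collections import Counter

# Tiny, hypothetical stop-word list.
STOP_WORDS = {"the", "a", "of", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Strip a few common suffixes, keeping at least three characters."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def analyze(text):
    """Tokenize, drop stop words (filtering), then stem what is left."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def score(query, document):
    """Score a document by how often the analyzed query terms occur in it."""
    counts = Counter(analyze(document))
    return sum(counts[term] for term in analyze(query))


print(analyze("Indexing the logs"))
print(score("log index", "Indexing logs and more logs"))
```

Because the same analyze() function is applied to both the query and the document, "logs" in the text matches "log" in the query.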
Python and ElasticSearch #1 Setting Up #2 extended query #3 command-line utility
path index diffs in navigation portlet
Alternatives and Comparison
Building a Centralized Logging Application
Whoosh, written in Python, is extremely lightweight, but its Chinese-language support is rudimentary.
Xapian Apache Tika
All of the above can be integrated through Haystack.
Haystack
$ pip install django-haystack
# settings.py
INSTALLED_APPS = (
    'haystack',
)
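Haystack 2.x also expects a HAYSTACK_CONNECTIONS setting that names the backend to use. The fragment below is a sketch that assumes the Whoosh backend mentioned earlier; the index directory name whoosh_index is a hypothetical choice, and other backends take different keys (for example a URL instead of a PATH).

```python
# settings.py (continued)
import os

# Hypothetical location for the Whoosh index files; any writable directory works.
WHOOSH_INDEX_DIR = os.path.abspath('whoosh_index')

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': WHOOSH_INDEX_DIR,
    },
}
```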
Running ./manage.py lists the available commands under [haystack].
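Haystack provides SearchQuerySet, which works much like Django's QuerySet: it is an abstraction layer that supports a variety of backend services. With ModelSearchIndex you can specify just a blacklist or a whitelist of fields.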
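In Haystack the whitelist and blacklist are given as the fields and excludes options in the Meta class of a ModelSearchIndex. The plain-Python sketch below only illustrates the selection idea and is not Haystack's implementation; `select_fields` and the field names are hypothetical, and letting the whitelist take precedence when both are given is an assumption of this sketch.

```python
def select_fields(all_fields, fields=None, excludes=None):
    """Pick the fields to index from a model's field names.

    If a whitelist (fields) is given, only those names are kept.
    Otherwise every name not on the blacklist (excludes) is kept.
    """
    if fields is not None:
        return [name for name in all_fields if name in fields]
    excludes = excludes or []
    return [name for name in all_fields if name not in excludes]


# Hypothetical model field names.
model_fields = ["title", "body", "user", "pub_date"]

print(select_fields(model_fields, fields=["title", "pub_date"]))
print(select_fields(model_fields, excludes=["user"]))
```

A whitelist is the safer default when a model carries fields that should never reach the search index, since newly added fields stay out until they are listed explicitly.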
Service API
Algolia: Python API papyrus example