Solr ElasticSearch

Advanced search engine features, including integration with external search services.

collective.solr may not handle two-phase commits properly.

collective.solr provides a batteries-included solution for integrating Plone and Solr, and React.js is the way to go for client-side rendering. PLIP: Merge collective.solr into Core. Documentation.

Terminology

Node: A single server that is part of a cluster. By default, a node joins a cluster named "elasticsearch".

Index: Collection of Documents (Types), corresponds to a Database.

Type: Class/Category of similar Documents, consisting of a Name and a Mapping. The type name is stored in a metadata field named _type because Lucene has no concept of document types.

Mapping: Schema
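
To make the terms above concrete, here is a minimal sketch of how an Index, a Type and a Mapping nest inside a single create-index request body. It assumes the ElasticSearch 1.x request shape, where `string` is a valid field type; the index name `plone` and the type name `document` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical index and type names, chosen only for illustration.
INDEX_NAME = "plone"
TYPE_NAME = "document"

# Request body in the shape ElasticSearch 1.x accepts when creating an index:
# the index holds one or more types, and each type carries a mapping (schema).
create_index_body = {
    "mappings": {
        TYPE_NAME: {
            "properties": {
                "Title": {"type": "string"},
                "modified": {"type": "date"},
            }
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```

A body like this would be sent when creating the index named by `INDEX_NAME`; the sketch only builds and prints the JSON and does not contact a server.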

Solr

Install Solr on Ubuntu

Plone 4.3.7 installs cleanly with the following solr.cfg and develop.cfg. The relevant package versions are:

  • collective.indexing = 2.0b1
  • collective.recipe.solrinstance = 3
# solr.cfg

[buildout]
parts +=
    solr-download
    solr-instance


[settings]
solr-host = 127.0.0.1
solr-port = 8983
solr-min-ram = 128M
solr-max-ram = 256M


[solr-download]
recipe = hexagonit.recipe.download
strip-top-level-dir = true
url = http://ftp.tc.edu.tw/pub/Apache/lucene/solr/4.10.4/solr-4.10.4.zip


[solr-instance]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
host = ${settings:solr-host}
port = ${settings:solr-port}
basepath = /solr
# autoCommitMaxTime = 900000
max-num-results = 500
section-name = SOLR
unique-key = UID
logdir = ${buildout:directory}/var/solr
default-search-field = default
solr-version = 4
default-operator = AND
updateLog = true
java_opts =
  -Dcom.sun.management.jmxremote
  -Djava.rmi.server.hostname=127.0.0.1
  -Dcom.sun.management.jmxremote.port=8984
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.authenticate=false
  -server
  -Xms${settings:solr-min-ram}
  -Xmx${settings:solr-max-ram}
index =
    name:allowedRolesAndUsers   type:string stored:false multivalued:true
    name:created                type:date stored:true
    name:Creator                type:string stored:true
    name:Date                   type:date stored:true
    name:default                type:text indexed:true stored:false multivalued:true
    name:Description            type:text copyfield:default stored:true
    name:effective              type:date stored:true
    name:exclude_from_nav       type:boolean indexed:false stored:true
    name:expires                type:date stored:true
    name:getIcon                type:string indexed:false stored:true
    name:getId                  type:string indexed:false stored:true
    name:getRemoteUrl           type:string indexed:false stored:true
    name:is_folderish           type:boolean stored:true
    name:Language               type:string stored:true
    name:modified               type:date stored:true
    name:object_provides        type:string stored:false multivalued:true
    name:path_depth             type:integer indexed:true stored:false
    name:path_parents           type:string indexed:true stored:false multivalued:true
    name:path_string            type:string indexed:false stored:true
    name:portal_type            type:string stored:true
    name:review_state           type:string stored:true
    name:SearchableText         type:text copyfield:default stored:false
    name:searchwords            type:string stored:false multivalued:true
    name:showinsearch           type:boolean stored:false
    name:Subject                type:string copyfield:default stored:true multivalued:true
    name:Title                  type:text copyfield:default stored:true
    name:Type                   type:string stored:true
    name:UID                    type:string stored:true required:true

# develop.cfg

[buildout]
extends =
    buildout.cfg
    solr.cfg

eggs +=
    Products.DocFinderTab
    plone.reload
    collective.solr

After running buildout, start Solr:

$ cd parts/solr-instance
$ java -jar start.jar

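collective.solr/configlet.py tries to run `from plone.app.controlpanel.form import ControlPanelForm`, but in Plone 5 (which uses plone.app.controlpanel 3.0.3) form.py no longer exists. The last plone.app.controlpanel release that included it is 3.0.2. The fix is to rewrite the control panel with z3c.form. Newer versions also try to be compatible with Plone 5 [1] [2], but many test issues stand in the way.

If you hit the errors "'collection1' not available" and "'_version_' field must exist in schema", fixing the latter seems to be enough. Add one line to parts/solr-instance/solr/collection1/conf/schema.xml: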

<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
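
Solr 4 expects the `_version_` field when the update log is enabled, which the solr.cfg above does with `updateLog = true`. A quick way to confirm that the field made it into the schema is sketched below. The `SAMPLE_SCHEMA` string and the helper name `has_version_field` are hypothetical stand-ins; in practice you would read the real schema.xml from disk.

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for schema.xml; a real file has many more fields.
SAMPLE_SCHEMA = """<schema name="plone" version="1.5">
  <fields>
    <field name="UID" type="string" indexed="true" stored="true" required="true"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  </fields>
</schema>"""


def has_version_field(schema_xml):
    """Return True if the schema declares a field named _version_."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so the <fields> wrapper is optional.
    return any(f.get("name") == "_version_" for f in root.iter("field"))


print(has_version_field(SAMPLE_SCHEMA))
```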

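LiveSearch is affected and stops working. Problems with it not running correctly. Compatibility issues. Plone 5 compatibility. Discussion of the package's functional scope.

Autocomplete uses GET rather than POST. Add Context to queryUtility.

For portal_type, watch out for type names that contain a space.

Excessive complexity led PloneIntranet to switch to collective.indexing.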
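
One plausible reading of this note (an assumption, since the source does not spell out the failure) is that an unquoted query such as `portal_type:News Item` is parsed as two separate clauses. A minimal sketch of quoting the value as a phrase follows; the helper names `solr_phrase` and `portal_type_clause` are hypothetical and not part of collective.solr.

```python
def solr_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def portal_type_clause(type_name):
    """Build a portal_type filter that survives spaces in the type name."""
    return "portal_type:%s" % solr_phrase(type_name)


print(portal_type_clause("News Item"))
```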

alm.solrindex GeoSpatial Search Filter lazycat patch

Fancy Custom Search Page

Host Setup. Distributed architecture issues.

ElasticSearch

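Applications may require different ElasticSearch versions. For example, Wagtail 1.6.3 should be paired with a 1.x release (1.7.5).

Installation notes for Ubuntu 14.04: for Java, either OpenJDK or Oracle JDK can be installed - example steps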

Tokenizing Stemming Filtering Scoring
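
The four stages named above can be illustrated with a toy pipeline. This is only a sketch under simplifying assumptions: the stopword list, the crude suffix-stripping stemmer and the plain term-count score are all stand-ins, and real engines such as ElasticSearch use configurable analyzers and more refined relevance formulas.

```python
import re
from collections import Counter

# A deliberately small stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in"}


def tokenize(text):
    """Tokenizing: split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Stemming: strip a few common English suffixes (deliberately crude)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Filtering: drop stopwords, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]


def score(query, document):
    """Scoring: count how often the query's terms occur in the document."""
    doc_terms = Counter(analyze(document))
    return sum(doc_terms[term] for term in analyze(query))


print(analyze("Indexing the documents"))
```

Because both the query and the document pass through the same `analyze` function, a query for "index documents" matches text containing "Indexing" or "document".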

collective.elasticsearch

Python and ElasticSearch #1 Setting Up #2 extended query #3 command-line utility

path index diffs in navigation portlet

Alternatives and Comparison

Whoosh, written in Python, is extremely lightweight, but its Chinese-language support is rudimentary.

Xapian

All of the above can be integrated through Haystack.

Solr vs ElasticSearch

Haystack

Django + whoosh

$ pip install django-haystack
# settings.py
INSTALLED_APPS = (
    'haystack',
)

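Running ./manage.py shows the list of commands under [haystack].

Haystack provides SearchQuerySet, which works much like a Django QuerySet. It is an abstraction layer that supports a variety of backend services. With ModelSearchIndex you can specify just a blacklist or a whitelist of model fields.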
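
Since this section pairs Django with Whoosh, Haystack also needs to be told which backend to use. The following is a minimal sketch of the corresponding settings.py entry, assuming Haystack 2.x and an installed whoosh package; the index directory name `whoosh_index` is a hypothetical choice.

```python
import os

# Directory for the Whoosh index files; the name is only an example.
WHOOSH_INDEX_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "whoosh_index")

# Haystack 2.x reads its backend configuration from this setting.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": WHOOSH_INDEX_DIR,
    },
}
```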

Service API

Algolia: Python API papyrus example

indexing process + map portal to URLs

Google Search

Sharing content items between two sites: a Collection that displays items from the other site