Deep-Dive of Search in SharePoint 2013, Office 365 and SharePoint Online "From the Trenches"

By Errin O'Connor posted 06-17-2014 22:21

  

 

In this article, I will cover the key service applications and services that power search in SharePoint 2013, Office 365 and SharePoint Online, so that your organization’s data can easily be found on demand and your search results are accurate.

SharePoint 2013’s search applications and related services are as follows:

■ SharePoint Server Search service

■ This service is responsible for crawling content, as shown in the image below, for your organization’s search index and is automatically started on all servers that run search topology components. This service is unique in that it cannot be stopped or started from the Services on Server page.

■ Search Query and Site Settings service

■ This service load balances queries within the search topology. It also detects farm-level changes to the search service and stores them in the Search Admin database. It starts automatically on all servers that run the query processing component.

■ Search Host Controller service

■ This service manages the overall search topology components and is automatically started on all servers that are running topology components.

The terms “Federated Search” and/or “Scopes” are now referred to as “Result Sources” in SharePoint 2013 and Office 365. SharePoint 2013’s Search uses a componentized model that is based on a Shared Services architecture.

A Search Service Application and its proxy are provisioned, and the Search Admin page is then accessed via Central Administration in SharePoint Server 2013 or, for SharePoint Online, via the Office 365 Admin Center under the SharePoint tab in the SharePoint admin center.

For search to work properly, a default crawl account, which is also referred to as the default content access account, must be configured for the SharePoint 2013 Search service. This account must be an active Active Directory Domain Services domain account.

This account should not be set up as the account of an individual or a specific person in IT. EPC Group has seen SharePoint search issues caused by this account being deactivated, after which an entire organization’s SharePoint search ceased to work until the account issue was resolved.

As you become more experienced with SharePoint 2013’s search, EPC Group recommends that you learn how to construct Keyword Query Language (KQL) queries. There is an entire syntax reference library you can utilize, and it may come in handy down the road as you become an expert in SharePoint 2013’s search.
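As a small illustration of what a KQL query looks like in practice, the sketch below composes a KQL string from free text and property restrictions and wraps it in a Search REST URL. The site URL, the helper names (kql_property, kql_and, search_url) and the sample values are placeholders made up for this example. The /_api/search/query endpoint with its querytext and rowlimit parameters is the standard SharePoint 2013 Search REST endpoint, and the doubling of single quotes is assumed here from the usual OData convention.

```python
from urllib.parse import quote

def kql_property(name, value):
    """Return a KQL property restriction such as Author:"Jane Doe"."""
    # Values containing whitespace are quoted so KQL treats them as a phrase.
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return f"{name}:{value}"

def kql_and(*terms):
    """Join KQL terms with the AND operator (KQL operators are uppercase)."""
    return " AND ".join(terms)

def search_url(site_url, kql, row_limit=10):
    """Build a Search REST URL; the query text sits inside single quotes."""
    # Assumption: single quotes inside the query text are doubled, OData style.
    text = kql.replace("'", "''")
    return (f"{site_url.rstrip('/')}/_api/search/query"
            f"?querytext='{quote(text)}'&rowlimit={row_limit}")

# Hypothetical site and values, for illustration only.
query = kql_and("budget",
                kql_property("Author", "Jane Doe"),
                kql_property("FileType", "docx"))
url = search_url("https://contoso.example/sites/finance", query, row_limit=5)
print(query)
print(url)
```

The sketch only builds the URL. Actually sending the request requires an authenticated session against your own farm or tenant, which is left out here.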

Underlying Search Components of SharePoint 2013 and Office 365

The underlying architecture of SharePoint Search 2013 consists of 6 main components as follows:

■ Crawl Component 

■ Content Processing Component

■ Analytics Processing Component 

■ Admin Component 

■ Index Component 

■ Query Processing Component

When looking at SharePoint 2013’s search capabilities and how it actually crawls content, it is important to understand that this is a two-part process driven by two related components.

The crawl component and the content processing component work together: the crawl component fetches the actual data, and the crawled items are then sent over to the content processing component, which extracts links, metadata and other rich information.
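The two-part flow described above can be pictured with a toy model. This is only a conceptual sketch in Python, not how SharePoint implements these components: the class names, the regular expressions and the sample page are all invented for illustration.

```python
import re
from dataclasses import dataclass, field

@dataclass
class CrawledItem:
    url: str
    raw: str  # raw content fetched by the crawl step

@dataclass
class ProcessedItem:
    url: str
    text: str
    links: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

def crawl(source):
    """Crawl step: fetch raw items from a content source (here, a dict)."""
    return [CrawledItem(url, raw) for url, raw in source.items()]

LINK_RE = re.compile(r'href="([^"]+)"')
META_RE = re.compile(r'<meta name="([^"]+)" content="([^"]*)"')
TAG_RE = re.compile(r"<[^>]+>")

def process(item):
    """Content processing step: extract links, metadata and plain text."""
    links = LINK_RE.findall(item.raw)
    metadata = dict(META_RE.findall(item.raw))
    # Strip tags and collapse whitespace to get indexable text.
    text = " ".join(TAG_RE.sub(" ", item.raw).split())
    return ProcessedItem(item.url, text, links, metadata)

# Hypothetical content source with a single page.
source = {
    "http://intranet/a": '<meta name="Author" content="Jane">'
                         '<p>Budget <a href="http://intranet/b">plan</a></p>',
}
processed = [process(item) for item in crawl(source)]
print(processed[0])
```

The point of the split is that fetching and parsing are separate responsibilities: the first step only retrieves data, the second turns it into something an index can use.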

SharePoint 2013 also offers a continuous crawl option that frequently reviews sites to ensure the search results are up to date. Continuous crawls can be configured based upon your organization’s needs, any performance restrictions and the overall amount of content that is being crawled.

In order to set up a crawl, you must first have a content source that you want this service to crawl so that it can return results to you and your organization’s users.

By default, when a Search service application is created, a content source named “Local SharePoint sites” is automatically created and configured to crawl all SharePoint sites within the local SharePoint server farm.

Because SharePoint’s search is security trimmed, users will only see search results that their permissions allow them to see. Keep in mind, however, that the default content source “Local SharePoint sites” automatically puts all of your SharePoint sites in scope for search when the Search service application is originally configured.
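Security trimming can be thought of as a filter applied to results before the user sees them. The following Python sketch is a conceptual model only; the data structure, the function name and the sample principals are assumptions for illustration, and SharePoint's real mechanism is more involved.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexedItem:
    title: str
    allowed: frozenset  # principals (users or groups) with read access

def security_trim(results, user, groups):
    """Keep only the results the user, or one of their groups, may read."""
    principals = {user, *groups}
    return [r for r in results if r.allowed & principals]

# Hypothetical index content: everything is in scope for search,
# but each item carries its own access list.
index = [
    IndexedItem("HR salary bands", frozenset({"hr-team"})),
    IndexedItem("Company holiday calendar", frozenset({"everyone"})),
    IndexedItem("Project Falcon plan", frozenset({"alice", "pmo"})),
]
visible = security_trim(index, "alice", ["everyone"])
print([r.title for r in visible])
```

All three items are indexed, yet this user only ever sees two of them. That is the distinction between content being in scope for search and content being visible to a given user.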

Overview of SharePoint 2013's Crawl Component

 

 

Note: Microsoft is planning to release a new product by the name of Oslo whose goal is to tie together search silos in one centralized manner. To watch a short 2-3 min. video on Oslo, click here.

Crawl Component of SharePoint 2013’s Search

SharePoint 2013’s Search contains a crawl component that is executed via MSSearch.exe. Crawled items are sent over to the Content Processing Component for further processing before finally being routed over to the index component.

SharePoint 2013’s crawl components consist of:

■ Out-of-the-Box connectors

The following connectors are available out-of-the-box in SharePoint 2013:

■ SharePoint

■ HTTP

■ File Share

■ BDC – also includes these other connectors that are built on BDC framework:

■ Exchange Public Folders

■ Lotus Notes

■ Documentum Connector

■ Taxonomy Connector

■ Requires the Term Store to be provisioned for crawling

■ People Profile Connector

Note: This requires the profile store to be deployed and configured

■ Features that are extensible through BCS

■ Local disk cache

■ Crawled items tracked in crawl database

■ The crawl database is used by the crawl component to store information about crawled items and to track crawl history. The crawl database also holds information such as the last crawl time, the last crawl ID and the type of update during the last crawl.

■ Configurations stored in Admin database

■ The Admin database contains information regarding the crawl servers via their synchronized registry and corresponding information regarding content sources and schedules

■ Crawl modes

■ Full Crawl

■ Incremental Crawl

■ Continuous Crawl

These crawls traverse the various content sources and return the crawled items, both their actual content and their associated metadata, which are then routed to the content processing component.
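To make the difference between a full and an incremental crawl concrete, here is a simplified Python sketch. It assumes a crawl log that maps each URL to the time it was last crawled, loosely modeled on the crawl database described above. The names and the integer timestamps are invented, and continuous crawls are deliberately left out of the model.

```python
from dataclasses import dataclass

@dataclass
class SourceItem:
    url: str
    modified: int  # modification time, as a simple counter

def items_to_crawl(items, crawl_log, mode):
    """Select items for a crawl; crawl_log maps url -> last crawled time."""
    if mode == "full":
        # A full crawl revisits everything, changed or not.
        return list(items)
    if mode == "incremental":
        # Only items that are new or changed since they were last crawled.
        return [i for i in items
                if i.url not in crawl_log or i.modified > crawl_log[i.url]]
    raise ValueError(f"unknown crawl mode: {mode}")

# Hypothetical content source and crawl history.
items = [SourceItem("/a", 5), SourceItem("/b", 9), SourceItem("/c", 2)]
crawl_log = {"/a": 7, "/b": 8}  # "/c" has never been crawled

print([i.url for i in items_to_crawl(items, crawl_log, "incremental")])
print([i.url for i in items_to_crawl(items, crawl_log, "full")])
```

In this sample the incremental crawl skips "/a" because it has not changed since it was last crawled, which is why incremental crawls are usually much cheaper than full crawls.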

Measuring SharePoint 2013’s Search Performance

When measuring the performance of SharePoint 2013’s crawl components, it is important to review how your environment reacts in terms of CPU utilization, as the CPU load rises in conjunction with the number of documents crawled per second.

You should also monitor the corresponding network and disk load for any possible bottlenecks that may cause performance degradation during a crawl. Network load is generated as the crawler downloads content from the hosts.

Disk load, on the other hand, is generated when crawled items are temporarily stored during the crawl for processing by the Content Processing Component.

Content Processing Component of SharePoint 2013’s Search

SharePoint 2013 Search’s content processing component, as shown in the image below, receives the crawled content from the crawl component and performs document parsing, link extraction, metadata and property mappings.

Once items are processed, they are sent over to SharePoint Search’s index component to be indexed.

 

SharePoint 2013’s content processing component consists of:

■ Analyses of content for indexing

■ Overview processing flow

■ Available dictionaries

■ Stateless node for SharePoint

■ Schema mapping components

■ Ability to store links and anchors in the Link database for analytics

■ Additional extensible capabilities, as shown in the image below, through web service call-outs

■ The configurations stored in admin database

The Content Processing component transforms crawled items into artifacts that can be included in the search index by parsing documents and mapping properties.

The Content Processing component also performs linguistic processing, such as language detection, at index time. In SharePoint 2013, this component writes information about links and URLs directly to the Link database and generates phonetic name variations for SharePoint 2013’s people search.

There is also the capability to enable a web service callout that enriches data before an item is added to the index. This extensibility capability works with managed properties that are sent to, and returned from, the web service.

SharePoint 2013’s Additional Extensibility Capabilities

You can create additional content sources in SharePoint 2013’s Central Administration as well as edit or delete any existing content source at any time.

You can also delete items from the search index or from search results in SharePoint Server 2013.

In Office 365 / SharePoint Online's Search administration, as shown in the image below, you can perform actions such as creating a new result source or managing an existing result source, as well as a number of other administrative search functions.

Office 365 / SharePoint Online Search Administration

 

In SharePoint 2013’s Search, a crawl component will automatically communicate with all crawl databases within the corresponding farm, and there is no need to map a crawl component to a specific crawl database as was required in previous versions of SharePoint.

Analytics Processing Component of SharePoint 2013’s Search

In SharePoint 2013’s search the analytics are now performed by the analytics processing component during a crawl within the Search Service Application.

The new analytics processing component utilizes both the Link database and the Analytics reporting database, which improves overall speed and accuracy.

The Analytics Processing components in SharePoint 2013’s search contain core features such as:

■ Search Analytics

■ The map-reduce feature

■ Ability to learn by usage

■ The search analytics component not only analyzes the crawled items but how users actually interact over time with the search results

■ Overall usage analytics to include previous views stored in the event store

■ Enriching the index by updating index items

■ Usage reports in Analytics reporting database

This feature analyzes the actions a user performs (for example, viewing a page), collects the data regarding each event in the relevant usage files and publishes it into the event store, where it is stored and processed to enable the system to familiarize itself with “learned behavior.”
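The way usage events turn into view counts can be sketched in a map-reduce style. The Python below is a toy model under stated assumptions: the event format, the function names and the sample usage files are invented, and it says nothing about how the event store is actually implemented.

```python
from collections import Counter
from itertools import chain

def map_events(batch):
    """Map step: emit (url, 1) for every 'view' event in one usage file."""
    return [(event["url"], 1) for event in batch if event["type"] == "view"]

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per URL."""
    totals = Counter()
    for url, count in pairs:
        totals[url] += count
    return totals

# Hypothetical usage files, each a list of recorded events.
usage_files = [
    [{"type": "view", "url": "/docs/plan"},
     {"type": "view", "url": "/docs/faq"}],
    [{"type": "view", "url": "/docs/plan"},
     {"type": "click", "url": "/docs/faq"}],
]
view_counts = reduce_counts(
    chain.from_iterable(map_events(batch) for batch in usage_files))
print(view_counts.most_common())
```

Counts like these are the kind of signal that can feed features such as view counts and sorting by popularity.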

The Analytics Processing component routes the results to the Content Processing Component so that they can be included in the search index.

You are also able to utilize its additional extensibility capabilities to develop code to handle custom events as well as scale the component to meet the underlying requirements and usage of your organization. You are able to:

■ Add additional Analytics Processing roles for faster analysis

■ Add additional Link databases to increase capacity for links as well as user search clicks

■ Add additional reporting databases to scale to meet your reporting needs as well as to improve SQL throughput in retrieving reports

The Most Popular Items feature within SharePoint 2013’s Ribbon is driven by the Analytics Processing Component of SharePoint 2013’s new search architecture.

The Analytics Processing Component within SharePoint 2013’s search capabilities also provides for:

■ View counts

■ Sort by popularity

■ Recommendations

■ Relevancy based on usage

■ Search reports

■ Suggested sites for you to follow and reminders of sites you have previously viewed

Search Admin Component of SharePoint 2013’s Search

SharePoint 2013 Search’s Admin component is responsible for all search provisioning as well as any topology changes. This component manages the lifecycle and monitors the state of the:

■ Crawl Component 

■ Content Processing Component

■ Analytics Processing Component 

■ Index Component 

■ Query Processing Component

Within SharePoint 2013’s architecture you are able to deploy multiple Search Admin Components for high availability and fault tolerance. This architecture includes the Search Admin database, which stores search configuration data such as:

■ Query rules

■ Topology

■ Managed property mappings 

■ Content sources

■ Crawl rules

■ Crawl schedules

■ Underlying analytics settings and configurations

■ Ranking model configuration

Note: Ranking model configuration is done via PowerShell cmdlets. The SharePoint search service administrator can perform the following operations on SharePoint 2013 ranking models:

   ■ List ranking models

   ■ Specify a default ranking model

   ■ Change an existing custom ranking model

   ■ Delete an existing custom ranking model

   ■ Create a new ranking model

   ■ Import and export a ranking model to XML

It’s also important to note that Microsoft has provided connectors for SharePoint Server 2013’s Search to allow for seamless interaction with other major technologies such as Microsoft Exchange, Lotus Notes and Documentum.

SharePoint 2013 allows you to export and import customized search configuration settings between site collections and sites.

The settings that you export and import also include:

■ All customized query rules

■ Result sources

■ Result types

■ Ranking models

■ A ranking model describes which criteria are included in sorting as well as how much they contribute to the rank score and how they relate to one another

■ Custom ranking models are managed through Windows PowerShell as well as via the UI

■ The ranking model for a specific query can be selected at query time by setting the RankingModelId of the query

■ Site search settings

■ Exportation of customized search configuration settings from a Search service application and importation of those settings to site collections and sites

It is not possible to import customized search configuration settings into a Search service application or export the related default search configuration settings.
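The settings list above mentions that a ranking model can be selected at query time by setting the RankingModelId of the query. One way to picture this is through the Search REST endpoint's rankingmodelid parameter, as in the Python sketch below. The site URL, the function name and the all-zero GUID are placeholders for illustration. Substitute the ID of a ranking model that actually exists in your environment.

```python
from urllib.parse import quote

def ranked_query_url(site_url, query_text, ranking_model_id=None):
    """Build a Search REST URL, optionally naming a ranking model."""
    url = (f"{site_url.rstrip('/')}/_api/search/query"
           f"?querytext='{quote(query_text)}'")
    if ranking_model_id:
        # Without this parameter the default ranking model applies.
        url += f"&rankingmodelid='{ranking_model_id}'"
    return url

# Placeholder GUID, not a real ranking model ID.
PLACEHOLDER_MODEL_ID = "00000000-0000-0000-0000-000000000000"

default_url = ranked_query_url("https://contoso.example", "sharepoint")
custom_url = ranked_query_url("https://contoso.example", "sharepoint",
                              PLACEHOLDER_MODEL_ID)
print(default_url)
print(custom_url)
```

Comparing the results of the same query under two ranking models is a simple way to check whether a custom model behaves as intended before making it the default.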

Search Index Component of SharePoint 2013’s Search

SharePoint 2013 Search’s Index Component handles both feeding and querying. Feeding consists of receiving processed items from the content processing component and then persisting those items to the index files.

This also entails receiving queries from the query processing component and then providing result sets in return.

The Index Component also provides for features and underlying capabilities such as:

■ Provides replication of index content between replicas (index components) within the same index partition as this index partition is a logical portion of the entire search index.

■ Each and every partition is served by one or more index components and the primary replica is set by default to maintain a persisted journal of new and updated items which is then copied to the other replicas within the partition

■ Every replica exists for added fault tolerance as well as increased query throughput and the underlying index can scale in multiple manners

■ Provides the flexibility required to apply index partition changes when a topology change occurs
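The relationship between a partition, its primary replica and the journal can be modeled in a few lines. This Python sketch is a deliberately simplified, in-memory illustration. The class and method names are invented, and real journals are persisted and considerably more sophisticated.

```python
class IndexPartition:
    """One logical slice of the index, served by one or more replicas."""

    def __init__(self, replica_names):
        # Each replica holds its own copy of the partition's items.
        self.replicas = {name: {} for name in replica_names}
        self.primary = replica_names[0]  # assumption: first replica is primary
        self.journal = []                # new and updated items, in order

    def feed(self, doc_id, content):
        """The primary records the item in its journal and applies it."""
        self.journal.append((doc_id, content))
        self.replicas[self.primary][doc_id] = content

    def replicate(self):
        """Copy journal entries to every other replica, then clear the journal."""
        for name, items in self.replicas.items():
            if name != self.primary:
                for doc_id, content in self.journal:
                    items[doc_id] = content
        self.journal.clear()

partition = IndexPartition(["replica-1", "replica-2"])
partition.feed("doc-1", "budget plan")
partition.feed("doc-1", "budget plan v2")  # an update to the same item
partition.feed("doc-2", "holiday calendar")
partition.replicate()
print(partition.replicas["replica-2"])
```

Once replication has run, any replica can answer a query for the partition, which is where the added fault tolerance and query throughput come from.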

Search Query Component of SharePoint 2013’s Search

SharePoint 2013 Search’s Query Component performs the actual linguistic processing at the time of a query, which includes word breaking, stemming, query spellchecking and the native thesaurus capabilities.

The Query Component receives the queries and then analyzes and processes them in order to optimize precision and relevancy. Once the query is processed, it is submitted to the index component along with guidance as to which query rules are applicable.

The Query Component also provides guidance around which index the query should be sent to as well as whether any pre- or post-processing procedures should be conducted on the query.

Once this is done the index returns a result set back to the query processing component which then processes it and returns it back to the appropriate point in the process.

SharePoint 2013 utilizes new ranking models to calculate the relevance rank of search results. You can also influence the order of search results by using SharePoint Search’s query rules, the search schema and ranking models.

This enables the most relevant, searched and selected terms to be ranked via a calculated method to help ensure the most accurate search results are displayed in an order influenced by relevance as well as usage within the organization.

Search Diagnostics and Health Monitoring

SharePoint 2013 provides a number of native query health reports to assist you both in monitoring the health and performance of SharePoint’s search and in ensuring your users are retrieving the content they are attempting to find.

SharePoint 2013’s native query health reports are as follows:

■ Trend

■ Overall

■ Main Flow

■ Federation

■ SharePoint Search Provider

■ People Search Provider

■ Index Engine

There are several reports that we have found to be very useful for our clients at EPC Group that we would recommend you deploy such as the:

■ Usage Reports

■ Number of Queries

■ Search Reports

■ Top Queries for (e.g. Day and Month)

■ Abandoned Queries (e.g. Day and Month)

■ No Result Queries (e.g. Day and Month)

■ Query Rule Usage (e.g. Day and Month)
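To show what the search reports listed above summarize, here is a rough Python sketch that derives top, no-result and abandoned queries from a query log. The log format and function name are invented for illustration. The definition of "abandoned" used here (results were returned but nothing was clicked) is a simplifying assumption, not SharePoint's exact calculation.

```python
from collections import Counter

def search_reports(query_log):
    """Summarise a query log into three simple report tables."""
    top = Counter(e["query"] for e in query_log)
    no_result = Counter(e["query"] for e in query_log if e["results"] == 0)
    # Assumption: "abandoned" means results were shown but none was clicked.
    abandoned = Counter(e["query"] for e in query_log
                        if e["results"] > 0 and not e["clicked"])
    return {"top": top, "no_result": no_result, "abandoned": abandoned}

# Hypothetical one-day query log.
query_log = [
    {"query": "expense form", "results": 12, "clicked": True},
    {"query": "expense form", "results": 12, "clicked": False},
    {"query": "holiday calender", "results": 0, "clicked": False},
    {"query": "org chart", "results": 3, "clicked": False},
]
reports = search_reports(query_log)
for name, counts in reports.items():
    print(name, counts.most_common(3))
```

No-result queries such as the misspelled "holiday calender" in this sample are often the quickest wins, since they point directly at missing content or at synonyms and query rules worth adding.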

 

SharePoint 2013, Office 365, and SharePoint Online Search "From the Consulting Trenches"

I will continue to build upon this post with more "from the consulting trenches" strategies and best practices on SharePoint 2013, Office 365 and SharePoint Online's Search capabilities. I have been absent from the AIIM community for the past year as I recently finished my new book "SharePoint 2013 Field Guide: Advice from the Consulting Trenches" but am thrilled to be contributing again and will be posting on a very frequent basis in the weeks and months to come!



#Office365andSharePointOnline"FromtheTrenches" #Deep-DiveofSearchinSharePoint2013 #Search #SharePoint #InformationGovernance