Building

Inserting Meta Tags and the Title Tag

The goal of this document is to help you set up your Cal Poly online documents with Meta tags so that users searching for your website with a search engine can find it more easily. By tuning the way you code, name, and organize documents, you can enhance the quality of search results and thereby improve user satisfaction. The following guidelines will improve how your pages are presented by most search engines, such as Cal Poly's PolySearch, AltaVista, Google, and HotBot.

Meta tags are embedded in the <HEAD> section of a web page and don't affect how the page is displayed in a browser. Cal Poly uses Meta tags to identify the departmental ownership of a document, to provide a brief description of the document, and to provide keywords that represent the page's content and can be matched by a search engine.

Page title (this is not a meta tag but is equally important)

It is possible to specify a title in the <head> section of every Web page. Titles are one of the most important things to optimize. Among other benefits, this simple addition assists users with finding your pages through a World Wide Web search engine. With respect to search engine results, good titles make search results much easier to understand because it is the title that appears as the link name for each found page. Words in a title are also very influential when determining a page's rank among other found pages.

Syntax: <title>This is the Title</title>

Guidelines:

  • Specify a title in the <head> section of every Web page using the <title> tag.
  • Create meaningful titles that are terse but informative. The title can be the same as the heading used on the page.
  • Avoid using the same title on more than one page.
  • If your Web site uses frames, the title of the main content document (rather than the frameset document) will be used. Page authors often neglect the titles of these documents, so they tend to be poorly chosen or absent.
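
The guidelines above can be checked mechanically. The following is a minimal Python sketch, not part of any Cal Poly tooling: it assumes you already have each page's HTML as a string, and the names extract_title and find_duplicate_titles (and the paths used to label pages) are hypothetical. It reads the <title> of each page and reports titles that are used more than once.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None  # stays None when the page has no <title>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> and <title> both match.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_title(html):
    """Return the page title with surrounding whitespace removed, or None."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def find_duplicate_titles(pages):
    """pages maps a (hypothetical) path to its HTML.

    Returns a dict of title -> list of paths for titles used more than once.
    """
    seen = {}
    for path, html in pages.items():
        seen.setdefault(extract_title(html), []).append(path)
    return {title: paths for title, paths in seen.items() if len(paths) > 1}
```

A page for which extract_title returns None has no title at all and is worth fixing first.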

META-tags

A META-tag is a hidden piece of information that can be placed in an HTML document. The goal of the PolySearch search engine is to operate with the four types of tags described below. Note, however, that the keyword and description tags, though not strict standards, are typically recognized by other Internet search engines by the names "KEYWORDS" and "DESCRIPTION", respectively.

For more extensive information about META-tags please see http://www.htmlhelp.org/reference/html40/head/meta.html

Keywords

Keywords are used by most search engines as special "hit words." Pages that have searched-for words in their keyword META-tags will be given a high ranking on result pages. Note that it is also possible to include words in the keyword META-tag that are not found on the page itself. Most search engines will only include the first 1000 characters in a keyword list.

Syntax: <meta name="keywords" content="keyword1, keyword2, keyword3, keyword4, keyword5">

Example: The keywords tag for a page about wild mammals in Africa might look like this:

<meta name="keywords" content="Africa, wildlife, mammal, mammals, lion, elephant, giraffe, gazelle, safari">
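
If you generate pages from a script, the keyword list can be assembled in code. The sketch below is only an illustration: the function name keywords_meta is hypothetical, and the 1000-character limit is simply the figure quoted above, measured on the text before HTML escaping. It drops repeated keywords and stops adding keywords once the limit would be exceeded.

```python
from html import escape

MAX_KEYWORD_CHARS = 1000  # limit quoted in the text above; an assumption, not a standard


def keywords_meta(keywords, limit=MAX_KEYWORD_CHARS):
    """Build a keywords META-tag from a list of words.

    Case-insensitive duplicates are dropped, and keywords that would push
    the comma-separated list past `limit` characters are left out.
    """
    kept = []
    seen = set()
    length = 0
    for word in keywords:
        word = word.strip()
        key = word.lower()
        if not word or key in seen:
            continue
        # Count the ", " separator for every keyword after the first.
        added = len(word) + (2 if kept else 0)
        if length + added > limit:
            break
        kept.append(word)
        seen.add(key)
        length += added
    # Escape &, <, > and quotes so the value is safe inside the attribute.
    content = escape(", ".join(kept), quote=True)
    return f'<meta name="keywords" content="{content}">'
```

Put the most important keywords first, since anything past the limit is discarded.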

Description

Description META-tags are used to give a short description of a page. Many search engines will display this description below the title/link of the page. If there is no description META-tag, the search engine will typically try to create an abstract of the page and use that as a description. To avoid being truncated by search engines, the description should be brief: no more than 128 characters.

Syntax: <meta name="description" content="Put your page description here.">

Example: The description tag for a page about wild mammals in Africa might look like this:

<meta name="description" content="Information about the variety of wildlife mammals in Africa. Look up information about African mammals from mice to elephants.">
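
To stay within the 128-character guideline, a description can be trimmed before it is written into the page. This is a rough sketch under stated assumptions: the helper name fit_description is hypothetical, and cutting on a word boundary is a design choice made here, not something search engines require.

```python
def fit_description(text, limit=128):
    """Collapse runs of whitespace and trim to at most `limit` characters.

    When the text is too long it is cut at the last word boundary that
    fits; a single word longer than `limit` is cut mid-word.
    """
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    # Look one character past the limit so a space exactly at the limit
    # counts as a usable word boundary.
    window = text[: limit + 1]
    if " " in window:
        return window.rsplit(" ", 1)[0]
    return text[:limit]
```

It is usually better to write a description that fits than to rely on trimming, because the cut may fall in the middle of a sentence.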

Department

The department META-tag identifies the department that owns, or is responsible for, the page. The CONTENT value of the department META-tag can be the name of the department/service/program/club that owns the page, a combination of the division and department names, or any other relevant program name.

Syntax: <meta name="department" content="Department Name">

Example: A page that is "owned" by the Parent Program is identified using the "department META-tag" and would look similar to this:

<meta name="department" content="Parent Program">
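
Once the keywords, description, and department tags are in place, a page can be audited for any that are missing. The following Python sketch is an assumption-laden illustration: the names MetaCollector and missing_meta are hypothetical, and it assumes every page should carry all three tags described above. Tag names are compared case-insensitively, so "Department" and "department" are treated alike.

```python
from html.parser import HTMLParser

# The three named META-tags described above.
EXPECTED = ("keywords", "description", "department")


class MetaCollector(HTMLParser):
    """Records the name/content pairs of the <meta> tags in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if name:
            self.meta[name] = attributes.get("content") or ""


def missing_meta(html, expected=EXPECTED):
    """Return the expected META-tag names that are absent or empty."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return [name for name in expected if not collector.meta.get(name, "").strip()]
```

An empty list means the page carries all three tags with non-empty content.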

Robots

Use the robots META-tag to decide whether a document should be indexed by a search engine and whether its links should be followed. There are several options available for excluding documents from a search engine index.

Excluding Content From Search Engines

There are several reasons why you might decide to prevent robots from indexing some or all of your pages. For example, if your site is an online database, the indexing crawler may attempt to index every possible variation of dynamic content from the database. Another reason is that a page may hold sensitive department information intended for use within the department only. A third is to prevent clutter in search results, since some pages on a website are meaningless when a user arrives at them directly from a search result.

Several options are available for controlling which parts of your site a robot will index. They are described in the sections below.

META Tag Robot Control

Use the robots META-tag to decide whether a document should be indexed and whether its links should be followed. This is especially useful for pages that contain only a list of links. The meta tag

<meta name="robots" content="noindex, follow">

causes an indexing crawler to follow all of the links but not to index the link page itself. See the robots meta tag exclusion standard for more information.

Note: some search engines don't support the robots META-tag (Cal Poly's Google search appliance does support it). It is wise to use the robots.txt file to address these search engines and to make more general rules for exclusion.

Syntax: The NAME value is always "robots", and the CONTENT value is a comma-separated combination of one or more of the following values:

  • index: specifies that the page should be indexed
  • noindex: specifies that the page should not be indexed
  • follow: specifies that the page's links should be followed
  • nofollow: specifies that the page's links should not be followed
  • all: specifies that the page should be indexed and all links should be followed
  • none: specifies that the page should not be indexed and the links should not be followed.
  • noarchive: specific to Google (and Cal Poly's Google appliance); tells Google not to offer a cached copy of the page. This value is not part of the robots standard.

Example: The following META element tells search engines and other robots not to index the page but to follow links on it:

<meta name="robots" content="noindex, follow">
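
To see how a CONTENT value combines, the values listed above can be reduced to three yes/no answers. This is a minimal sketch, not how any particular search engine is implemented: the function name parse_robots_content is hypothetical, it assumes a robot indexes and follows unless told otherwise, and it assumes that when values conflict the more restrictive one wins.

```python
def parse_robots_content(content):
    """Interpret the CONTENT value of a robots META-tag.

    Returns a dict with three booleans: whether the page may be indexed,
    whether its links may be followed, and whether a cached copy may be
    offered (the Google-specific noarchive value).
    """
    # Values are comma-separated; case and surrounding spaces are ignored.
    tokens = {token.strip().lower() for token in content.split(",")}
    return {
        "index": not (tokens & {"noindex", "none"}),
        "follow": not (tokens & {"nofollow", "none"}),
        "archive": "noarchive" not in tokens,
    }
```

For the example above, parse_robots_content("noindex, follow") reports that the page is not indexed but that its links are followed.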

The "robots.txt" File

All major search engines follow the "robots.txt" standard for controlling how a site is indexed. The "robots.txt" file must be located in the root directory of your Web site. Examples of how to construct the content of this file are given below. For complete information about configuring a "robots.txt" file, see robotstxt.org.

Syntax:

User-agent: *
Disallow: /path/
Allow: /path/ (note, not all indexers support the "Allow" parameter)

Examples:

Disallow all robots access to the site
User-agent: *
Disallow: /

Allow all robots full access to the site
User-agent: *
Disallow:

Exclude all robots from specific folders and files on a site
User-agent: *
Disallow: /cgi-bin/
Disallow: /secret/
Disallow: /news/newsletter.html

Exclude a specific robot from specific folders and files on a site
User-agent: WebCrawler
Disallow: /cgi-bin/
Disallow: /products/special_offers/index.html

Combinations (allow the vspider robot full access; exclude all other robots from /cgi-bin/)
User-agent: vspider
Disallow:

User-agent: *
Disallow: /cgi-bin/
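
Before publishing a "robots.txt" file, it can help to check what the rules actually permit. The sketch below uses the RobotFileParser class from Python's standard urllib.robotparser module; the wrapper name can_fetch and the robot name OtherBot are hypothetical, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The "Combinations" example: vspider may fetch everything,
# every other robot is kept out of /cgi-bin/.
RULES = """\
User-agent: vspider
Disallow:

User-agent: *
Disallow: /cgi-bin/
"""


def can_fetch(robots_txt, user_agent, path):
    """Return True if `user_agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("vspider", "OtherBot"):
        for path in ("/cgi-bin/search", "/index.html"):
            print(agent, path, can_fetch(RULES, agent, path))
```

Under these rules vspider may fetch both paths, while any other robot may fetch /index.html but not /cgi-bin/search.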

For more information concerning the robots.txt file for the Cal Poly search engine, please see the CP Search FAQ page.