Searching

Revision as of 16:20, 20 November 2023 by WikiSysop (talk | contribs) (Update external links (TODO: appropriate replacements for https://refbase.ipoe.uni-kiel.de links))

This page explains how to use the refbase search facilities:

Search options

Basic search

Search pages

Refine your search results

Directly jump to particular records

User-specific search options

Search syntax

Basic queries

When searching, refbase performs a "contains" search by default. This means that refbase will return all records where the searched field contains the specified search string. As an example, searching the title field for:

arctic

will return records where the title field contains "Arctic", "Antarctic", "Antarctica", "subarctic", etc.

This "contains" search is the standard search behaviour in simple search forms such as the Quick Search or Search within Results forms. It is also used in all other search forms if "contains" or "does not contain" is selected in the drop-down that specifies the search mode.
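The "contains" behaviour can be sketched outside of refbase as a case-insensitive substring test. The snippet below is only an illustration in Python, not refbase's own code: the `contains` helper and the sample titles are hypothetical, and it ignores the fact that refbase interprets the search string as a pattern.

```python
def contains(field_value: str, search_string: str) -> bool:
    """Case-insensitive substring test, mimicking a 'contains' search."""
    return search_string.casefold() in field_value.casefold()


# Hypothetical title values, made up for this illustration.
titles = [
    "Arctic sea ice variability",
    "Antarctic krill distribution",
    "Subarctic ecosystems",
    "Tropical cyclones",
]

# Every title that contains "arctic" anywhere, regardless of case.
matches = [t for t in titles if contains(t, "arctic")]
print(matches)
```

Only the last title is dropped, since the other three all contain the search string somewhere within a longer word.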

If your search string consists of several words, refbase will return all records where the searched field literally contains the given string of words. E.g., if you searched in the title field for:

sea ice thickness

then refbase will return all records whose title contains the exact string "sea ice thickness". This means that you don't need to enclose a string in quotation marks to force an exact match (as you would in various online search engines such as Google). When searching a refbase database, quotation marks are treated as regular characters and have no special meaning.

Now, what if you wanted to search a database field for the occurrence of two words which are not necessarily next to each other? In refbase, the easiest way of searching for something like "contains xxx AND contains yyy" is to start your search with the first search term (xxx), then use the Search within Results form above the search results list to search for the second search term (yyy). Using this method, you can quickly perform complex searches on multiple fields (and with multiple search terms) without having to work out the correct search pattern in advance.
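The difference between a literal multi-word search and a two-step refinement can be sketched as follows. This is an illustration in Python rather than refbase itself: the `search` helper and the sample records are hypothetical, and Python's `re` module stands in for MySQL's REGEXP.

```python
import re

# Hypothetical records, made up for this illustration.
records = [
    {"title": "Sea ice thickness in the Weddell Sea"},
    {"title": "Thickness of first-year sea ice"},
    {"title": "Snow thickness on glaciers"},
]


def search(recs, field, pattern):
    """Return the records whose field contains the pattern (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in recs if rx.search(r[field])]


# A single search matches the words only as one literal string.
phrase_hits = search(records, "title", "sea ice thickness")

# Searching within the results of a first search acts like "xxx AND yyy".
first_pass = search(records, "title", "thickness")
refined = search(first_pass, "title", "sea ice")

print([r["title"] for r in phrase_hits])
print([r["title"] for r in refined])
```

The refined result also includes the record where the two terms appear in a different order, which the literal search misses.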

When searching for two (or more) authors of a particular paper, you often know the order in which the two authors occur. In this case you can make use of the .+ metacharacter sequence which matches any string of characters (more on metacharacters below). As an example, you can use:

Cota.+Smith

to find all records where the author field contains "Cota" followed by "Smith".
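As a rough illustration of why the order of the names matters, the pattern can be tried against a few sample strings. The author strings below (and the "; " separator between authors) are assumptions made for this sketch, and Python's `re` module is used as a stand-in for MySQL's REGEXP.

```python
import re

# ".+" requires at least one character between the two names.
pattern = re.compile(r"Cota.+Smith", re.IGNORECASE)

# Hypothetical author field values.
authors = [
    "Cota, GF; Smith, WO",
    "Smith, WO; Cota, GF",
    "Cota, GF; Horne, EPW",
]

hits = [a for a in authors if pattern.search(a)]
print(hits)
```

Only the first string matches. The second one contains both names, but "Smith" comes before "Cota", so a search in the opposite order (or a search within results) would be needed to find it.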

Using metacharacters to form complex queries

By default, refbase allows you to use metacharacters to describe a more complex search pattern. The search syntax used is that of regular expressions (often abbreviated as regexp or regex), which come in many flavors. refbase supports MySQL regular expressions via the MySQL REGEXP (or RLIKE) syntax. The MySQL website offers an introduction to pattern matching and provides more information about the REGEXP syntax for regular expressions. See the examples section below for some simple usage examples.

While regular expressions provide a powerful syntax for searching, they may be somewhat difficult to write and daunting if you're new to the concept. There are various tutorials on regular expressions on the net that can help you get started. A nice basic introduction to regular expressions (PDF version) was written by Stephen Ramsay.

Search examples

Below are some basic examples that will show you how to use MySQL regular expressions with refbase. The given links are all working examples which you can try out.

If you want to find all records where a particular person is the first author of the publication, you can prefix the person's name with a caret sign (^). For example, searching the author field for:

^Ackley,

will find all records whose first author name is "Ackley". If you want to restrict the list of returned results further to only those records where a particular person is the publication's only author (i.e., has no co-authors), you may append the dollar sign ($) at the end of the author's full name and initials. For example, searching for:

^Ackley, SF$

will only find those records whose author field exactly (and only) contains the string "Ackley, SF".
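The effect of the two anchors can be sketched with a few sample author strings. The co-author name and the "; " separator are assumptions made for this illustration, and Python's `re` module stands in for MySQL's REGEXP.

```python
import re

# Hypothetical author field values.
authors = [
    "Ackley, SF",
    "Ackley, SF; Lange, MA",
    "Lange, MA; Ackley, SF",
]

# "^" anchors the pattern at the start of the field: first author only.
first_author = [a for a in authors if re.search(r"^Ackley,", a, re.IGNORECASE)]

# Adding "$" anchors the end as well: the field must hold nothing else.
sole_author = [a for a in authors if re.search(r"^Ackley, SF$", a, re.IGNORECASE)]

print(first_author)
print(sole_author)
```

The third string is matched by neither pattern, since "Ackley" does not appear at the start of the field.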

Often you want to search for a particular author but you're faced with the problem that the author's name is written differently in different database records. This is often the case for authors whose names contain accented characters or umlauts. You can find all instances of an author's name by using the dot metacharacter (.), which matches any single character. As an example, to find records where the author field contains "Gómez" as well as "Gomez", you may use:

G.mez

Since the dot metacharacter does not only match the characters "o" and "ó" but also other characters, this search would also find authors whose name is e.g. "Gimez". To avoid this, you can specify the allowed characters explicitly by enclosing them with square brackets:

G[oó]mez

This will only find records whose author field contains either "Gomez" or "Gómez". In the above example, the square brackets denote a so-called character class, which you can also use to specify a range of characters. The following example will find all records where the year field contains years between 2002 and 2006:

200[2-6]
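The difference between the dot and an explicit character class, as well as the year range, can be checked with a small sketch. The sample values are hypothetical, and Python's Unicode-aware `re` module is used here as a stand-in, so the handling of accented characters in a particular MySQL version may differ.

```python
import re

# Hypothetical author field values.
names = ["Gomez, A", "Gómez, A", "Gimez, A"]

# The dot accepts any character in second position.
dot_hits = [n for n in names if re.search(r"G.mez", n)]

# The character class accepts only "o" or "ó" there.
class_hits = [n for n in names if re.search(r"G[oó]mez", n)]

# A range inside a character class: the last digit must be 2 to 6.
years = ["1999", "2002", "2006", "2007"]
year_hits = [y for y in years if re.search(r"200[2-6]", y)]

print(dot_hits)
print(class_hits)
print(year_hits)
```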

You can also use a negated character class by prefixing your list of characters within the square brackets with a caret sign (^) – which effectively matches every character that is not given within your character class. This may come in handy if you want to find an author whose name may contain more than one variable character. For example, searching for:

L[^ ]+nne

will find entries where the author field contains "Lonne", "Lönne", "Lønne" but also "Loenne". In this example, the negated character class [^ ] matches any single character that is not a space. The plus sign (+) is a quantifier that lets the pattern before it match one or more times, so that cases such as "Loenne" are matched as well. However, the above search pattern will also match author names such as "Gallienne" or "Delzenne", which may not be what you want. Again, you can specify the allowed characters more explicitly to circumvent this problem:

L([oöø]|oe)nne

In this example we're using grouping parentheses ((...)) in combination with the alternation metacharacter (i.e. the pipe character: |), which matches either the part within the enclosing parentheses that's to the left of the pipe character or the part that's to the right of it. In our example, the left part within the parentheses (i.e. the character class [oöø]) will match author names such as "Lonne", "Lönne" and "Lønne" but not "Loenne". The right part within the parentheses (oe) causes "Loenne" to be matched as well.
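To compare the broad pattern with the more explicit one, both can be run against the names mentioned above. This is a sketch using Python's `re` module as a stand-in for MySQL's REGEXP. The case-insensitive flag is an assumption that mirrors the default "contains" behaviour, and it is the reason why the lowercase "l" inside "Gallienne" and "Delzenne" counts as a match for "L".

```python
import re

names = ["Lonne", "Lönne", "Lønne", "Loenne", "Gallienne", "Delzenne"]

# Negated class plus "+": one or more non-space characters between "L" and "nne".
broad = re.compile(r"L[^ ]+nne", re.IGNORECASE)

# Grouping with alternation: exactly one of o, ö, ø, or the two letters "oe".
narrow = re.compile(r"L([oöø]|oe)nne", re.IGNORECASE)

broad_hits = [n for n in names if broad.search(n)]
narrow_hits = [n for n in names if narrow.search(n)]

print(broad_hits)
print(narrow_hits)
```

The broad pattern accepts all six names, while the explicit one keeps only the four intended spellings.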

The dot metacharacter (.) or its repeated form (.+) can also be used to find all records where the searched field isn't empty. Here's an example for the author field:

.+

Speaking of quantifiers, you can use the question mark (?), which indicates that the single character (or search pattern) before the question mark is optional. This is useful when you want to search for multiple variants of a particular keyword, for example:

pha?eopigment

will find records whose keywords field contains either "phaeopigment" or "pheopigment". You can also append a question mark to a multi-character string that's enclosed by parentheses to indicate that this string is optional. For example, searching the title field for:

bio(geo)?chemistry

will find records whose title contains either "biogeochemistry" or "biochemistry".
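Both uses of the question mark can be tried out in a short sketch. The sample keywords and titles are hypothetical, and Python's `re` module stands in for MySQL's REGEXP.

```python
import re

# "a?" makes the single character "a" optional.
keywords = ["phaeopigment", "pheopigment", "phaaeopigment"]
kw_hits = [k for k in keywords if re.search(r"pha?eopigment", k)]

# "(geo)?" makes the whole parenthesised group optional.
titles = ["Marine biogeochemistry", "Soil biochemistry", "Biogeography"]
title_hits = [
    t for t in titles if re.search(r"bio(geo)?chemistry", t, re.IGNORECASE)
]

print(kw_hits)
print(title_hits)
```

The question mark allows zero or one occurrence only, so a keyword with a doubled "a" is not matched.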


Let us know if you've got further questions regarding the refbase search facilities.