<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<!--
$RCSfile$
$Author: hansonr $
$Date: 2010-05-21 17:08:32 +0200 (ven., 21 mai 2010) $
$Revision: 13170 $
Copyright (C) 2005 The Jmol Development Team
Contact: jmol-developers@lists.sf.net
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA.
-->
</head>
<body bgcolor="white">
<p>
<h1>Jmol 3D-SEARCH</h1>
<p>Robert M. Hanson<br/>Department of Chemistry<br/>St. Olaf College<br/>5/19/2010</p>
<p>
An adaptation of SMILES and SMARTS for 3D molecular atom search and selection.
The <a href=".">org.jmol.smiles</a> package provides extensive functionality for selecting
atoms within a three-dimensional model based on SMILES and SMARTS strings.
This package may be used independently
of Jmol -- see <a href="../../../JmolSmilesApplet.java">JmolSmilesApplet.java</a>
and <a href="http://chemapps.stolaf.edu/jmol/docs/examples-11/JmeToJmol.htm">JmeToJmol.htm</a>.
</p>
<p> Besides a presentation of general considerations, this document provides a detailed <a href="#specs">specification</a> of the syntax and
<a href="#aromaticity">defines</a> the term "aromatic".
</p>
<h3>General Considerations</h3>
<p>
<b>format</b>
<ul><li>
Allows any amount of white space -- spaces, tabs, new lines. Prior to parsing, all white space is removed.
</li></ul>
<b>atom selection</b>
<ul>
<li>For SMILES searches, all hydrogen atoms -- as in HCCC or [CH2] -- are selected. This includes all hydrogens needed to complete the
"normal" valence of an unbracketed atom, such that "CCC" is the same as "[CH3][CH2][CH3]".
</li><li>For SMARTS searches, no valence calculation is done to add any additional hydrogens to unbracketed
atoms. "CCC" is the same as "[C][C][C]". Only unbracketed or bracketed hydrogen atoms such as H[C]C or [H]
or [2H] are selected; connected hydrogen atoms as in [CH3] are not selected.
</li><li>Adds { } for selection of one or more subsets of matched atoms.
Simply place { } around any atom or group of atoms that are to be
specifically returned from a search.
</li></ul>
<b>aromaticity</b>
<ul>
<li>Jmol 3D-SEARCH defines "aromatic" unambiguously and strictly geometrically.
See <a href="#aromaticity">below</a>.
</li><li>Note that "aromatic" is not restricted to any specific subset of elements.
</li></ul>
</p>
<h3>Comparison to <a target="_blank" href="http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html">Daylight SMILES</a></h3>
<p>
All single-component aspects of Daylight SMILES are implemented, including
aromaticity and atom- and bond-based stereochemistry ("chirality").
</p>
<h3>Comparison to <a target="_blank" href="http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html">Daylight SMARTS</a></h3>
<p>
<ul>
<li>[H1] is interpreted as [*H1] -- "an atom with one connected H atom".
</li><li>Allows definition of [$XXX] variables:
<pre>
Var x = '$R1="[CH3,NH2]";$R2="[OH]"; {a}[$R1]' // select aromatic atoms attached to CH3 or NH2
select within(SMARTS,@x)
</pre>
Note that these variables are any string whatsoever, not just atom sets. The syntax is simply:
<ul><li>Each variable definition takes the form $ [name] =" [definition] " [comments] ;
</li><li>[name] can be any characters except '$', '=', and ']' and must not start with '('.
It is recommended they be restricted to the set A-Z, a-z, and 0-9.
</li><li>[definition] can be any valid SMARTS characters.
</li><li>[comments] can be any characters other than ';'.
</li><li>The actual pattern starts after the last variable definition.
</li><li>Nested variables are allowed, but note that this may require using the recursion syntax, $(...):
<pre>
Var x = '$R1="[CH3,NH2]";$R2="[$($R1),OH]"; {a}[$R1]' // select aromatic atoms attached to CH3, NH2, or OH
select within(SMARTS,@x)
</pre>
</li><li>For $xxx="yyyy", all occurrences of the string "[$xxx]" are replaced within the pattern prior to parsing.
</li></ul>
<br/>
</li><li>Implements nested ("recursive") SMARTS:
<pre>
Var x = '$R1="[CH3,NH2]";$R2="[OH]"; {a}[$([$R1]),$([$R2])]' // aromatic attached to CH3, NH2, or OH
select within(SMARTS,@x)
</pre>
Note that $(...) need not be within [...], and
wherever it is, it always means "just the first atom".
</ul>
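<p>
The variable substitution described above can be sketched in Java as follows. This is a minimal illustration of the stated rules (strip white space, read leading <code>$name="definition" comments;</code> entries, then replace each <code>[$name]</code> in the remaining pattern). The class name <code>SmartsVariables</code> and method <code>expand</code> are hypothetical and are not part of org.jmol.smiles. Name validation and expansion of variables nested inside other definitions are omitted.
</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Jmol's parser: expands [$name] references.
final class SmartsVariables {
  /** Returns the pattern after the definitions, with each [$name] replaced. */
  static String expand(String input) {
    // All white space is removed before anything else is parsed.
    String s = input.replaceAll("\\s+", "");
    Map<String, String> defs = new LinkedHashMap<>();
    int pos = 0;
    // A definition starts with '$' that is not the "$(" of a nested expression.
    while (pos < s.length() && s.charAt(pos) == '$' && !s.startsWith("$(", pos)) {
      int eq = s.indexOf("=\"", pos);
      if (eq < 0) break;
      int endQuote = s.indexOf('"', eq + 2);
      if (endQuote < 0) break;
      // Anything between the closing quote and ';' is a comment.
      int semi = s.indexOf(';', endQuote);
      if (semi < 0) break;
      defs.put(s.substring(pos + 1, eq), s.substring(eq + 2, endQuote));
      pos = semi + 1;
    }
    // The actual pattern starts after the last variable definition.
    String pattern = s.substring(pos);
    for (Map.Entry<String, String> e : defs.entrySet()) {
      pattern = pattern.replace("[$" + e.getKey() + "]", e.getValue());
    }
    return pattern;
  }
}
```

<p>
With the first example above, <code>SmartsVariables.expand("$R1=\"[CH3,NH2]\";$R2=\"[OH]\"; {a}[$R1]")</code> yields <code>{a}[CH3,NH2]</code>.
</p>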
<b>primitives</b>
<ul>
<li>All Daylight SMARTS primitives are implemented. These include:
<br/>
<table border="1" cellpadding="5" width="500">
<tr><td>[Element]</td><td>capitalized - standard notation Na, Si, etc. -- specific non-aromatic atom</td></tr>
<tr><td>[element]</td><td>uncapitalized - specific aromatic atom (as for standard notation, no limitations)</td></tr>
<tr><td>*</td><td>any atom</td></tr>
<tr><td>A</td><td>any non-aromatic atom</td></tr>
<tr><td>a</td><td>any aromatic atom</td></tr>
<tr><td>#</td><td>atomic number</td></tr>
<tr><td>(integer)</td><td>mass number -- Note, however, that [H1] is [*H1], "any atom with one attached hydrogen", not unlabeled hydrogen, [1H].</td></tr>
<tr><td>D</td><td>degree - total number of connections</td></tr>
<tr><td>H</td><td>exact hydrogen count</td></tr>
<tr><td>h</td><td>"implicit" hydrogen count (atoms are not in structure)</td></tr>
<tr><td>R</td><td>in the specified number of rings</td></tr>
<tr><td>r</td><td>in ring of a given size</td></tr>
<tr><td>v</td><td>valence (total bond order)</td></tr>
<tr><td>X</td><td>calculated connectivity, including implicit hydrogens</td></tr>
<tr><td>x</td><td>number of ring bonds</td></tr>
<tr><td>@</td><td>stereochemistry</td></tr>
</table>
<br/>
In addition, Jmol 3D-SEARCH adds the following primitives and options:
<br/>
<br/>
<table border="1" cellpadding="5" width="500">
<tr><td>d</td><td>non-hydrogen degree -- number of non-hydrogen connections</td></tr>
<tr><td>[number]?</td><td>mass number or undefined (so, for example, [C12?] means any carbon that isn't explicitly C13 or C14)</td></tr>
</table>
<br/>
</li><li>All primitives that are not element names, <b>*</b>, <b>A</b>, or <b>a</b> must be
enclosed in brackets. In addition, the following elements must be enclosed in brackets
because their two-letter combination Xy implies the non-aromatic element X attached
to the aromatic element y: Ac, Ba, Ca, Na, Pa, and Sc.
</li><li>Allows any order of bracketed primitives: [H2C13] same as [13CH2].
</li><li>All atom logic implemented: [X,!X,X&X,X&X&X;X&X] etc.
</li><li>"&" is optional: [13CH2] same as [13&C&H2]
except in cases of ambiguity with element symbols: [Rh] is rhodium, not [R&h].
</li><li>Jmol 3D-SEARCH does NOT implement:
<ul><li> "zero-level parentheses", since the match is
always only within a given model (but note that you can still use '.' for a "not-connected" indicator)
</li><li>"?" in atom stereochemistry ("chirality") because
3D structures are always defined stereochemically.
</li><li>"?" for bond stereochemistry, as 3D structures
are always defined stereochemically
</li></ul>
</li></ul>
<b>implicit hydrogen count</b>
<ul><li>
The primitives <b>h</b> (implicit hydrogen count) and <b>X</b> (total connections, including implicit hydrogens)
require analysis of bonding around a model atom to determine the number of missing ("implicit") hydrogen atoms based on a "target valence."
Models that specify only "aromatic" or "partial" bonding may produce ambiguous results, and for that reason,
primitives <b>X</b> and <b>h</b> are not recommended for use. Other primitives, such as <b>D</b>, <b>d</b>, and <b>v</b> should be more useful.
The analysis here is the same one Jmol uses to calculate the number of hydrogens to add for the <b>calculate hydrogens</b> command.
It proceeds as follows:
<ol style="list-style-type: lower-alpha;"><li>Assign the target valence <b>TV</b> as follows:
<ul><li>For C and Si, <b>TV</b> = 4.
</li><li>For B, N, and P, <b>TV</b> = 3.
</li><li>For O and S, <b>TV</b> = 2.
</li><li>For F, Cl, Br, and I, <b>TV</b> = 1.
</li><li>For all other atoms, <b>TV</b> = 0.
</li></ul>
</li><li>Obtain the formal charge on the atom, <b>C</b>.
</li><li>Group IV elements such as carbon are unique, in that their cations are valence-poor, not valence-rich.
So for carbon and silicon, subtract the ABSOLUTE VALUE of <b>C</b> from the target valence.
In all other cases, let <b>TV</b> = <b>TV</b> + <b>C</b>.
</li><li>Determine the overall valence of the atom, <b>OV</b>. This is calculated by adding up all the bond orders to the atom.
</li><li>Subtract <b>OV</b> from <b>TV</b> to get the number of implicit hydrogen atoms. If this number is less than zero, assign zero.
</ol>
</li><li>Thus, the implicit hydrogen count is:
<ul><li> 0 for all atoms other than {B,C,N,O,P,Si,S}
</li><li>0 for BR3
</li><li>0 for CR4, 1 for CR3, 2 for CR2, 3 for CR
</li><li>0 for CR3(+), 0 for CR3(-)
</li><li>0 for R=CR2, 1 for R=CR, 2 for R=C, 1 for C#R (triple bond)
</li><li>0 for NR3, 1 for NR2, 2 for NR
</li><li>0 for RN=R, 1 for R=N
</li><li>1 for NR3(+), 1 for R=NR(+), 1 for RN(-)
</li><li>0 for OR2, 0 for O=R, 1 for OR
</li><li>0 for RO(-), 2 for RO(+)
</li></ul>
</li></ul>
</p>
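<p>
The target-valence procedure above can be sketched in Java as follows. This is a literal transcription of steps (a) through (e), not Jmol's own code: the class <code>ImplicitHydrogens</code> and its methods are hypothetical, bond orders are assumed to be integers (so "aromatic" or "partial" bonds are not handled), and charged atoms outside the target-valence table may behave differently in Jmol itself.
</p>

```java
// Hypothetical sketch of the implicit hydrogen count, not Jmol's implementation.
final class ImplicitHydrogens {
  /** Step (a): target valence TV for an element symbol. */
  static int targetValence(String element) {
    switch (element) {
      case "C": case "Si": return 4;
      case "B": case "N": case "P": return 3;
      case "O": case "S": return 2;
      case "F": case "Cl": case "Br": case "I": return 1;
      default: return 0;
    }
  }

  /** Steps (b)-(e): adjust TV for formal charge, subtract total bond order. */
  static int count(String element, int formalCharge, int[] bondOrders) {
    int tv = targetValence(element);
    if (element.equals("C") || element.equals("Si")) {
      tv -= Math.abs(formalCharge); // carbon and silicon: charged means valence-poor
    } else {
      tv += formalCharge;
    }
    int ov = 0; // overall valence: sum of all bond orders to the atom
    for (int order : bondOrders) {
      ov += order;
    }
    return Math.max(0, tv - ov); // never negative
  }
}
```

<p>
For example, <code>ImplicitHydrogens.count("N", 1, new int[] {1, 1, 1})</code> gives 1, matching the NR3(+) entry in the list above.
</p>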
<a name="specs"><h3>Detailed Jmol 3D-SEARCH SMARTS Specification</h3></a>
<p>
<pre>
# note: prior to parsing, all white space is removed
[smartDef] == [variableDefs] [smarts] | [smarts]
[variableDefs] == [variableDef] | [variableDef] [variableDefs]
[variableDef] == "$" [label] "=" "\"" [smarts] "\"" [comments] ";"
[label] == [any characters other than "=" and "$", and not starting with "("]
[comments] == [any characters other than ";"]
# note: Variable definitions must be parsed first.
# After that, all variable references [$XXXX] are replaced
[smarts] == [node][connections]
[connections] == { [connection] | NULL }
[connection] == { [branch] | [bond] [node] } [connections]
[branch] == "(" [smarts] ")"
[node] == { [atomExpression] | [ringPointer] }
[ringPointer] == { "%" [digits] | [digit] }
# note: all ringPointers must have a second matching ringPointer
# and must be preceded by an atomExpression or
[bond] == { "-" | "=" | "#" | "." | "/" | "\\" | ":" | "~" | "@" | NULL }
[atomExpression] == { [unbracketedAtomType]
| "[" [bracketedExpression] "]"
| [nestedExpression] }
[unbracketedAtomType] == [atomType]
& ! { "Ac" | "Ba" | "Ca" | "Na" | "Pa" | "Sc"
| "ac" | "ba" | "ca" | "na" | "pa" | "sc" }
# note: Brackets are required for these elements: [Na], [Ca], etc.
# These elements Xy are instead interpreted as "X" "y", a single-letter
# element followed by an aromatic atom.
[atomType] == { [validElementSymbol] | "A" | [aromaticType] | "*" }
[validElementSymbol] == (see <a href="../util/Elements.java">Elements.java</a>;
including Xx and only through element 109)
[aromaticType] == { "a" | [validElementSymbol].toLowerCase() }
[bracketedExpression] == { [orSet] | [orSet] ";" [andSet] }
[orSet] == { [andSet] | [andSet] "," [andSet] }
[andSet] == { [primitives] | [primitives] "&" [andSet]
| "!" [primitive]
| "!" [primitive] "&" [andSet] }
[primitives] == { [primitive] | [primitive] [primitives] }
# note -- if & is not used, certain combinations of primitive descriptors
# are not allowed. Specifically, combinations that together
# form the symbol for an element will be read as the element (Ar, Rh, etc.)
# when NOT followed by a digit and no element has already been defined
# So, for example, [Ar] is argon, [Ar3] is [A&r3], [ORh] is [O&R&h],
# but [Ard2] is [Ar&d2] -- "argon with two non-hydrogen connections"
# Also, "!" may not be used with implied "&".
# Thus, [!a], [!a&!h2], and [h2&!a] are all valid, but [!ah2] is invalid.
[primitive] == { [isotope] | [atomType] | [charge] | [stereochemistry]
| [A_Prop] | [D_Prop] | [d_Prop] | [H_Prop] | [h_Prop]
| [R_Prop] | [r_Prop] | [v_Prop] | [X_Prop]
| [x_Prop] | [nestedExpression] }
[isotope] == [digits] | [digits] "?"
# note -- isotope mass may come before or after element symbol,
# EXCEPT "H1" which must be parsed as "an atom with a single H"
[charge] == { "-" [digits] | "+" [digits] | [plusSet] | [minusSet] }
[plusSet] == { "+" | "+" [plusSet] }
[minusSet] == { "-" | "-" [minusSet] }
[stereochemistry] == { "@" # anticlockwise
| "@@" # clockwise
| "@" [stereochemistryDescriptor]
| "@@" [stereochemistryDescriptor] }
[stereochemistryDescriptor] == [stereoClass] [stereoOrder]
[stereoClass] == { "AL" | "TH" | "SP" | "TP" | "OH" }
[stereoOrder] == [digits]
# note -- "?" here (unspecified) is not relevant in 3D-SEARCH
[A_Prop] == "#" [digits] # elemental atomic number
[D_Prop] == { "D" [digits] | "D" } # degree -- total number of connections
# excludes implicit H atoms; default 1
[d_Prop] == { "d" [digits] | "d" } # degree -- non-hydrogen connections
# default 1
[H_Prop] == { "H" [digits] | "H" } # exact hydrogen count
# excludes implicit H atoms
[h_Prop] == { "h" [digits] | "h" } # implicit hydrogens -- "h" indicates "at least one"
# (see note below)
[R_Prop] == { "R" [digits] | "R" } # ring membership; e.g. "R2" indicates "in two rings"
# "R" indicates "in a ring"
# "!R" or "R0" indicates "not in any ring"
[r_Prop] == { "r" [digits] | "r" } # in ring of size [digits]; "r" indicates "in a ring"
[v_Prop] == { "v" [digits] | "v" } # valence -- total bond order (counting double as 2, e.g.)
[X_Prop] == { "X" [digits] | "X" } # connectivity -- total number of connections
# includes implicit H atoms
[x_Prop] == { "x" [digits] | "x" } # ring connectivity -- total ring connections
[nestedExpression] == "$(" + [atomExpression] + ")"
[digits] == { [digit] | [digit] [digits] }
[digit] == { "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" }
</pre>
</p>
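<p>
As a small illustration of how one rule of this grammar might be read, the following Java sketch handles the <code>[charge]</code> production: a sign followed by digits, or a run of identical signs (<code>[plusSet]</code> or <code>[minusSet]</code>). The class <code>ChargePrimitive</code> is hypothetical and is shown only to make the rule concrete; it is not how Jmol's parser is organized.
</p>

```java
// Hypothetical sketch of the [charge] production, not Jmol's parser.
final class ChargePrimitive {
  /** Returns the signed charge for a token such as "+", "++", "-2", or "---". */
  static int parse(String token) {
    if (token.isEmpty()) {
      throw new IllegalArgumentException("empty charge token");
    }
    char sign = token.charAt(0);
    if (sign != '+' && sign != '-') {
      throw new IllegalArgumentException("not a charge: " + token);
    }
    int unit = (sign == '+') ? 1 : -1;
    String rest = token.substring(1);
    if (rest.isEmpty()) {
      return unit; // a single "+" or "-"
    }
    boolean allDigits = true;
    boolean allSameSign = true;
    for (int i = 0; i < rest.length(); i++) {
      char c = rest.charAt(i);
      if (c < '0' || c > '9') allDigits = false;
      if (c != sign) allSameSign = false;
    }
    if (allDigits) {
      return unit * Integer.parseInt(rest); // "+" [digits] or "-" [digits]
    }
    if (allSameSign) {
      return unit * token.length(); // [plusSet] or [minusSet]
    }
    throw new IllegalArgumentException("not a charge: " + token);
  }
}
```

<p>
Under this reading, <code>"++"</code> and <code>"+2"</code> both give a charge of 2, and a mixed token such as <code>"+-"</code> is rejected.
</p>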
<a name="aromaticity"><h3>Jmol 3D-SEARCH Definition of "aromatic"</h3></a>
<p>
We define "aromatic" here strictly in terms of geometry - a flat ring with trigonal planar geometry for all atoms in the ring.
Bond order is not considered, because many of the model formats that can be loaded into Jmol
(PDB, GAUSSIAN, etc.) do not specify a bonding scheme.
</p>
<p>
Given a ring of N atoms...
<pre>
        1
       / \
      2   6 -- 6a
      |   |
5a -- 5   4
       \ /
        3
</pre>
with arbitrary order and up to N substituents...
<ol>
<li>
Check to see if all ring atoms have no more than 3 connections.
Note: An alternative definition might include "and no substituent
is explicitly double-bonded to its ring atom," as in quinone.
Here we opt to allow the atoms of quinone to be called "aromatic."
</li><li> Select a cutoff value close to zero. We use 0.01 here.
</li><li> Generate a set of normals as follows:
<ol style="list-style-type: lower-alpha;"><li> For each ring atom, construct the normal associated with the plane
formed by that ring atom and its two nearest ring-atom neighbors.
</li><li> For each ring atom with a substituent, construct a normal
associated with the plane formed by its connecting substituent
atom and the two nearest ring-atom neighbors.
</li><li> If this is the first normal, assign vMean to it.
</li><li> If this is not the first normal, check vNorm.dot.vMean. If this
value is less than zero, scale vNorm by -1.
</li><li> Add vNorm to vMean.
</ol>
</li><li>Calculate the standard deviation of the dot products of the
individual vNorms with the normalized vMean.
</li><li>The ring is deemed flat if this standard deviation is less
than the selected cutoff value.
</li></ol>
</p>
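<p>
The flatness test in steps 2 through 5 can be sketched in Java as follows. This is an illustration under stated assumptions, not Jmol's implementation: the class <code>RingFlatness</code> is hypothetical, ring atoms are given as coordinates in ring order, each ring atom has at most one substituent (so the caller is assumed to have already done the connection check of step 1), and the sample standard deviation (dividing by n - 1) is used because the text does not say which form is meant.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the ring flatness test, not Jmol's implementation.
final class RingFlatness {
  static final double CUTOFF = 0.01; // cutoff value from step 2

  static double dot(double[] a, double[] b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
  }

  /** Unit normal of the plane through points a, b, and c. */
  static double[] unitNormal(double[] a, double[] b, double[] c) {
    double[] u = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
    double[] v = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
    double[] n = { u[1] * v[2] - u[2] * v[1],
                   u[2] * v[0] - u[0] * v[2],
                   u[0] * v[1] - u[1] * v[0] };
    double len = Math.sqrt(dot(n, n));
    return new double[] { n[0] / len, n[1] / len, n[2] / len };
  }

  /**
   * ring[i] is the position of ring atom i, in ring order;
   * substituents[i] is its one non-ring neighbor, or null if none.
   */
  static boolean isFlat(double[][] ring, double[][] substituents) {
    int n = ring.length;
    List<double[]> normals = new ArrayList<>();
    double[] vMean = null;
    for (int i = 0; i < n; i++) {
      double[] prev = ring[(i + n - 1) % n];
      double[] next = ring[(i + 1) % n];
      // One plane through the ring atom, one through its substituent (if any).
      double[][] apexes = { ring[i], substituents[i] };
      for (double[] apex : apexes) {
        if (apex == null) continue;
        double[] vNorm = unitNormal(prev, apex, next);
        if (vMean == null) {
          vMean = vNorm.clone(); // first normal
        } else {
          if (dot(vNorm, vMean) < 0) { // flip to point the same way as vMean
            for (int k = 0; k < 3; k++) vNorm[k] = -vNorm[k];
          }
          for (int k = 0; k < 3; k++) vMean[k] += vNorm[k];
        }
        normals.add(vNorm);
      }
    }
    double len = Math.sqrt(dot(vMean, vMean));
    double[] unitMean = { vMean[0] / len, vMean[1] / len, vMean[2] / len };
    double sum = 0, sumSq = 0;
    for (double[] vNorm : normals) {
      double d = dot(vNorm, unitMean);
      sum += d;
      sumSq += d * d;
    }
    int count = normals.size();
    double variance = (sumSq - sum * sum / count) / (count - 1);
    return Math.sqrt(Math.max(0.0, variance)) < CUTOFF;
  }
}
```

<p>
A planar hexagon with in-plane substituents passes this test, while lifting one ring atom or one substituent well out of the plane makes it fail.
</p>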
<p>
-- <a href="mailto:hansonr@stolaf.edu">Bob Hanson</a> 5/19/2010
</p>
</body>
</html>