Chapter 10. XPath for XML and JSON
Index:
XPath stands for XML Path Language. It uses a non-XML syntax to provide a flexible way of addressing (pointing to) different parts of an XML (JSON) document. It can also be used to test addressed nodes within a document to determine whether they match a pattern or not.
XPath uses a path notation (as in URLs) for navigating through the hierarchical structure of an XML (JSON) document. It uses a non-XML syntax so that it can be used in URIs and XML attribute values.
Through XPath + SQL, it's easy to query/update XML (JSON) document for database administor or JDBC developer.
We will use the following XML document in the examples below.
<?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book category="COOKING"> <title lang="en">Everyday Italian</title> <author>Giada De Laurentiis</author> <year>2005</year> <price>30.00</price> </book> <book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> <book category="WEB"> <title lang="en">XQuery Kick Start</title> <author>James McGovern</author> <author>Per Bothner</author> <author>Kurt Cagle</author> <author>James Linn</author> <author>Vaidyanathan Nagarajan</author> <year>2003</year> <price>49.99</price> </book> <book category="WEB"> <title lang="en">Learning XML</title> <author>Erik T. Ray</author> <year>2003</year> <price>39.95</price> </book> </bookstore>
XPath uses path expressions to select nodes in an XML (JSON) document. The node is selected by following a path or steps. The most useful path expressions are listed below:
Expression | Description |
---|---|
nodename | Selects all nodes with the name "nodename". XML element has a node name always. But JSON hasn't in fact, __self__ become a special node name for JSON. You can omit __self__ if there's a predicate. For instance, "/__self__", "/[price=3], and "__self__/__self__". |
/ | Selects from the root node. A location path can be absolute or relative. An absolute location path starts with a slash ( / ) and a relative location path does not. In both cases the location path consists of one or more steps, each separated by a slash. |
'stringValue' | For instance, select _value from books."/bookstore/book[price>30]/title[_value='XQuery Kick Start']"; |
@ | Selects attributes |
In the table below we have listed some path expressions and the result of the expressions:
Path Expression | Result |
---|---|
bookstore | Selects all nodes with the name "bookstore" |
/bookstore | Selects the root element bookstore Note: If the path starts with a slash ( / ) it always represents an absolute path to an element! |
bookstore/book | Selects all book elements that are children of bookstore |
/@lang | Selects the attribute that are named lang |
Predicates are used to find a specific node or a node that contains a specific value.
Predicates are always embedded in square brackets.
In the table below we have listed some path expressions with predicates and the result of the expressions:
Path Expression | Result |
---|---|
/bookstore/book[1] | Selects the first book element that is the child of the bookstore element.
Note: In Java, first node is[0], but according to W3C, it is [1]. For instance, select * from books."/bookstore/book[1]/title/@lang"; |
/bookstore/book[last()] | Selects the last book element that is the child of the bookstore element |
/bookstore/book[last()-1] | Selects the last minu one book element that is the child of the bookstore element. For instance, select * from books."/bookstore/book[last()]"; |
/bookstore/book[position()<3] | Selects the first two book elements that are children of the bookstore element. For instance, select * from books."/bookstore/book[position()<3]"; |
/bookstore/book[price>35.00] | Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00 |
/bookstore/book[price>35.00]/title | Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00 |
XPath wildcards can be used to select unknown XML nodes.
Wildcard | Description |
---|---|
* | Matches any element node. For instance, select * from books."/bookstore/*"; |
@* | Matches any attribute node. It returns attribute, not node. For select * from books."/bookstore/book[1]/title/@*"; |
node() | Matches any node of any kind. For instance, select * from books."/bookstore/book[1]/node()"; |
text() | Matches only text content of any node . For instance, select * from books."/bookstore/book[1]/text()"; |
In the table below we have listed some path expressions and the result of the expressions:
Path Expression | Result |
---|---|
/bookstore/* | Selects all the child element nodes of the bookstore element |
An XPath expression returns either a node-set, a string, a Boolean, or a number.
Below is a list of the operators that can be used in XPath expressions:
Description | Example | |
---|---|---|
+ | Addition | 6 + 4 |
- | Subtraction | 6 - 4 |
* | Multiplication | 6 * 4 |
div | Division | 8 div 4 |
= | Equal | price=9.80 |
!= | Not equal | price!=9.80 |
< | Less than | price<9.80 |
<= | Less than or equal to | price<=9.80 |
> | Greater than | price>9.80 |
>= | Greater than or equal to | price>=9.80 |
or | or | price=9.80 or price=9.70 |
and | and | price>9.00 and price<9.90 |
mod | Modulus (division remainder) | 5 mod 2 |
select * from books."/bookstore/book/title";
or select type,"@lang",value from books."/bookstore/book/title";
<title lang="en">Everyday Italian</title> <title lang="en">Harry Potter</title> <title lang="en">XQuery Kick Start</title> <title lang="en">Learning XML</title>as the followsing ResultSet:
type VARCHAR 5 0 value VARCHAR 16 0 title en Everyday Italian title en Harry Potter title en XQuery Kick Start title en Learning XML
You should find those sub elements and attributes has been expanded for data opertion.
If you wish to get only element, you can use select __self__ from books."/bookstore/book/title";
or
select a.__self__ from books."/bookstore/book/title" as a;
to get the followsing ResultSet:
Self XML 0 0 <title lang="en">XQuery Kick Start</title> <title lang="en">Learning XML</title>
Self is a Pseudo Columns, you can find __self__, _children,_childN, _type, _value, and so on at Pseudo Columns.
Predicates can be used to filter XML Data.
select * from books."/bookstore/book[price<30]";
or select * from books."/bookstore/book" where price<30;
<book category="CHILDREN"> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book>as the followsing ResultSet:
type VARCHAR 4 0 title XML 0 0 author XML 0 0 year XML 0 0 price XML 0 0 book CHILDREN <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price>
If you want to get directly the sub element value, you can use xmlObject->'subElementName' format. For instance, select "@category",title,title->'__self__', title->'_value' as titleB,price, price->'_value' as priceB from books."/bookstore/book[price<30]" as book; That annoies SQL user, so that you can use "focusValue=true"(default value) as a connection property, then select * from books."/bookstore/book" where price<30;
will fetch the followsing ResultSet:type VARCHAR 4 0 title VARCHAR 16 0 author VARCHAR 19 0 year INTEGER 10 0 price NUMERIC 5 10 book CHILDREN Harry Potter J K. Rowling 2005 29.99
select * from books."/bookstore/book[price>30]/title"
will fetch the following XML data
<title lang="en">XQuery Kick Start</title> <title lang="en">Learning XML</title>as the followsing ResultSet:
type VARCHAR 5 0 value VARCHAR 17 0 title en XQuery Kick Start title en Learning XML
select title from books."/bookstore/book" where price>30 order by title
will fetch the following XML data
<title lang="en">Learning XML</title> <title lang="en">XQuery Kick Start</title>as the followsing ResultSet:
title XML 0 0 <title lang="en">Learning XML</title> <title lang="en">XQuery Kick Start</title>
Misc sql samples: select * from SZJP01180; select "file" from SZJP01180; select "file/@index" from SZJP01180; select "@label","@size","file","file/@index" from SZJP01180; select * from SZJP01180."root"; select * from SZJP01180x."root" where instr("@label",'a')>0; select * from SZJP01180x."root/gallery"; select "index",* from SZJP01180."root/gallery"; select "@src" from SZJP01180."root/gallery" ; select * from SZJP01180."root/gallery" where "@src"!="@src1"; select "@src" as label,"@index" as index from SZJP01180."root/gallery" order by VAL("@index"); select a."@label",a."@index",b."@src" from SZJP01180x."root/gallery" as a ,SZJP01180."root/gallery" as b where a."@index"=b."@index" order by a."@index"; select * from SZJP01180x."root/gallery/file"; select * from SZJP01180x."root/gallery/file" where "@index"=2; select * from SZJP01180x."root/gallery/file[@index=365]"; select * from SZJP01180x."root/gallery/file[@index=1]/file[@index=2]"; select * from SZJP01180x."root/gallery/file[@index=365]/file[@index=410]"; select file."@oindex", file."@index",file."@src",index."@label" from SZJP01180."root/gallery" as file , SZJP01180x."root/gallery" as index where file."@oindex"=index."@oindex" order by file."@oindex", file."@index"; select file."@oindex", file."@index",file."@src",index."@label" from SZJP01180."root/gallery" as file , SZJP01180x."root/gallery/file" as index where file."@oindex"=index."@oindex" order by file."@oindex", file."@index"; select file."@oindex", file."@index",file."@src",index."@label" from SZJP01180."root/gallery" as file , SZJP01180x."root/gallery/file/file" as index where file."@oindex"=index."@oindex" order by file."@oindex", file."@index"; select a."@label",a."@index",b."@src" from SZJP01180x."root/gallery/file[@index=1]" as a ,SZJP01180."root/gallery" as b where a."@index"=b."@index" order by VAL(a."@index"); select a."@label" as label,a."@index" as index,b."@src" as fileName from SZJP01180x."root/gallery/file[@index=1]" as a ,SZJP01180."root/gallery" as b where a."@index"=b."@index" order by VAL(a."@index") drop database if exists testxml; create database if not exists testxml; CREATE table testxml."tableabc/bbb" (aaa varchar(12),bb timestamp,cc char(1),dd char(32)); CREATE table testxml.tableabc (CLIENT varchar(12),SHDATE timestamp,CONFIRM char(1),CAT char(32),NAMEASGN char(40),STDIAN char(12),PROVCODE char(8),MUNI char32)); insert into testxml."tableabc/aaa" (aaa,bb,cc) values('aaa','zzz','dddd'); insert into testxml.table1 (CLIENT,SHDATE) values('dddd',now()); update testxml.table1 set CAT='aaa' where CLIENT='fff'; delete from testxml.table1 where CLIENT='eee'; insert into testxml."table1/aaa" (aaa,bb,cc) values('aaaaa','zzdz','ddzdd'); alter table testxml.table1 rename CLIENT to bcd; alter table testxml.table1 drop NAMEASGN, rename to test22; alter table testxml.test22 add ooo int, rename to table1; alter table testxml."table1/aaa" drop cc; select * from testxml."table1/aaa"; select * from testxml.table1; drop table testxml."table1/aaa"; drop table testxml.table1; drop database testxml;