Feeding a Query
How to retrieve news items and save them into a query object
Cláudio Alexandre da Costa Dias
A large number of sites now publish news on all kinds of subjects. This news is usually available in HTML and, sometimes, in XML (RDF/RSS) format, meant to be read with dedicated feed readers.
When the news is published in RDF/RSS format (RSS 1.0 is built on the W3C's RDF standard), it's pretty simple to write ColdFusion code that handles this XML and creates a query object for later display.
However, if the news is only available in HTML, some extra effort is needed to extract the data and arrange it into a query object.
To follow this tutorial, you should know the RDF/RSS formats, be reasonably comfortable with regular expressions, and be familiar with the ColdFusion MX XML functions.
First of all, let's take a look at the simplest case: retrieving news from an RDF/RSS document.
We will use Ben Forta's blog page in our examples: http://www.forta.com/blog

Figure 1 : Ben Forta's Blog
As we can see, there are many news items on the right side of the screen. They are also available in RDF/RSS format at http://www.forta.com/blog/rss.cfm?mode=full.
This link gives us:

Figure 2 : RSS feed from the blog's page
In other words, an XML document like any other. A complete description of the RDF standard, on which RSS 1.0 is based, can be found at http://www.w3.org/RDF.
Let's build the query object from this XML document. The CFML code for all the templates used can be found at the bottom of this tutorial.
To write the CFML code that does this job, we'll follow these steps:
We use the <cfhttp> tag to retrieve the XML news document:
<cfhttp url="http://www.forta.com/blog/rss.cfm?mode=full" method="GET">
The result of this request is stored in the cfhttp.fileContent variable, which therefore holds the news as an XML string. Outputting this variable would give something like Figure 2.
ColdFusion MX introduces a new data type: the XML object. Using the XMLparse() function, we can convert an XML string into an XML object.
<cftry>
<cfset xDoc = XMLparse(cfhttp.fileContent)>
<cfcatch>
Invalid RDF/RSS !
<cfabort>
</cfcatch>
</cftry>
We wrap the call in a try/catch block to guard against malformed XML. Dumping the xDoc variable:

Figure 3 : xDoc XML object view
Once we have our XML object, xDoc, let's identify which RSS version it follows. To do this, we use the name of the XML root element, xDoc.XmlRoot.XmlName.
To retrieve the items (the news themselves) from the XML object, we use the XMLsearch() function. It takes an XPath expression, searches the XML document, and returns an array of the XML nodes that match the search criteria.
<cfswitch expression="#xDoc.XmlRoot.XmlName#">
<cfcase value="rdf:RDF"><!--- Version 1.x --->
<cfset arrItems = XMLSearch(xDoc, '/rdf:RDF/:item')>
</cfcase>
<cfcase value="rss"><!--- Version 0.9x --->
<cfset arrItems = XMLSearch(xDoc, '/rss/channel/item')>
</cfcase>
</cfswitch>
Each array element contains an <item></item> XML node, which in turn contains the title, description and link elements. We can see the arrItems array in the next figure:

Figure 4 : arrItems array view
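The search step can also be tried offline. The sketch below is only an illustration: both feeds are made up, and the variable names (sampleRss, sampleRdf, itemPath and so on) are hypothetical. It also uses a different XPath from the one above, built on the standard XPath function local-name(), which matches item elements whether or not the feed declares a default namespace.

```cfml
<!--- Two made-up feeds, for illustration only --->
<cfsavecontent variable="sampleRss">
<rss version="0.91">
<channel>
<title>Sample channel</title>
<item>
<title>First RSS item</title>
<link>http://example.com/rss/1</link>
<description>Plain RSS, no namespace</description>
</item>
</channel>
</rss>
</cfsavecontent>
<cfsavecontent variable="sampleRdf">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/">
<item rdf:about="http://example.com/rdf/1">
<title>First RDF item</title>
<link>http://example.com/rdf/1</link>
<description>RSS 1.0, default namespace</description>
</item>
</rdf:RDF>
</cfsavecontent>

<!--- trim() removes the line break that cfsavecontent captures before the root element --->
<cfset xRss = XmlParse(trim(sampleRss))>
<cfset xRdf = XmlParse(trim(sampleRdf))>

<!--- Assumption: a namespace-agnostic XPath, not the expressions used in the tutorial --->
<cfset itemPath = "//*[local-name()='item']">
<cfset arrRssItems = XmlSearch(xRss, itemPath)>
<cfset arrRdfItems = XmlSearch(xRdf, itemPath)>

<cfoutput>
#xRss.XmlRoot.XmlName#: #arrayLen(arrRssItems)# item(s), first title: #arrRssItems[1].title.XmlText#<br>
#xRdf.XmlRoot.XmlName#: #arrayLen(arrRdfItems)# item(s), first title: #arrRdfItems[1].title.XmlText#<br>
</cfoutput>
```

Because the same expression serves both root elements, a variant like this would not need the cfswitch; whether that is preferable to version-specific paths is a design choice.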
Now the news is inside the elements of the arrItems array. First, we create the query object, q_rss:
<cfset q_rss = queryNew("title, link, description")>
Looping over the array, we read, for each item, the text inside its title, description and link elements.
<cfset n = arrayLen(arrItems)>
<!--- Loop over found items, populating query object --->
<cfloop index="i" from="1" to="#n#">
<cfset queryAddRow(q_rss)>
<cfset querySetCell(q_rss, "title", arrItems[i].title.xmlText,i)>
<cfset querySetCell(q_rss, "link", arrItems[i].link.xmlText,i)>
<cfset querySetCell(q_rss, "description", arrItems[i].description.xmlText,i)>
</cfloop>
Then, dumping q_rss:

Figure 5 : q_rss query, final display
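Once the news is in a query object, it can be displayed like any database result. The following is a minimal sketch, assuming a hand-built query named q_demo (a hypothetical name, with made-up rows) that has the same three columns as q_rss. HTMLEditFormat() escapes the values in case a feed contains markup.

```cfml
<!--- Hand-built stand-in for q_rss, with made-up rows --->
<cfset q_demo = queryNew("title,link,description")>
<cfset queryAddRow(q_demo, 2)>
<cfset querySetCell(q_demo, "title", "First item", 1)>
<cfset querySetCell(q_demo, "link", "http://example.com/1", 1)>
<cfset querySetCell(q_demo, "description", "First description", 1)>
<cfset querySetCell(q_demo, "title", "Second item", 2)>
<cfset querySetCell(q_demo, "link", "http://example.com/2", 2)>
<cfset querySetCell(q_demo, "description", "Second description", 2)>

<!--- One list entry per query row --->
<ul>
<cfoutput query="q_demo">
<li><a href="#HTMLEditFormat(q_demo.link)#">#HTMLEditFormat(q_demo.title)#</a>: #HTMLEditFormat(q_demo.description)#</li>
</cfoutput>
</ul>
```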
As we have seen, it's fairly simple to create a query object from an RDF/RSS document. But what if no RDF/RSS document is available, and the news is only published in HTML?
The steps we'll follow are essentially the same. But since we don't have an XML object, we can't use the XMLsearch() function to retrieve the items, so we have to find them with another tool. How about regular expressions? They are quite helpful for finding patterns in text.
Let's get to work:
We use the <cfhttp> tag to retrieve the HTML news page:
<cfhttp url="http://www.forta.com/blog" method="GET">
The result of this request is stored in the cfhttp.fileContent variable, which now holds the news as an HTML string. We then store this string in the sDoc variable.
<cfset sDoc = cfhttp.fileContent>
The hardest part of our job is building a regular expression that matches the HTML of Ben Forta's news items. We highly recommend using a regular expression tester, which evaluates expressions as you write them.
At the bottom of this tutorial, an HTML application, REtest.htm, is provided. It will help you when creating regular expressions.
Using it, we arrive at the following regular expression:
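<!--- Concatenated so that the line breaks in this listing do not become part of the pattern --->
<cfset regExp = '<font color="336633"><b>([\s\S]*?)</b></font>[\s\S]*?'
& '<font size="-1">([\s\S]*?)</font>[\s\S]*?'
& '<a href="(index\.cfm\?mode=e&entry=[0-9]*?)">'>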
Subexpressions (the terms inside parentheses) capture the title, description and link of each item. Note that REtest.htm offers links to the subexpressions as well as to the next occurrences of the search pattern.

Figure 6 : REtest.htm
First, we create the query object, q_rss:
<cfset q_rss = queryNew("title, link, description")>
We now use the REfindNoCase() function to search the sDoc text for the regular expression defined above. Note that the function call sits inside a loop whose condition tests the start variable, which is updated from the function's return value.
As seen before, the title, description and link subexpressions appear in this order. They therefore correspond to positions 2, 3 and 4 of the pos and len arrays (position 1 holds the whole match). These arrays are keys of the stResult structure returned by the REfindNoCase() function.
<!--- Begin searching at the first character of the document --->
<cfset start = 1>
<!--- start becomes 0 when no further match is found, which ends the loop --->
<cfloop condition="start GT 0">
<cfset stResult = REfindNoCase(regExp,sDoc,start,"Yes")>
<cfif stResult.pos[1]>
<cfset queryAddRow(q_rss)>
<cfset querySetCell(q_rss,"title",mid(sDoc,stResult.pos[2],stResult.len[2]))>
<cfset querySetCell(q_rss,"link",mid(sDoc,stResult.pos[4],stResult.len[4]))>
<cfset querySetCell(q_rss,"description",
mid(sDoc,stResult.pos[3],stResult.len[3]))>
</cfif>
<!--- Resume the search right after the current match --->
<cfset start = stResult.pos[1] + stResult.len[1]>
</cfloop>
Checking the results:

Figure 7 : q_rss query final display
This is the same result we obtained from the RSS feed.
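The way the pos and len arrays line up with the subexpressions is easier to see on a small input. The sketch below is only an illustration: the HTML fragment, the pattern and the variable names (sSample, reSample, startPos, stMatch) are all made up and much simpler than the real page.

```cfml
<!--- Made-up HTML fragment and pattern, for illustration only --->
<cfset sSample = '<b>First title</b> <i>First text</i> <b>Second title</b> <i>Second text</i>'>
<cfset reSample = '<b>([^<]+)</b>[[:space:]]*<i>([^<]+)</i>'>

<cfset startPos = 1>
<cfoutput>
<cfloop condition="startPos GT 0">
<!--- The fourth argument asks for the pos and len arrays of the subexpressions --->
<cfset stMatch = REFindNoCase(reSample, sSample, startPos, true)>
<cfif stMatch.pos[1]>
<!--- Index 1 is the whole match; 2 and 3 are the two subexpressions --->
Title: #HTMLEditFormat(mid(sSample, stMatch.pos[2], stMatch.len[2]))#,
text: #HTMLEditFormat(mid(sSample, stMatch.pos[3], stMatch.len[3]))#<br>
</cfif>
<!--- With no match, pos[1] and len[1] are both 0, so startPos becomes 0 and the loop ends --->
<cfset startPos = stMatch.pos[1] + stMatch.len[1]>
</cfloop>
</cfoutput>
```

Each pass starts searching right after the previous match, so every occurrence is visited exactly once.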
<!--- Retrieve RSS data from Ben Forta's blog--->
<cfhttp url=&qu
Feeding a Query
A simple approach on how to create a query object from RDF/RSS feed. Further, when news are in HTML format, how to get them using Regular Expressions.
Author: Claudio Dias
Views: 9,816
Posted Date: Tuesday, May 25, 2004