Feeding a Query

How to retrieve news and store it in a query object

 

Cláudio Alexandre da Costa Dias

 

Introduction

Recently, we have noticed a large number of sites sharing news on all kinds of subjects. This news is usually available in HTML format and sometimes in XML (RDF/RSS) format, to be read with dedicated feed readers.

When the news is in RDF/RSS format (RDF is a W3C standard), it's fairly simple to write ColdFusion code that handles the XML and creates a query object for later display.

However, if the news is only available in HTML format, some extra effort is needed to extract the data and arrange it into a query object.

To fully understand this tutorial, you should know the RDF/RSS standard, have a working knowledge of regular expressions, and be familiar with the ColdFusion MX XML functions.

 

Creating a Query Object from RDF/RSS data

First of all, let's take a look at the simplest case: retrieving news from an RDF/RSS document.

Our examples use Ben Forta's blog page: http://www.forta.com/blog

 

Figure 1: Ben Forta's Blog

 

As we can see, there are many news items on the right side of the screen. They are also available in RDF/RSS format at http://www.forta.com/blog/rss.cfm?mode=full.

This link gives us:

 

Figure 2: RSS feed from the blog page

 

In other words, an XML document like any other. A complete description of the RDF standard can be found at http://www.w3.org/RDF.

Let's build the query object from this XML document. The CFML code for all the templates used can be found at the bottom of this tutorial.

To write CFML code that does this job, we'll follow these steps:

 

Retrieving XML

We use the <cfhttp> tag to retrieve the XML news document:

 
   <cfhttp url="http://www.forta.com/blog/rss.cfm?mode=full" method="GET">
 

The result of this request is stored in the cfhttp.fileContent variable, which therefore contains the news as an XML string. Outputting this variable would look like Figure 2.
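A slightly more defensive request might set a timeout and check the HTTP status before using the response. The following is only a sketch: the 10-second timeout, the sFeed variable name and the error message are illustrative assumptions, not part of the original template.

```cfml
<!--- Request the feed; give up after 10 seconds (assumed value) --->
<cfhttp url="http://www.forta.com/blog/rss.cfm?mode=full" method="GET" timeout="10">

<!--- cfhttp.statusCode holds a string such as "200 OK" --->
<cfif left(cfhttp.statusCode, 3) NEQ "200">
  <cfoutput>Could not retrieve the feed: #cfhttp.statusCode#</cfoutput>
  <cfabort>
</cfif>

<!--- sFeed is a hypothetical name for the raw XML string --->
<cfset sFeed = cfhttp.fileContent>
```

Checking the status first keeps an error page from being passed to the XML parser in the next step.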

 

Converting an XML string to an XML object

In ColdFusion MX, there's a new data type: the XML object. Using the XMLparse() function, we can convert an XML string into an XML object.

 
<cftry>
  <cfset xDoc = XMLparse(cfhttp.fileContent)>
  <cfcatch>
     Invalid RDF/RSS!
     <cfabort>
  </cfcatch>
</cftry>
 

We use try/catch to guard against malformed XML. Dumping the xDoc variable:

 

Figure 3 : xDoc XML object view
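To see the try/catch guard in action without a network request, one can feed XMLParse() a deliberately broken string. This is a minimal sketch; sBroken, xBroken and parseOk are hypothetical names used only here.

```cfml
<!--- A deliberately broken document: <channel> is never closed --->
<cfset sBroken = "<rss><channel><title>Sample</title></rss>">

<cftry>
  <cfset xBroken = XMLParse(sBroken)>
  <cfset parseOk = true>
  <cfcatch type="any">
    <!--- XMLParse() throws when the string is not well-formed XML --->
    <cfset parseOk = false>
    <cfoutput>Invalid RDF/RSS: #cfcatch.message#</cfoutput>
  </cfcatch>
</cftry>
```

Here the catch block runs, so parseOk ends up false and the parser's own message is shown instead of a generic one.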

 

Identifying RSS version and searching for items

Once we have our XML object, xDoc, let's identify which RSS standard it follows. To do this, we use the name of the XML root element: xDoc.XmlRoot.XmlName.

To retrieve the items (the news themselves) from the XML object, we use the XMLsearch() function. It takes an XPath expression, searches the XML document, and returns an array of the XML object nodes that match the search criteria.

 
<cfswitch expression="#xDoc.XmlRoot.XmlName#">
  <cfcase value="rdf:RDF"><!--- RSS 1.0 (RDF-based) --->
     <cfset arrItems = XMLSearch(xDoc, '/rdf:RDF/:item')>
  </cfcase>
  <cfcase value="rss"><!--- RSS 0.9x and 2.0 --->
     <cfset arrItems = XMLSearch(xDoc, '/rss/channel/item')>
  </cfcase>
  <cfdefaultcase><!--- Unknown root element: no items --->
     <cfset arrItems = arrayNew(1)>
  </cfdefaultcase>
</cfswitch>
 

Each array element contains an <item> XML node, which in turn contains the title, description and link elements. The arrItems array is shown in the next figure:

 

 

Figure 4 : arrItems array view
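The search step can be tried on its own with a small inline document. The feed below is made up for illustration (it is not Ben Forta's feed), and sXml, xSample and arrSample are hypothetical names.

```cfml
<!--- A tiny, made-up RSS 0.91 document used only for illustration --->
<cfsavecontent variable="sXml"><rss version="0.91">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First item</title>
      <link>http://example.com/1</link>
      <description>First summary</description>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.com/2</link>
      <description>Second summary</description>
    </item>
  </channel>
</rss></cfsavecontent>

<cfset xSample = XMLParse(trim(sXml))>
<!--- XPath: every <item> directly under /rss/channel --->
<cfset arrSample = XMLSearch(xSample, "/rss/channel/item")>

<cfoutput>
  Items found: #arrayLen(arrSample)#<br>
  <cfloop index="i" from="1" to="#arrayLen(arrSample)#">
    #i#: #arrSample[i].title.XmlText# (#arrSample[i].link.XmlText#)<br>
  </cfloop>
</cfoutput>
```

Each array element is an XML node, so its child elements are reached by name and their text through XmlText.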

 

Creating query object

Now we have the news inside the elements of the arrItems array. First, we create the query object, q_rss:

 
<cfset q_rss = queryNew("title, link, description")>
 

Looping over the array elements, we get, for each item, the text inside the title, description and link elements.

 
<cfset n = arrayLen(arrItems)>
<!--- Loop over found items, populating query object --->
<cfloop index="i" from="1" to="#n#">
  <cfset queryAddRow(q_rss)>
  <cfset querySetCell(q_rss, "title", arrItems[i].title.xmlText,i)>
  <cfset querySetCell(q_rss, "link", arrItems[i].link.xmlText,i)>
  <cfset querySetCell(q_rss, "description", arrItems[i].description.xmlText,i)>
</cfloop>
 

Then, dumping q_rss:

Figure 5: q_rss query, final display
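Once the data is in a query object, it can be displayed like any database result. The sketch below uses a hand-built stand-in query, q_demo (a hypothetical name with made-up data), so that it runs on its own; in the tutorial, q_rss would take its place.

```cfml
<!--- Stand-in data; in the tutorial q_rss is filled from the feed --->
<cfset q_demo = queryNew("title,link,description")>
<cfset queryAddRow(q_demo)>
<cfset querySetCell(q_demo, "title", "First item")>
<cfset querySetCell(q_demo, "link", "http://example.com/1")>
<cfset querySetCell(q_demo, "description", "Short summary")>

<!--- One list entry per query row --->
<ul>
<cfoutput query="q_demo">
  <li><a href="#q_demo.link#">#HTMLEditFormat(q_demo.title)#</a>: #q_demo.description#</li>
</cfoutput>
</ul>
```

The title is escaped with HTMLEditFormat(), while the description is output as is because feeds often carry HTML there; whether that is appropriate depends on how far the source is trusted.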

Creating a Query Object from an HTML news page

As we have seen, it's fairly simple to create a query object from an RDF/RSS document. However, what if no RDF/RSS document is available? In other words, what if the news is only available in HTML format?

The steps we'll follow are essentially the same. But since we don't have an XML object, we can't use the XMLsearch() function to retrieve the items, so we have to search for them with another tool. How about regular expressions? They are quite helpful when searching for patterns.

Let's get to work:

 

Retrieving HTML

We use the <cfhttp> tag to retrieve the HTML news page:

 
   <cfhttp url="http://www.forta.com/blog" method="GET">
 

The result of this request is stored in the cfhttp.fileContent variable, which therefore contains the news as an HTML string. This string is then stored in the sDoc variable.

 
<cfset sDoc = cfhttp.fileContent>
 

Creating Regular Expression

The hardest part of our job is to build a regular expression that matches the HTML of Ben Forta's news items. We highly recommend using a regular expression tester, which lets you test expressions as you build them.

At the bottom of this tutorial, an HTML application, REtest.htm, is provided. It will help you when creating regular expressions.

Using it, we arrive at the following regular expression:

 
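<!--- Built by concatenation so that line breaks and indentation
      do not become part of the pattern --->
<cfset regExp = '<font color="336633"><b>([\s\S]*?)</b></font>[\s\S]*?'
              & '<font size="-1">([\s\S]*?)</font>[\s\S]*?'
              & '<a href="(index\.cfm\?mode=e&entry=[0-9]*?)">'>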
 

The subexpressions (the terms inside parentheses) capture the title, description and link of each item, in that order. Note that the tester shows links to the subexpressions as well as to the next occurrences of the search pattern.

Figure 6 : REtest.htm
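The same kind of check can be done in CFML itself by running the pattern against a small sample string. The markup below is hypothetical, shaped only to fit the pattern and not copied from the real page; sSample, reSample and stMatch are illustrative names.

```cfml
<!--- Hypothetical markup shaped like the pattern expects --->
<cfsavecontent variable="sSample"><font color="336633"><b>Sample title</b></font><br>
<font size="-1">Sample description</font><br>
<a href="index.cfm?mode=e&entry=123">Read more</a></cfsavecontent>

<!--- Same pattern as in the tutorial, built by concatenation --->
<cfset reSample = '<font color="336633"><b>([\s\S]*?)</b></font>[\s\S]*?'
                & '<font size="-1">([\s\S]*?)</font>[\s\S]*?'
                & '<a href="(index\.cfm\?mode=e&entry=[0-9]*?)">'>

<!--- "Yes" asks for the pos and len arrays of the match and its subexpressions --->
<cfset stMatch = REFindNoCase(reSample, sSample, 1, "Yes")>

<cfoutput>
  Title: #mid(sSample, stMatch.pos[2], stMatch.len[2])#<br>
  Description: #mid(sSample, stMatch.pos[3], stMatch.len[3])#<br>
  Link: #mid(sSample, stMatch.pos[4], stMatch.len[4])#
</cfoutput>
```

Element 1 of pos and len describes the whole match, which is why the three captured values start at element 2.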

 

Creating query object

First, we create the query object, q_rss:

 
<cfset q_rss = queryNew("title, link, description")>
 

Now we use the REfindNoCase() function to search the sDoc text for the regular expression defined above. Note that the function call is nested in a loop, which tests the function's return value through the start variable.

As seen before, the subexpressions for title, description and link appear in this order. Therefore, they correspond to positions 2, 3 and 4 of the pos and len arrays; position 1 holds the entire match. These arrays are keys of the stResult structure returned by the REfindNoCase() function.

 
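<cfset start = 1>
<!--- Loop until REfindNoCase() finds no further match (start becomes 0) --->
<cfloop condition="start GT 0">
  <cfset stResult = REfindNoCase(regExp, sDoc, start, "Yes")>
  <cfif stResult.pos[1]>
     <cfset queryAddRow(q_rss)>
     <!--- Positions 2, 3 and 4 hold title, description and link --->
     <cfset querySetCell(q_rss, "title", mid(sDoc, stResult.pos[2], stResult.len[2]))>
     <cfset querySetCell(q_rss, "link", mid(sDoc, stResult.pos[4], stResult.len[4]))>
     <cfset querySetCell(q_rss, "description", mid(sDoc, stResult.pos[3], stResult.len[3]))>
  </cfif>
  <!--- Resume after the current match; with no match, 0 + 0 ends the loop --->
  <cfset start = stResult.pos[1] + stResult.len[1]>
</cfloop>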
 

Checking the results:

Figure 7: q_rss query, final display

 

The result is the same as the one obtained from the RSS feed.

 

CFML coding

 

rss2query.cfm

<!--- Retrieve RSS data from Ben Forta's blog--->
<cfhttp url=&qu
                
