Converting HTML to Xaml

My Windows 8 app, whose code can be found here, uses a third-party API (Readability) to scrape news sites such as Yahoo or CNN, so that the user sees only the text of the article and nothing more. Ads and images are removed.

I am using the HTML Agility Pack to parse the HTML and convert it to Xaml or a WinRT visual tree for display. Right now, it's all text. The content of each paragraph (the <p> tag) is pulled out and placed into a TextBlock as a Run. A style is applied to the TextBlock so that the font is bigger than the default, but not much else.
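To make the paragraph step concrete, here is a rough sketch of the same idea. It is written in Python with the standard library's html.parser rather than C# and the HTML Agility Pack, so it is not the app's actual code. The names ParagraphExtractor and extract_paragraphs are made up for illustration. Each string it returns stands in for the text that would go into one Run.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element, one string per paragraph."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraph strings
        self._buffer = None    # text pieces of the open <p>, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # tolerate a previous <p> that was never closed
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <a>, ...) is kept; the tags are not.
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace, as a browser would when rendering.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def extract_paragraphs(html):
    """Return the plain text of every <p> in the given HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()            # pick up a trailing unclosed <p>
    return parser.paragraphs
```

Anything outside a <p> is ignored, which matches the text-only article view described above.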

Now I am working on adding images to the article view. The images themselves were easy to add, but captions are proving a more formidable obstacle. Some sites use the alt attribute on the img tag as a caption. Some add a paragraph after the image to act as a caption. Some do both.

But some do neither, so I cannot assume that the next paragraph after the image is the caption. If anyone can figure out a solution to this problem, comment on this post or email me at feinbergaa at yahoo dot com.
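To show the cases described above, here is a small sketch that collects both caption candidates for each image: the alt text and the paragraph that follows. It is Python with the standard library's html.parser, not the app's C# code, and CaptionCandidateFinder and find_caption_candidates are hypothetical names. It does not solve the problem. It only gathers the candidates, and deciding whether the following paragraph really is a caption is still open.

```python
from html.parser import HTMLParser


class CaptionCandidateFinder(HTMLParser):
    """Records, for each <img>, its alt text and the next <p> that starts after it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.images = []       # one dict per <img>, in document order
        self._pending = None   # most recent image still waiting for a paragraph
        self._target = None    # image that the currently open <p> follows
        self._buffer = None    # text pieces of the open <p>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            record = {
                "src": a.get("src"),
                "alt": a.get("alt") or None,   # missing or empty alt becomes None
                "next_paragraph": None,
            }
            self.images.append(record)
            self._pending = record
        elif tag == "p":
            self._end_paragraph()              # tolerate an unclosed previous <p>
            self._buffer = []
            # Only a paragraph that starts after the image counts as "following".
            self._target, self._pending = self._pending, None

    def handle_endtag(self, tag):
        if tag == "p":
            self._end_paragraph()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _end_paragraph(self):
        if self._buffer is None:
            return
        text = " ".join("".join(self._buffer).split())
        if self._target is not None and text:
            self._target["next_paragraph"] = text
        self._buffer = None
        self._target = None


def find_caption_candidates(html):
    """Return a list of {"src", "alt", "next_paragraph"} dicts, one per image."""
    finder = CaptionCandidateFinder()
    finder.feed(html)
    finder.close()
    finder._end_paragraph()
    return finder.images
```

An image with both fields set is the "do both" case, and one with neither has no caption candidate at all. The hard case is an image whose next_paragraph is just ordinary article text.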