• Data Extraction
Hello All,
I am trying to build an intelligent HTML parser (a web agent) that can extract meaningful data from web pages. I have tried several traditional (non-intelligent) methods, such as simple HTML parsers that only remove the HTML tags, which leaves me with all of the information on the page. I have also tried regular expressions, but they gave poor results because the way the data is presented varies greatly from one page to another. I have finally realised that my agent needs some intelligence. My knowledge of neural networks is very basic, so I cannot tell whether what I am trying to do is possible with neural networks and AI.
Below is the case that I am trying to solve:
I am looking for houses. I need the agent to access many (100+) mortgage sites and extract the following data from each one: house address, price, size, and number of rooms. Assuming this is the data required for every house, and that I have the pages from the mortgage sites stored on my machine, are neural networks and AI suitable for solving my problem?
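For illustration only, a minimal sketch of the kind of tag-stripping plus regular-expression baseline described above might look like the following. The sample pages, the regex patterns, and the names `TextExtractor`, `strip_tags`, and `extract_listing` are all hypothetical and not taken from any real mortgage site. The address is left out because no simple pattern covers it.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a page and discards all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical patterns: each one assumes a single way of writing the field.
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*)")
SIZE_RE = re.compile(r"(\d[\d,]*)\s*(?:sq\.?\s*ft|sqft)", re.IGNORECASE)
ROOMS_RE = re.compile(r"(\d+)\s*(?:rooms?|bedrooms?|beds?)", re.IGNORECASE)


def first_int(pattern, text):
    """Return the first captured number as an int, or None if nothing matches."""
    match = pattern.search(text)
    return int(match.group(1).replace(",", "")) if match else None


def extract_listing(html):
    """Extract price, size and room count from one page; None where not found."""
    text = strip_tags(html)
    return {
        "price": first_int(PRICE_RE, text),
        "size": first_int(SIZE_RE, text),
        "rooms": first_int(ROOMS_RE, text),
    }


# Two made-up pages that show the same price in different layouts.
PAGE_A = "<div><h1>12 Example Street</h1><p>Price: $250,000</p><p>1,400 sq ft, 3 bedrooms</p></div>"
PAGE_B = "<table><tr><td>Asking price</td><td>250000 USD</td></tr></table>"

if __name__ == "__main__":
    print(extract_listing(PAGE_A))  # all three fields are found
    print(extract_listing(PAGE_B))  # price is missed: there is no "$" sign
```

The second page carries the same price but in a different presentation, so the fixed pattern returns nothing for it. This is the brittleness that appears once the layouts vary across 100+ sites.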
Suggestions are welcome.
Regards,
Hussam Galal
hussam.galal@gmail.com