How To
(1) I want to extract contact data of travel related companies.
Go to New Session Dialog
Select "Source = Search Engines"
Enter travel in Keyword Box
Select what type of data you want to extract (email, phone, fax, ...)
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Click OK button

(2) I want to extract contact data of travel related companies of Australia.
Repeat (1) but select "Engine = Australia" from the Engine Listing Dialog. You can launch this dialog by clicking the "Engines" button of New Session - General Tab. By default, US/International Engines are selected.

(3) I want to get more data of travel related companies of Australia.
Repeat (2) but use more travel related keywords.

(4) I want to extract all data from a website.
Go to New Session Dialog
Select "Source = WebSite"
Enter the website URL in Starting Address box, like http://www.mydomain.com
Select depth = 0 (to spider the entire website)
Select what type of data you want to extract (email, phone, fax, ...)
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Click OK button

(5) I want to extract all photographers' contact data from the Yahoo directory "Photographers" to send them an invitation to visit my new photographer forum.
Go to New Session Dialog
Select "Source = WebSite"
Enter the website URL in Starting Address box, like http://dir.yahoo.com/Arts/Visual_Arts/Photography/Photographers/
Select depth = 0; check "Stay within Full URL"
Select what type of data you want to extract (email, phone, fax, ...)
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Now go to External Site tab
- Select "Follow External URLs"
- Select Spidering Mode (Intelligent or you define depth)
Now go back to General tab and click OK button

(6) I have a list of URLs in a file and I want to extract data from those URLs.
Go to New Session Dialog
Select "Source = URLs from File"
Enter the URL file path in File name box. This file must be a plain text file with one URL per line, each line starting with the string http://
Select Depth = 0 to extract the entire website of each URL listed in the text file, or select "process 1 page only" to spider only the specified URL
Select what type of data you want to extract (email, phone, fax, ...)
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Click OK button
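
The file format described in (6) can be checked before starting a session. The following is only a sketch, not part of WDE: the function name clean_url_lines is hypothetical, and it applies the stated rule literally (one URL per line, each starting with http://), so anything else, including https:// addresses, is set aside for manual review rather than silently fixed.

```python
from urllib.parse import urlsplit


def clean_url_lines(lines):
    """Split raw lines into (kept, rejected) according to the rule in (6).

    kept     -- unique URLs that start with http:// and have a host part
    rejected -- non-empty lines that do not satisfy that rule
    """
    kept, rejected = [], []
    seen = set()
    for raw in lines:
        url = raw.strip()
        if not url:
            continue  # ignore blank lines
        # Literal reading of the rule: the line must start with http://
        # and must contain a host name after the scheme.
        if not url.startswith("http://") or not urlsplit(url).netloc:
            rejected.append(url)
            continue
        if url not in seen:  # drop exact duplicates, keep first occurrence
            seen.add(url)
            kept.append(url)
    return kept, rejected


def write_url_file(path, urls):
    """Write one URL per line as plain text."""
    with open(path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
```

A typical use would be to read an existing list, call clean_url_lines on its lines, inspect the rejected entries by hand, and pass the kept ones to write_url_file to produce the file named in the File name box.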

(7) I want to compile a list of offshore, banking, tax related websites that do link exchange with other sites.
Go to New Session Dialog
Select "Source = Search Engines"
Generate keywords using the following 2 lists:
Select Extract Meta Tag and Extract Email
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV
Now go to External Site tab
- Select "Follow External URL"
- Select Spidering Mode = I will Select the Depth
- Select "Process 1 Page Only"
- Select "Spider Base URL only"
Now go to Filters - Text Filters tab
- Check "page must contain following text"
- Enter the following string in the box: links.htm
Now go back to General tab and click OK button
After the extraction has completed, go to Data Tab - Meta Tag list. These are the related sites that do link exchange with other sites.

(8) I want to extract real estate companies' phone / fax numbers of Canada, Ontario area.
Go to New Session Dialog
Select "Source = Search Engines"
Enter real estate in Keyword Box
Select "Engine = Canada" from the Engine Listing Dialog. You can launch this dialog by clicking the "Engines" button
Select Extract phone, fax
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV or line by line
Now go to Filters Tab - Data Filters - Phone/Fax box and enter the phone/fax filter for the Ontario area
Now go back to General tab and click OK button

(9) I want to build a domain list of health/medicine related websites.
Go to New Session Dialog
Select "Source = Search Engines"
Enter the following keywords:
Select Extract URL (select Base URL)
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - line by line
Click OK button

(10) I have a URL list in a SQL database. I want to extract the URL, title, description, keywords, and plain page text of the HTML from <BODY> to </BODY> and merge them into the database.
WDE cannot access a SQL database. You need to export the URL list from the SQL database to a plain text disk file, and use this file in WDE.
Go to New Session Dialog
Select "Source = URLs from File"
Enter the URL file path in File name box. This file must be a plain text file with one URL per line, each line starting with the string http://
Select "process 1 page only" to extract the meta tags of the specified root domain. If you need to extract the meta tags of ALL pages of each website, select depth = 0
Select Extract Meta tag, Extract Body (you can set a text size limit by clicking the ... button)
Select "Save Data" folder, i.e. where the program will save the data
Select Save Format - CSV
Uncheck "View - Display data in data tab" for very large URL meta tag extractions, so that WDE does not display the data within the program but writes it directly to the disk file. This increases program performance.
Click OK button
After the extraction has completed, you can import this CSV file (metatag.txt) into the SQL database and do further processing, queries, etc.
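
The export and import steps around WDE in (10) happen outside the program. The sketch below shows one possible way to do them with Python's built-in sqlite3 and csv modules. Everything specific in it is an assumption made for illustration: the database engine (SQLite), the table names sites and page_meta, and above all the column order url, title, description, keywords. The actual layout of metatag.txt is not described here and should be checked against a real output file before importing.

```python
import csv
import sqlite3


def export_urls(conn, out_path):
    """Write the URL list to a plain text file, one URL per line.

    Assumes a hypothetical table sites(url). Rows that do not start
    with http:// are skipped, matching the input rule stated in (10).
    Returns the number of URLs written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        for (url,) in conn.execute("SELECT url FROM sites ORDER BY url"):
            url = (url or "").strip()
            if url.startswith("http://"):
                fh.write(url + "\n")
                count += 1
    return count


def import_meta_csv(conn, csv_path):
    """Load the extracted CSV into a hypothetical page_meta table.

    Assumed column order: url, title, description, keywords.
    Short rows are padded with empty strings, extra fields are ignored.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_meta "
        "(url TEXT, title TEXT, description TEXT, keywords TEXT)"
    )
    with open(csv_path, newline="", encoding="utf-8") as fh:
        rows = [(row + [""] * 4)[:4] for row in csv.reader(fh) if row]
    conn.executemany("INSERT INTO page_meta VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

With a connection opened by sqlite3.connect, export_urls would be run before the WDE session to produce the URL file, and import_meta_csv afterwards on the saved metatag.txt. For other database engines the same two steps apply, using that engine's own export and CSV import tools.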