| Search Keyword | WebSite | Web Directories | List of URLs from File
Keywords: WDE spiders the top search engines for
matching websites and extracts data from them.
Quick Start:
Select "Search Engines" source - Enter keyword
- Click OK
What WDE Does:
WDE will query the most popular search engines, extract
all matching URLs from the search results, remove duplicate URLs and finally
visit those websites and extract the targeted data from them.
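The "remove duplicate URLs" step can be sketched as below. This is a minimal illustration and not WDE's actual code: the function names are hypothetical, and the normalization rule (lower-case host, empty path treated as "/", fragment dropped) is an assumption about what counts as a duplicate.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Assumed rule: lower-case scheme/host, use "/" for an empty path, drop the fragment."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def unique_urls(result_pages):
    """Merge the URL lists returned by several search engines, keeping first-seen order."""
    seen = set()
    ordered = []
    for page in result_pages:
        for url in page:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                ordered.append(key)
    return ordered
```

Each inner list stands for the URLs taken from one search engine's results; the returned list is what the spider would then visit one by one.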
WDE is an advanced website extractor. You can choose which search engines WDE uses: click the "Engines" button
and uncheck any listing that you do not want to use. You can add other engine
sources as well.
WDE sends queries to the search engines to get matching
website URLs. Next it visits those matching websites to extract data (URL, meta
tag, email, phone, fax etc.).
How deep it spiders within the matching websites depends on the "Depth" setting
of the "External Site" tab.
DEPTH: Here you tell WDE how many levels to dig down within
the specified website. If you want WDE to stay on the first page, just
select "Process 1 Page Only". A depth of "0" will process and look
for data in the whole website. A depth of "1" will process the index or home
page and the associated files under the root dir only.
For example: WDE is going to visit the URL http://www.xyz.com/product/milk/ for data extraction.
Let's say www.xyz.com has the following
text/html pages:
1. http://www.xyz.com/
2. http://www.xyz.com/contact.htm
3. http://www.xyz.com/about.htm
4. http://www.xyz.com/product/
5. http://www.xyz.com/product/support.htm
6. http://www.xyz.com/product/milk/
7. http://www.xyz.com/product/water/
8. http://www.xyz.com/product/milk/baby/
9. http://www.xyz.com/product/milk/baby/page1.htm
10. http://www.xyz.com/product/milk/baby/page2.htm
11. http://www.xyz.com/product/water/mineral/
12. http://www.xyz.com/product/water/mineral/news.htm
WDE is a powerful, fully featured spider robot.
You need to decide how deep you want WDE to spider and look for data.
| WDE can spider: | Set depth options: |
| Only the matching URL page of the search (URL #6) | Select "Process 1 Page Only" |
| Entire milk dir (URL #6 and #8 - 10) | Select "Depth=0" and check "Stay within Full URL" |
| Entire www.xyz.com site | Select "Depth=0" |
| Only the www.xyz.com page | Select "Process 1 Page Only" and select "Spider Base URL Only" |
| Only root dir files (URL #1 - 3) | Select "Depth=1" |
| Only URL #1 - 5 | Select "Depth=2" |
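The depth options above can be read as a simple scope test on each URL. The sketch below is one interpretation of the table, not WDE's implementation: the function names and parameters are hypothetical, and it assumes that a depth of N (N >= 1) covers pages with fewer than N directory levels in their path, while depth 0 means no limit.

```python
from urllib.parse import urlsplit

# The twelve example pages of www.xyz.com, in the order listed above.
SITE = [
    "http://www.xyz.com/",
    "http://www.xyz.com/contact.htm",
    "http://www.xyz.com/about.htm",
    "http://www.xyz.com/product/",
    "http://www.xyz.com/product/support.htm",
    "http://www.xyz.com/product/milk/",
    "http://www.xyz.com/product/water/",
    "http://www.xyz.com/product/milk/baby/",
    "http://www.xyz.com/product/milk/baby/page1.htm",
    "http://www.xyz.com/product/milk/baby/page2.htm",
    "http://www.xyz.com/product/water/mineral/",
    "http://www.xyz.com/product/water/mineral/news.htm",
]

def dir_levels(url):
    """Count directory levels in the path: "/" -> 0, "/product/support.htm" -> 1."""
    path = urlsplit(url).path or "/"
    return path.count("/") - 1

def in_scope(url, start_url, depth=0, stay_within_full_url=False, one_page_only=False):
    """Decide whether url would be processed under the given (assumed) depth options."""
    if one_page_only:
        return url == start_url
    if urlsplit(url).netloc != urlsplit(start_url).netloc:
        return False
    if stay_within_full_url and not url.startswith(start_url):
        return False
    return depth == 0 or dir_levels(url) < depth

def page_numbers(start_url, **options):
    """Return the 1-based positions in SITE of the pages that fall in scope."""
    return [i for i, url in enumerate(SITE, start=1) if in_scope(url, start_url, **options)]
```

For instance, `page_numbers("http://www.xyz.com/product/milk/", depth=0, stay_within_full_url=True)` lists the pages of the milk dir, and `page_numbers("http://www.xyz.com/", depth=2)` lists the pages covered by "Depth=2".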
Stop Site on First Email Found:
Each website is structured differently on the
server. Some websites have only a few files and some have thousands
of files. So sometimes you may prefer to use the "Stop Site on First Email
Found" option. For example: you set WDE to go through the entire www.xyz.com site and WDE
finds an email in URL #2 (contact.htm). If you tell WDE to "Stop Site on
First Email Found", it will not go on to the other pages (#3-12).
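The effect of this option can be sketched as an early exit from the page loop. The code below is only an illustration: the function name, the simplified email pattern and the sample addresses are assumptions, and pages are passed in as text rather than downloaded.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def scan_site(pages, stop_on_first_email=False):
    """pages: iterable of (url, text) pairs. Returns (emails found, urls visited)."""
    emails, visited = [], []
    for url, text in pages:
        visited.append(url)
        found = EMAIL_RE.findall(text)
        emails.extend(found)
        if found and stop_on_first_email:
            break  # leave the remaining pages of this site unvisited
    return emails, visited

# Hypothetical page contents for the first three URLs of www.xyz.com.
pages = [
    ("http://www.xyz.com/", "Welcome to XYZ"),
    ("http://www.xyz.com/contact.htm", "Write to info@xyz.com"),
    ("http://www.xyz.com/about.htm", "Sales: sales@xyz.com"),
]
emails, visited = scan_site(pages, stop_on_first_email=True)
```

With the option on, the loop ends at contact.htm; with it off, every page is scanned and both addresses are collected.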
Spider Base URL Only:
With this option you can tell WDE to always process only
the base URLs of external sites. For example: in the case above, if an external
site is found as http://www.xyz.com/product/milk/, WDE will grab only the base
www.xyz.com. It will not
visit http://www.xyz.com/product/milk/
unless you set a depth that also covers the milk
dir.
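Reducing a found URL to its base can be sketched with Python's standard URL parser. The helper name is hypothetical, and the sketch assumes the base URL is simply the scheme and host followed by "/".

```python
from urllib.parse import urlsplit

def base_url(url):
    """Keep only the scheme and host of a URL, e.g. for "Spider Base URL Only"."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

start = base_url("http://www.xyz.com/product/milk/")
```

Here `start` is the address WDE would begin from for that external site; whether the milk dir is reached afterwards then depends on the depth setting.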
Web Sites: Enter a website's starting URL and extract all data found on
that website.
Quick Start:
Select 2nd option "WebSite/Dir/Groups" - Enter
website URL - Select Depth - Click
OK
What WDE Does:
WDE will retrieve html/text pages of the website
according to the Depth you
specified and extract all data found in those pages.
- By default, WDE stays within
the current domain.
- WDE can also follow external sites.
If you want WDE to retrieve files
of external websites that are linked from the starting site specified
in New Session Dialog - General tab, then you need to set "Follow External URLs"
on the "External Site" tab. In this case, by default, WDE will follow
external sites only once, that is: (1) WDE will process the starting
address and (2) all external sites found in the starting address.
It will not follow the external sites found in (2), and so on.
WDE is a powerful website extractor
spider: if you want WDE
to follow external sites in an unlimited loop, select "Unlimited"
in the "Spider External URLs Loop" combo box, and remember that you need
to stop the WDE session manually, because this way WDE can travel
the entire internet.
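The external loop setting behaves like a limit on how many rounds of external links are followed. The sketch below models that on a small in-memory link map; the function name, the `max_loops` parameter and the example hosts are hypothetical, and `None` stands in for "Unlimited".

```python
def sites_to_visit(start, external_links, max_loops=1):
    """Return sites in visiting order, following external links for max_loops rounds.

    external_links maps a site to the external sites linked from it.
    max_loops=None models the "Unlimited" setting.
    """
    visited = [start]
    seen = {start}
    frontier = [start]
    loops = 0
    while frontier and (max_loops is None or loops < max_loops):
        next_frontier = []
        for site in frontier:
            for target in external_links.get(site, []):
                if target not in seen:
                    seen.add(target)
                    visited.append(target)
                    next_frontier.append(target)
        frontier = next_frontier
        loops += 1
    return visited

# Hypothetical link map between five sites.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "d.example": ["a.example", "e.example"],
}
```

On this closed map the unlimited setting ends once every site has been seen. On the real web new sites keep appearing, which is why an unlimited session has to be stopped by hand.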
Directories: Choose Yahoo, DMOZ or another directory and get
all data from there.
Quick Start & What WDE Does:
Let's say you want to extract data on all companies
listed at
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/
Action #1A:
Select 2nd option "WebSite/Dir/Groups" - enter this URL in
"Starting Address" box - select "Process First Page Only"
Or, let's say you want to extract data on all
companies listed at
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/
plus all lower-level folders like
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/Advertising/
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/Target_Marketing/
etc.
Action #1B:
Select 2nd option "WebSite/Dir/Groups" - enter
URL
http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/
in "Starting Address" box - select Depth=0 and
"Stay within Full URL" option.
With these actions WDE will download the http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/
page and optionally all lower-level pages, and
will build a list of URLs of the companies listed there.
Now you want WDE to visit all those URLs and
extract all data found on those sites.
Action #2:
So after either of the above actions you must move to the
"External Site" tab and check the "Follow External URLs" option. (Remember:
this setting tells WDE to process/follow/visit all URLs found while processing the
"Starting Address" of the "General" tab).
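Building the list of company URLs from a directory page amounts to collecting the links that point outside the directory's own host. The following sketch shows the idea with Python's standard HTML parser. The class and function names are hypothetical, the sample HTML is made up, and the rule "external means a different host" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect the absolute targets of all <a href="..."> tags on a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.page_url, href))

def external_links(page_url, html):
    """Return http(s) links on the page whose host differs from the page's host."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlsplit(page_url).netloc
    return [
        url for url in collector.links
        if urlsplit(url).scheme in ("http", "https") and urlsplit(url).netloc != host
    ]

# Made-up fragment of a directory page.
page = "http://dir.yahoo.com/Business_and_Economy/Business_to_Business/Marketing_and_Advertising/"
html = (
    '<a href="Advertising/">Advertising</a> '
    '<a href="http://www.xyz.com/">XYZ</a> '
    '<a href="mailto:info@xyz.com">mail</a>'
)
companies = external_links(page, html)
```

Links such as "Advertising/" stay inside the directory and are covered by the depth setting of Action #1B, while the external links are the ones that "Follow External URLs" would visit.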
List of URLs:
Enter hundreds or thousands of
URLs to spider and extract the data found on those websites.
Quick Start:
Select 3rd option, source "URLs from File" - Enter the name
of the file that contains the URL list - Select Depth - Click
OK
What WDE Does:
WDE will scan the contents of the specified file.
This file must list one URL per line; no other format is supported. WDE
will accept only lines that start with http://. It also
will not accept URLs that point to image/binary files, because those files
will not contain any data.
After building a unique URL list from the above file, WDE will process the websites
one by one according to the depth you
specify.
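The file rules above can be sketched as a small filter. This is an illustration rather than WDE's own logic: the function name is hypothetical, and the set of extensions treated as image/binary is an assumed sample, since the full list is not given here.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed sample of image/binary extensions; the real list may differ.
BINARY_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".bmp", ".ico", ".zip", ".exe"}

def load_urls(lines):
    """Keep unique http:// URLs, one per line, skipping image/binary targets."""
    urls, seen = [], set()
    for line in lines:
        url = line.strip()
        if not url.startswith("http://"):
            continue  # only lines that start with http:// are accepted
        extension = posixpath.splitext(urlsplit(url).path)[1].lower()
        if extension in BINARY_EXTENSIONS:
            continue  # image/binary files carry no extractable data
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Because an open text file yields its lines when iterated, the function can be called as `load_urls(open("urls.txt"))`, where "urls.txt" is a placeholder file name.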