Phone, Fax Harvester Module:
The WDE Phone, Fax Harvester module spiders the web for fresh telephone
and fax numbers targeted to the group that you want to market your products
or services to. There are millions of websites on the internet today, and
most of them belong to businesses that list a telephone or fax number as a
point of contact. WDE can extract telephone and fax numbers from a website,
from search results, from web directories, and from a list of URLs in a
local file.
It offers several filters that limit the scanning range: URL filter, page
text filter, phone/fax filter, and domain filter. With them you extract only
the data you actually need from web pages instead of every phone and fax
number present there, so you build your own custom, targeted database of
phone and fax numbers for your market.
A powerful phone/fax harvesting and extraction tool for responsible tel/fax
marketing.
You can set up different types of extraction with this phone/fax harvester:
Search Keyword | WebSite | Web Directories | List of URLs from File
Keywords:
WDE spiders 18+ search engines to find the right websites and collects
phone and fax data from them.
Quick Start:
Select "Search Engines" source - Enter keyword - Click OK
What WDE Does:
WDE queries 18+ popular search engines, extracts all matching URLs from
the search results, removes duplicate URLs, and finally visits those
websites and collects phone and fax data from them.
You can tell WDE which search engines to use. Click the "Engines" button
and uncheck the listings that you do not want to use. You can add other
engine sources as well.
WDE sends queries to the search engines to get matching website URLs. Next
it visits those matching websites for data extraction. How deep it spiders
into the matching websites depends on the "Depth" setting of the
"External Site" tab.
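The merge and de-duplicate step described above can be sketched roughly as
follows. This is only an illustration in Python, not WDE's own code; the
function name merge_results and the sample URLs are assumptions made for
the example.

```python
from typing import Iterable, List


def merge_results(result_lists: Iterable[Iterable[str]]) -> List[str]:
    """Merge per-engine URL lists into one list without repeats.

    The first occurrence of each URL wins, so the order in which
    the engines returned their results is preserved.
    """
    seen = set()
    merged = []
    for urls in result_lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical results from two engines for the same keyword.
engine_a = ["http://www.xyz.com/", "http://www.abc.com/"]
engine_b = ["http://www.abc.com/", "http://www.example.org/"]
to_visit = merge_results([engine_a, engine_b])
```

Each URL in to_visit would then be handed to the spider once, however many
engines returned it.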
DEPTH:
Here you tell the WDE harvester how many levels to dig down within the
specified website. If you want WDE to stay within the first page, just
select "Process First Page Only".
A setting of "0" will process and look for data in the whole website.
A setting of "1" will process the index or home page and the associated
files under the root directory only.
For example: WDE is going to visit the URL
http://www.xyz.com/product/milk/
for data extraction.
Let's say www.xyz.com has the following text/html pages:
- #1 http://www.xyz.com/
- #2 http://www.xyz.com/contact.htm
- #3 http://www.xyz.com/about.htm
- #4 http://www.xyz.com/product/
- #5 http://www.xyz.com/product/support.htm
- #6 http://www.xyz.com/product/milk/
- #7 http://www.xyz.com/product/water/
- #8 http://www.xyz.com/product/milk/baby/
- #9 http://www.xyz.com/product/milk/baby/page1.htm
- #10 http://www.xyz.com/product/milk/baby/page2.htm
- #11 http://www.xyz.com/product/water/mineral/
- #12 http://www.xyz.com/product/water/mineral/news.htm
WDE is a powerful, fully featured phone and fax collector spider. You need
to decide how deep you want WDE to look for data.
| WDE can retrieve: | Set options: |
| Only matching URL page of search (URL #6) | Select "Process First Page Only" |
| Entire milk dir (URL #6, #8 - 10) | Select "Depth=0" and check "Stay within Full URL" |
| Entire www.xyz.com site | Select "Depth=0" |
| Only www.xyz.com page | Select "Process First Page Only" and check "Spider Base URL Only" |
| Only root dir files (URL #1 - 3) | Select "Depth=1" |
| Only URL #1 - 5 | Select "Depth=2" |
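One way to read the Depth rules is as a limit on how many directory levels
below the site root a page may sit. The Python sketch below is an
interpretation that reproduces the Depth=1, Depth=2 and "Stay within Full
URL" cases for the example site; dir_levels, in_scope and SITE are
illustrative names, not part of WDE, and the real spider may apply the
setting differently.

```python
from urllib.parse import urlsplit

# The example pages of www.xyz.com, in the order listed above.
SITE = [
    "http://www.xyz.com/",
    "http://www.xyz.com/contact.htm",
    "http://www.xyz.com/about.htm",
    "http://www.xyz.com/product/",
    "http://www.xyz.com/product/support.htm",
    "http://www.xyz.com/product/milk/",
    "http://www.xyz.com/product/water/",
    "http://www.xyz.com/product/milk/baby/",
    "http://www.xyz.com/product/milk/baby/page1.htm",
    "http://www.xyz.com/product/milk/baby/page2.htm",
    "http://www.xyz.com/product/water/mineral/",
    "http://www.xyz.com/product/water/mineral/news.htm",
]


def dir_levels(url: str) -> int:
    """Count the directories in the URL path; a file name is not counted."""
    path = urlsplit(url).path or "/"
    directory = path.rsplit("/", 1)[0]  # drop everything after the last "/"
    return len([part for part in directory.split("/") if part])


def in_scope(url: str, start_url: str, depth: int,
             stay_within_full_url: bool = False) -> bool:
    """Assumed rule: depth 0 means no limit, depth N allows fewer than N levels."""
    if urlsplit(url).netloc.lower() != urlsplit(start_url).netloc.lower():
        return False  # different site; external sites are handled separately
    if stay_within_full_url and not url.startswith(start_url):
        return False  # outside the starting directory
    return depth == 0 or dir_levels(url) < depth


root_only = [u for u in SITE if in_scope(u, "http://www.xyz.com/", 1)]
milk_only = [u for u in SITE
             if in_scope(u, "http://www.xyz.com/product/milk/", 0, True)]
```

Under this reading, root_only holds the three pages in the root directory
and milk_only holds the milk directory together with its baby subdirectory.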
Spider Base URL Only:
With this option you can tell WDE to always process the base URLs of
external sites. For example, in the case above, if an external site is
found as
http://www.xyz.com/product/milk/
then WDE will grab only the base
www.xyz.com.
It will not visit http://www.xyz.com/product/milk/
unless you set a depth that also covers the milk directory.
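Reducing a found URL to its base can be sketched in a few lines of Python.
The helper name base_url is an assumption for illustration only; it shows
the idea rather than WDE's internal behaviour.

```python
from urllib.parse import urlsplit


def base_url(url: str) -> str:
    """Keep only the scheme and host of a URL and point at the site root."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


start = base_url("http://www.xyz.com/product/milk/")
```

Here start is the root address of www.xyz.com, which is where spidering
would begin when this option is checked.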
Ignore Case of URLs:
Set this option to avoid duplicate URLs like
http://www.xyz.com/product/milk/
http://www.xyz.com/Product/Milk/
These two URLs are the same. When you set WDE to ignore URL case, it
converts all URLs to lowercase and can then remove duplicate URLs like
those above. However, some servers are case-sensitive, and you should not
use this option on those sites.
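The effect of this option can be illustrated with a short Python sketch.
The function name dedupe_ignoring_case is hypothetical; the code simply
lowercases every URL before checking for repeats, as described above.

```python
from typing import Iterable, List


def dedupe_ignoring_case(urls: Iterable[str]) -> List[str]:
    """Lowercase every URL and keep only the first copy of each."""
    seen = set()
    unique = []
    for url in urls:
        key = url.lower()
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


urls = dedupe_ignoring_case([
    "http://www.xyz.com/product/milk/",
    "http://www.xyz.com/Product/Milk/",
])
```

On a case-sensitive server the two original addresses could be different
pages, which is why the option should stay off for such sites.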
WebSites:
Enter a website URL and collect all phone and fax data found in that
website.
Quick Start:
Select 2nd option "WebSite/Dir/Groups" - Enter website URL - Select Depth
- Click OK
What WDE Does:
The WDE extractor will retrieve the html/text pages of the website
according to the Depth you specified and extract all phone and fax numbers
found in those pages.
- By default, WDE will stay within the current domain only.
- WDE can also follow external sites. If you want WDE to retrieve files
  from external sites that are linked from the starting site specified in
  the "General" tab, you need to set "Follow External URLs" in the
  "External Site" tab. In this case, by default, WDE will follow external
  sites only once; that is, (1) WDE will process the starting address and
  (2) all external sites found in the starting address. It will not follow
  the external sites found in (2), and so on.
  If you want WDE to follow external sites in an unlimited loop, select
  "Unlimited" in the "Spider External URLs Loop" combo box, and remember
  that you need to stop the WDE session manually, because this way WDE can
  travel the entire internet.
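The difference between following external sites once and following them in
an unlimited loop can be sketched as a breadth-first walk with a hop limit.
This Python example is an illustration only: sites_to_visit, max_hops and
the small in-memory link map are assumptions, not WDE settings.

```python
from collections import deque
from typing import Dict, List, Optional


def sites_to_visit(start: str,
                   external_links: Dict[str, List[str]],
                   max_hops: Optional[int] = 1) -> List[str]:
    """Return sites in visiting order.

    max_hops=1 mirrors the default described above (starting site plus the
    external sites it links to); max_hops=None means unlimited.
    """
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        site, hops = queue.popleft()
        if max_hops is not None and hops >= max_hops:
            continue  # do not follow links found on this site
        for linked in external_links.get(site, []):
            if linked not in seen:
                seen.add(linked)
                visited.append(linked)
                queue.append((linked, hops + 1))
    return visited


# Hypothetical link map: which external sites each site links to.
links = {
    "start.com": ["a.com", "b.com"],
    "a.com": ["c.com"],
    "c.com": ["start.com"],
}
once = sites_to_visit("start.com", links)
unlimited = sites_to_visit("start.com", links, max_hops=None)
```

With a small fixed link map the unlimited walk ends on its own; on the live
web new sites keep appearing, which is why such a session has to be stopped
by hand.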
Directories:
Choose Yahoo, Google, DMOZ or another directory and get all phone and fax
contact data from there.
Quick Start & What WDE phone/fax harvester Does:
Let's say you want to extract data for all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
Action #1A:
Select 2nd option "WebSite/Dir/Groups" - enter this URL in the "Starting
Address" box - select "Process First Page Only"
Or, let's say you want to extract data for all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
plus all lower-level folders like
http://directory.google.com/Top/Computers/Software/Freeware/windows
http://directory.google.com/Top/Computers/Software/Freeware/windows/browser
http://directory.google.com/Top/Computers/Software/Freeware/linux
etc.
Action #1B:
Select 2nd option "WebSite/Dir/Groups" - enter the URL
http://directory.google.com/Top/Computers/Software/Freeware/
in the "Starting Address" box - select Depth=0 and the "Stay within Full
URL" option.
With these actions the WDE harvester will download the
http://directory.google.com/Top/Computers/Software/Freeware/
page and, optionally, all lower-level pages, and will build a list of the
URLs of the companies listed there.
Now you want the harvester to visit all those URLs and extract all data
found in those sites.
Action #2:
After either of the actions above, you must move to the "External Site"
tab and check the "Follow External URLs" option. (Remember: this setting
tells the WDE fax harvester to process/follow/visit all URLs found while
processing the "Starting Address" of the "General" tab.)
List of URLs:
Enter hundreds or thousands of URLs to extract the phone and fax numbers
found on those websites.
Quick Start:
Select 3rd option "URLs from File" - Enter the name of the file that
contains the URL list - Select Depth - Click OK
What WDE Does:
WDE will scan the contents of the specified file. The file must list one
URL per line; no other format is supported. WDE accepts only lines that
start with http://. It also rejects URLs that point to image/binary files,
because those files will not contain any data.
After building a unique URL list from the file, WDE processes the websites
one by one according to the depth you specify.
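The file rules above (one URL per line, http:// only, no image/binary
targets, no repeats) can be sketched in Python as follows. The function
name load_urls and the extension list are assumptions for illustration;
the text does not say which file types WDE actually treats as binary.

```python
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical list of image/binary extensions to skip.
BINARY_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".zip", ".exe")


def load_urls(lines: Iterable[str]) -> List[str]:
    """Build a unique URL list from lines that hold one URL each."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url.startswith("http://"):
            continue  # only lines starting with http:// are accepted
        if urlsplit(url).path.lower().endswith(BINARY_EXTENSIONS):
            continue  # image/binary files carry no text data
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = [
    "http://www.xyz.com/\n",
    "www.abc.com\n",
    "http://www.xyz.com/logo.gif\n",
    "http://www.xyz.com/\n",
]
accepted = load_urls(sample)
```

Because load_urls takes any iterable of lines, an open text file can be
passed to it directly.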
Frequently Asked Questions
Q: How can I set a fax/phone filter?
A: Go to New Session dialog - Filters - Data Filter box and use the
Fax/Phone filter option to extract only country- or area-specific
fax/phone numbers. For example, to extract fax numbers starting with the
US area code '866', enter '866' and '1866' on separate lines in the Fax
filter box, like:
866
1866
Why 1866? Because some fax numbers on a web page may appear with a leading
"1", like 1-866-455-344 or +1.866.455.344
(Please note: you MUST NOT use '-', '.' or '()' characters in the
fax/phone filter, because WDE compares the filter against formatted
numbers that contain digits only, with no special symbols.)
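The reason both 866 and 1866 are needed becomes clearer with a small Python
sketch. It assumes the filter is a simple "starts with" test against the
digits-only form of each number; digits_only and matches_filter are
illustrative names, not WDE functions.

```python
import re
from typing import List


def digits_only(number: str) -> str:
    """Strip every character that is not a digit."""
    return re.sub(r"\D", "", number)


def matches_filter(number: str, prefixes: List[str]) -> bool:
    """True if the digits-only number starts with any filter entry."""
    digits = digits_only(number)
    return any(digits.startswith(prefix) for prefix in prefixes)


fax_filter = ["866", "1866"]
kept = [n for n in ["1-866-455-344", "+1.866.455.344", "212-555-010"]
        if matches_filter(n, fax_filter)]
```

Since the comparison runs on digits only, a filter entry such as 1-866
would never match, which is why the filter box must not contain symbols.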
Q: Does this extractor require 'Internet Explorer'?
A: No. It doesn't require any third-party software/library.
Q: I set up a project with "URLs from File" extraction and entered the
filename, but WDE cannot find any link in the file?
A: Make sure the file exists on disk. The file must list one URL per line;
no other format is supported. WDE accepts only lines that start with
http://. WDE also rejects URLs that point to image/binary files, because
those files will not have any text data to extract.
Q: When I run the WDE link extractor, it uses all my computer's power and
the screen hardly refreshes?
A: It seems you are using a high number of threads. Decrease the thread
value to "5" in the "New Session - Other" tab. WDE can launch multiple
threads simultaneously, but remember that too high a thread setting may be
more than your computer and/or internet connection can handle, and it also
puts an unfair load on the host server, which may slow the process down.
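The idea of capping the number of simultaneous threads can be illustrated
with Python's standard thread pool. This is a generic sketch, not a
description of how WDE schedules its work; process_all and the fetch
callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def process_all(urls: Sequence[str],
                fetch: Callable[[str], T],
                threads: int = 5) -> List[T]:
    """Run fetch on every URL with at most `threads` running at once."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() keeps the results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


# Stand-in for a real download: just report the length of each URL.
lengths = process_all(["http://a.com/", "http://bb.com/"], len, threads=5)
```

A lower thread count finishes more slowly but leaves the local machine and
the remote server with less load at any one moment.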
Q: Can I resume an interrupted session in WDE?
A: Yes. Use the 'File - Open' command to open a previously stopped session.
Q: How can I add a search engine listing other than those specified in the
Engine Listing dialog?
A: It is easy. In the "URL" field, type the search query URL and replace
the search keyword part with the WDE syntax
{SEARCH_KEYWORD}
For example, an AOL query URL for a "Flower Shop" search is:
http://search.aol.com/dirsearch.adp?query=Flower+Shop
Just replace the Flower+Shop part with {SEARCH_KEYWORD}, like the following:
http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}
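How such a template turns back into a real query URL can be sketched in
Python. The function name build_query_url is an assumption, and the sketch
presumes the keyword is URL-encoded with "+" for spaces, as in the AOL
example; WDE's actual encoding is not described here.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "{SEARCH_KEYWORD}"


def build_query_url(template: str, keyword: str) -> str:
    """Fill the placeholder with the URL-encoded search keyword."""
    return template.replace(PLACEHOLDER, quote_plus(keyword))


url = build_query_url(
    "http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}",
    "Flower Shop",
)
```

For the "Flower Shop" keyword this gives back the original AOL query URL
shown in the example.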
After adding the new engine listing, click the "Save" button.