Email
Extractor Module:
WDE Email Extractor module is designed
to extract e-mail addresses from web-pages, search results, web dirs/groups,
list of urls from local file. It is an industrial strength, fast and reliable
way to collect email addresses from the Web.
It has various limiters of scanning range - url filter, page text filter,
email filter, domain filter - using which you can extract only the addresses
you actually need from web pages, instead of extracting all the addresses
present there, as a result, you create your own custom and targeted bulk
email list. It has option to save extracted emails directly to disk file,
so there is no limit in number of email extraction per session. It supports
operation through proxy-server and works very fast, as it is able of loading
several pages simultaneously, and requires very few resources. Powerful,
highly targeted email spider harvester.
Screen Shot:

You can setup different type of
email extraction with this UNIQUE spider,
email collector:
Search Keyword
| WebSite | Web Directories
| List of URLs from File
Key
words: WDE
Email Extractor spiders 18+ Search engines for right web sites and get
email data from there.
Specify key words of your search and the
email spider brings you hundreds of addresses from web sites found by
the search engine. This email address extractor supports all
major search engines but has unique ability to add any popular search
engine or websites that has search option. You can define your search
by yourself and as many search engines as you want!
Quick Start:
Select "Search Engines" source
- Enter keyword - Click OK
What WDE Does:
WDE email collector will query 18+
popular search engines, extract all matching URLs from search results,
remove duplicate URLs and finally visits those websites and collect email
addresses from there.
You can tell WDE how many search engines to use. Click "Engines"
button and uncheck listing that you do not want to use. You can add other
engine sources as well.
WDE send queries to search engines to get
matching website URLs. Next it visits those matching websites for email
extraction. How many deep this email address extractor spiders in the
matching websites depends on "Depth" setting of "External
Site" tab.
DEPTH:
Here you need to tell the WDE email
extractor - how many levels to dig down within the specified website.
If you want WDE to stay within first page, just select "Process First
Page Only". A setting of "0" will process and look for
data in whole website. A setting of "1" will process index or
home page with associated files under root dir only.
For example: WDE is going to visit
URL http://www.xyz.com/product/milk/
for data extraction.
Lets say
www.xyz.com
has following text/html pages:
- http://www.xyz.com/
- http://www.xyz.com/contact.htm
- http://www.xyz.com/about.htm
- http://www.xyz.com/product/
- http://www.xyz.com/product/support.htm
- http://www.xyz.com/product/milk/
- http://www.xyz.com/product/water/
- http://www.xyz.com/product/milk/baby/
- http://www.xyz.com/product/milk/baby/page1.htm
- http://www.xyz.com/product/milk/baby/page2.htm
- http://www.xyz.com/product/water/mineral/
- http://www.xyz.com/product/water/mineral/news.htm
WDE is powerful and fully featured UNIQUE
email spider! You need to decide how deep you want WDE to look for data.
| WDE can retrieve: |
Set options: |
| Only matching URL page of search
( URL #6 ) |
Select "Process First Page
Only" |
| Entire milk dir (URL #6 - 10
) |
Select "Depth=0" and
check "Stay within Full URL" |
| Entire
www.xyz.com
site |
Select "Depth=0" |
| Only
www.xyz.com
page |
Select "Process First Page
Only" and
check "Spider Base URL Only" |
| Only root dir file (URL #1 -
3) |
Select "Depth=1" |
| Only URL #1 - 5 |
Select "Depth=2" |
Stop Site on First Email Found:
Each website is structured differently
on the server. Some websites may have only few files and some may have
thousands of files. So sometimes you may prefer to set this power email
extractor to "Stop Site on First Email Found" option. For example:
you set WDE to go entire
www.xyz.com
site and WDE found email in #2 URL (contact.htm) . If you tell this email
collector to "Stop Site on First email Found then it will not go
for other pages (#3-12)
Spider Base URL Only:
With this option you can tell WDE
to spider always the Base URLs of external sites. For example: in above
case, if an external site found like
http://www.xyz.com/product/milk/
then WDE will grab only base
www.xyz.com.
It will not visit http://www.xyz.com/product/milk/
unless you set such depth that covers also milk dir.
Ignore Case of URLs:
Set this option to avoid duplicate
URLs like
http://www.xyz.com/product/milk/
http://www.xyz.com/Product/Milk/
These 2 URLs are same. When you set
to ignore URLs case, then this e-mail extractor convert all URLs to lowercase
and can remove duplicate URLs like above. However - some servers are case-sensitive
and you should not use this option on those special sites.
WebSites:
Enter website
URL and grab all email found in that site.
Quick Start:
Select 2nd option "WebSite/Dir/Groups"
- Enter website URL - Select Depth
- Click OK
What WDE Does:
WDE will retrieve html/text pages
of the website according to the Depth
you specify and extract all data found in those pages.
| # By default, WDE will stay
only the current domain. |
| # WDE can also follow external
sites! If you want WDE
to retrieve files of external sites that are linked from starting
site specified in "General" tab, then you need to
set "Follow External URLs" of "External Site"
tab. In this case, by default, extractor will follow external
sites only once, that is - (1) WDE will spider starting address
and (2) all external sites found in starting address. It will
not follow all external sites found in (2) and so on...
WDE is powerful email extractor,
if you want WDE to follow external sites with unlimited loop,
select "Unlimited" in "Spider External URls Loop"
combo box, and remember you need to manually stop WDE sesssion,
because this way WDE can travel entire internet. |
Directories:
Choose yahoo,
google or other directory and extract all email from there.
Quick Start & What WDE
Does:
Lets say you want to grab email of all
companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
Action #1A:
Select 2nd option "Web Site/Dir/Groups"
- enter this URL in "Starting Address" box - select "Process
First Page Only"
Or, lets say you want to grab email addresses
all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
plus all down level folders like
http://directory.google.com/Top/Computers/Software/Freeware/windows
http://directory.google.com/Top/Computers/Software/Freeware/windows/browser
http://directory.google.com/Top/Computers/Software/Freeware/linux
etc....
Action #1B:
Select 2nd option "Web Site/Dir/Groups"
- enter URL http://directory.google.com/Top/Computers/Software/Freeware/
in "Starting Address" box - select Depth=0 and "Stay within
Full URL" option.
With these actions the e-mail extractor
will download http://directory.google.com/Top/Computers/Software/Freeware/
page and optionally all down level pages and will build a URLs list of
companies listed there.
Now you want WDE to visit all those URLs
and extract all e-mail found in those sites.
Action #2:
So after either above action
you must move to "External Site" tab and check "Follow
External URLs" option. (Remember:
this setting tells WDE to spider/follow/visit all URLs found while processing
"Starting Address" of "General" tab).
List
of URL: Enter
hundreds/thousands of URLs to collect e-mail found on those sites.
Quick Start:
Select 3rd option "URLs from
File" - Enter file name that contains all URLs list - Select Depth
- Click OK
What WDE Does:
The email extractor will scan the
contents of specified file. This file must have URL line-by-line, other
format is not supported, WDE will accept only lines that starts with
http://
text. Also it will not accept URLs that point to image/binary files, because
those files will not have any email data.
After building unique URL list form above file, WDE will spider website
one-by-one according to the depth
you specify.
Frequently Asked Questions
Q: Does this email extractor require
'Internet Explorer'?
A: No. It doesn't require
any third party software/library.
Q: I
set-up a project with "Web Site" extraction - but no email was
found?
A: There are few things that may cause this:
(1) Not all website contains email address. Some website uses forms in
their contact / support pages. (If you are sure that the site has email
address in a page and WDE can not extract it, please tell us with page
location)
(2) Check "Depth" setting. Depth tells WDE how many levels
to dig down within the specified website.
Q: I set-up a project with "URLs from File" extraction, enter
the filename - but WDE can not find any link in the file?
A: Make sure the file exist in disk. The file
must have URL line-by-line, other format is not supported, WDE will accept
only lines that starts with http:// text. Also WDE will not accept URLs
that point to image/binary files, because those files will not have any
text data to extract.
Q: When
I run this e-mail extractor, it sucks all my computer power, screen is
hardly refreshing?
A: It seems you are using high number of threads.
Decrease the thread value to "5" in "New Session - Other"
tab. WDE can launch multiple threads simultaneously. But remember, too
high a thread setting may be too much for your computer and/or internet
connection to handle it and also puts an unfair load on the host server
which may slow the process down.
Q: Can
I resume an interrupted session in WDE?
A: Yes. Use 'File - Open' command to open a previously
stopped session.
Q: I run different sessions with different type of extraction,
WDE extracted huge emails. How I can sort the list and remove duplicate
emails, if any?
A: WDE email
extractor only removes duplicate emails found within a session. It doesn't
compare emails extracted in other previous sessions or exist in disk file.
To do this job, we suggest you Extract
Link software from Spadix.
Q: How
I can add search engine listing other than those specified in Engine Listing
dialog?
A: It is easy. In "URL" field type
the search query URL. Replace the search keyword part with WDE syntax
{SEARCH_KEYWORD}
For Example: an AOL query URL with "Flower Shop" search is:
http://search.aol.com/dirsearch.adp?query=Flower+Shop
You just replace Flower+Shop part
with {SEARCH_KEYWORD} like following:
http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}
After adding the new engine list, click
"Save" button.
Download
| Purchase
| No Spam | Links
|