Requirements

Windows 2000/NT/ME/XP/Vista/7/8
32 MB RAM
7 MB Hard Disk Space
Internet Connection

What are people saying?

Student License

Recently I was looking for a URL extractor program because I am building a search engine, and I have tried nearly 20 of them, all with some missing features. But WDE seems to be the best one right now. I am just a student, and the normal price is too high for me. Do you have a student license to offer? I would really appreciate it.

Best Regards, Jeffrey Richardson, Sweden

 

Web Addresses

We downloaded and ran the trial version of your web link extractor. I compared it to another email extractor program and yours kicked its butt. Yours scanned 9000 files and found over 1500 links, while the other scanned only 1200 files and found only about 400 links. (This was using the exact same search file.)

Thanks, Mark Jeter

 

Email List Management

A perfect tool for creating, processing and managing email marketing mailing lists.
ListMotor

FAQ

I set up a project with Web Site extraction, but no page was processed. Why can't WDE connect?

I set up a project with "URLs from File" extraction and entered the filename, but WDE cannot find any links in the file. Why?

When I run WDE, it uses all my computer's power and the screen barely refreshes. What can I do?

Can I resume an interrupted session in WDE?

How can I add a search engine listing other than those specified in the Engine Listing dialog for specific data mining tasks?

What are the inactive sites shown in the Data tab?

Why does the extractor slow down after running all day?

How can I get more data with WDE? When I query a search engine, I see millions of matches.

I need to be able to get into a message board community that is username/password protected and get every email address there. Can your product do this effectively?

Should I use more threads to complete the session quickly?

 

I set up a project with Web Site extraction, but no page was processed. Why can't WDE connect?

There are several things that may cause this:
(1) Check your Internet connection - you must be online.
(2) Check your proxy settings. If you are behind a firewall or proxy server, you need to enter the necessary information in the "New Session Dialog - Proxy" tab. If you do not know your proxy details, contact your ISP or system administrator.
(3) Is the site password protected? You cannot extract data from protected sites, except for a password-protected directory whose login information you enter in the "New Session Dialog - Login" tab.
(4) Make sure the site is not temporarily or permanently down. You can check this by loading it in your default browser.
(5) Does the site use some type of redirect? For example, you enter a URL like http://www.car.com and the server redirects to http://www.truck.com. In that case, use http://www.truck.com as your starting address in the "New Session" dialog.
(6) Check that you did not enter an exclude URL filter like "/" or "com" in "New Session Dialog - URL Filter", which can prevent WDE from processing any of your sites.
(7) Check that the site does not use only a Java applet on its home/index page. Like other spiders, WDE cannot parse Java applets.
(8) WDE does not support the secure https:// protocol.
(9) Finally, did you set a very low request time-out in the "New Session - Other" tab? The default time-out is 100 seconds. With a much lower value, WDE may abort the request before the host server replies.
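Item (6) is easy to reproduce outside WDE. The FAQ does not spell out how WDE matches exclude filters, so the sketch below assumes a plain substring test; `is_excluded` is a hypothetical name. Under that assumption, a filter of "/" rejects every URL, because every http:// address contains a slash, and "com" rejects every .com site.

```python
def is_excluded(url: str, exclude_filters: list[str]) -> bool:
    """Return True if any exclude filter occurs as a substring of the URL.

    Hypothetical model of an exclude URL filter; WDE's real matching
    rules may differ.
    """
    return any(f in url for f in exclude_filters)


start_urls = ["http://www.car.com/", "http://www.example.org/index.html"]

# "/" appears in every http:// URL, so nothing survives this filter.
print([u for u in start_urls if not is_excluded(u, ["/"])])    # []

# "com" drops every .com site (and any URL containing "com" anywhere).
print([u for u in start_urls if not is_excluded(u, ["com"])])  # ['http://www.example.org/index.html']
```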

I set up a project with "URLs from File" extraction and entered the filename, but WDE cannot find any links in the file. Why?

Make sure the file exists on disk. The file must list URLs one per line; other formats are not supported, and WDE accepts only lines that start with http://. WDE also rejects URLs that point to image or binary files, because those files contain no text data to extract.
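The file rules above can be sketched as a small loader. This is a minimal sketch, not WDE's code: `load_url_list` and `load_url_file` are hypothetical helpers, trimming surrounding whitespace is an assumption, and the extension list standing in for "image/binary files" is illustrative, since the FAQ does not give WDE's actual list.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Illustrative only: the FAQ does not list which extensions WDE treats as image/binary.
BINARY_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".bmp", ".ico", ".zip", ".exe")


def load_url_list(text: str) -> list[str]:
    """Keep one URL per line, only if it starts with http:// and is not an image/binary file."""
    urls = []
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace is ignored
        if not line.startswith("http://"):
            continue  # https://, bare domains, and other formats are rejected
        if urlsplit(line).path.lower().endswith(BINARY_EXTENSIONS):
            continue  # no text data to extract
        urls.append(line)
    return urls


def load_url_file(path: str) -> list[str]:
    """Read a URL list file from disk; raises FileNotFoundError if it does not exist."""
    return load_url_list(Path(path).read_text(encoding="utf-8"))


sample = "http://www.example.com/\nhttps://secure.example.com/\nwww.example.com\nhttp://www.example.com/logo.GIF\n"
print(load_url_list(sample))  # ['http://www.example.com/']
```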

When I run WDE, it uses all my computer's power and the screen barely refreshes. What can I do?

WDE can run multiple threads simultaneously. But remember: too high a thread setting may be more than your computer and/or Internet connection can handle, and it also puts an unfair load on the host server, which may slow the process down.
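A thread setting is, in effect, a cap on concurrent downloads. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` to show that cap; `crawl_bounded` is a hypothetical helper, and its inner `fetch` only sleeps instead of downloading, so it runs offline. It records the peak number of simultaneous downloads, which is the load placed on your machine, your connection, and the host server at the same moment.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def crawl_bounded(urls, max_threads):
    """Process `urls` with at most `max_threads` workers; return (results, peak concurrency)."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fetch(url):
        # Stand-in for a real page download: records concurrency, then sleeps.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)
        with lock:
            state["active"] -= 1
        return url

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(fetch, urls))  # map preserves input order
    return results, state["peak"]


urls = [f"http://www.example.com/page{i}.html" for i in range(20)]
results, peak = crawl_bounded(urls, max_threads=5)
print(len(results), peak <= 5)  # 20 True
```

Raising `max_threads` raises the peak, and with it the number of simultaneous requests hitting one server.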

Can I resume an interrupted session in WDE?

Yes. Use the 'File - Open' menu command to open the log file of a previously stopped session.

How can I add a search engine listing other than those specified in the Engine Listing dialog?

It is easy. In the "URL" field, type the search query URL, replacing the search keyword part with the WDE placeholder {SEARCH_KEYWORD}.

For example, an AOL query URL for a "Flower Shop" search is:
http://search.aol.com/dirsearch.adp?query=Flower+Shop

Just replace the Flower+Shop part with {SEARCH_KEYWORD}, as follows:

http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}

After adding the new engine listing, click the "Save" button.
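The placeholder substitution can be sketched as below. The FAQ does not say how WDE encodes the keyword; the sketch assumes spaces become "+" as in the AOL example, which is what `urllib.parse.quote_plus` produces. `build_query_url` is a hypothetical helper.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "{SEARCH_KEYWORD}"


def build_query_url(template: str, keyword: str) -> str:
    """Substitute a URL-encoded keyword for the {SEARCH_KEYWORD} placeholder."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no {SEARCH_KEYWORD} placeholder")
    return template.replace(PLACEHOLDER, quote_plus(keyword))


template = "http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}"
print(build_query_url(template, "Flower Shop"))
# http://search.aol.com/dirsearch.adp?query=Flower+Shop
```

Rejecting a template without the placeholder catches the common mistake of saving a URL that still contains a fixed keyword.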

What are the inactive sites shown in the Data tab?

These are sites WDE could not connect to. The site may be temporarily down, or its domain may have expired. To retry these sites later, save the list using the "Save" button, then use the "New Session Dialog - URLs from File" option to process them.

Why does the extractor slow down after running all day?

Do not use many threads in the "New Session Dialog - Other" tab; use 5 or fewer.

Also, do not use it for very broad searches. The program keeps extracted URLs, emails, etc. in RAM to avoid duplicate data and to avoid revisiting sites it has already visited, so a broad search uses a lot of RAM and may slow down.
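The memory growth described above follows from the deduplication itself: every URL visited and every item extracted has to be remembered for the rest of the session. A minimal sketch with two in-memory sets, assuming exact-string matching; `Deduplicator` is a hypothetical class, not part of WDE.

```python
class Deduplicator:
    """Hypothetical in-memory store: remembers every visited URL and every extracted email."""

    def __init__(self):
        self.visited_urls = set()
        self.emails = set()

    def should_visit(self, url: str) -> bool:
        """Return True the first time a URL is seen, False on every later attempt."""
        if url in self.visited_urls:
            return False
        self.visited_urls.add(url)
        return True

    def add_emails(self, found: list[str]) -> list[str]:
        """Record emails and return only the ones not seen before (in input order)."""
        new = []
        for email in found:
            if email not in self.emails:
                self.emails.add(email)
                new.append(email)
        return new


d = Deduplicator()
print(d.should_visit("http://www.example.com/"))  # True
print(d.should_visit("http://www.example.com/"))  # False
print(d.add_emails(["a@example.com", "a@example.com", "b@example.com"]))  # ['a@example.com', 'b@example.com']
print(len(d.visited_urls) + len(d.emails))  # 3
```

Nothing is ever removed from either set, so on a broad, all-day crawl both keep growing for as long as the session runs.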

If you do run a broad search, uncheck the 'View - Display data in data tab' menu item so that no data is shown in the Data tab; this improves performance.

Do not use 'Follow External Sites - Spider Unlimited Loop' in the New Session Dialog. With this option, WDE can travel the entire Internet and may easily crash.

How can I get more data with WDE? When I query a search engine, I see millions of matches.

To get more results:

(1) Select all search engines, then click Save in New Session Dialog -> Engine Listing Dialog.
(2) Use Intelligent Spidering Mode in the External Site tab.

Note that:
(1) Although you see millions of matches in a search result, search engines do not deliver more than 1000 results. For example, try to view the 1001st result in any search engine.

(2) Some similar programs show huge numbers of extracted emails. These are not actual, targeted addresses; the numbers are only meant to convince you to purchase. Always check the source of extracted emails.

I need to be able to get into a message board community that is username/password protected and get every email there. Can your product do this effectively?

What kind of authentication does the site use?
It is possible for a password-protected directory (enter the login information in the New Session Dialog - Login tab).
If the site is like http://mail.yahoo.com/, it is not possible.
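The FAQ does not say which scheme WDE uses for password-protected directories. Such directories are commonly protected with HTTP Basic authentication, so the sketch below assumes that scheme and only builds the `Authorization` header a client would send, without contacting any server; `basic_auth_header` is a hypothetical helper and the URL is a placeholder. Sites that handle login through their own web forms and cookies, rather than HTTP authentication, are not covered by this scheme.

```python
import base64
from urllib.request import Request


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz

# Attach the header to a request object (not sent here; placeholder URL).
req = Request("http://www.example.com/members/",
              headers={"Authorization": basic_auth_header("user", "pass")})
```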

Should I use more threads to complete the session quickly?

That is correct for a smaller session that will complete within a few hours.
But for large-scale sessions that will take many hours, use a low thread count (say, 5).

Threads are used to download data simultaneously.
It is not true that more threads always mean faster extraction. After the data is downloaded, the program must analyze and parse it to extract emails, phone numbers, etc., and to find internal links for further extraction. So the more threads you use, the busier the program and CPU become. Use 10 threads for a smaller session and 5 for a large session.
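The per-page parsing mentioned above can be sketched as follows. The regular expressions are simplified assumptions for illustration, not WDE's extraction rules, and `parse_page` is a hypothetical helper. The point is that this work runs on the CPU after every download, so adding download threads also multiplies the parsing the CPU must keep up with.

```python
import re

# Simplified patterns for illustration only; real extractors use more careful rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Only http:// links, since WDE does not support https://.
LINK_RE = re.compile(r"""href\s*=\s*["'](http://[^"']+)["']""", re.IGNORECASE)


def parse_page(html: str) -> tuple[list[str], list[str]]:
    """CPU-bound work done after each download: extract emails and follow-up links."""
    emails = sorted(set(EMAIL_RE.findall(html)))
    links = sorted(set(LINK_RE.findall(html)))
    return emails, links


page = '<a href="http://www.example.com/contact.html">Contact</a> Mail sales@example.com or info@example.com'
print(parse_page(page))
# (['info@example.com', 'sales@example.com'], ['http://www.example.com/contact.html'])
```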