scrapoxy

Scrapoxy hides your web scraper behind a cloud.

It starts a pool of proxies to relay your requests.

Now, you can crawl without thinking about blacklisting!

It is written in ES6 (Node.js and AngularJS)
and it is open source!

Install the latest version of Scrapoxy
Scrapoxy 3.1.1
$ npm install -g scrapoxy
and join the Slack community!

How does it work?

Howto

Features

a Personal Proxy
  • Create your own proxies
  • Use multiple cloud providers
  • Save your money ($$$)
with Anti-Blacklisting
  • Rotate IP addresses
  • Impersonate known browsers
  • Exclude blacklisted instances
and Statistics
  • Monitor the requests
  • Detect bottlenecks
  • Optimize the scraping

Get started with AWS/EC2

1. Run the Docker image
$ sudo docker run -e COMMANDER_PASSWORD='CHANGE_THIS_PASSWORD' \
-e PROVIDERS_AWSEC2_ACCESSKEYID='YOUR ACCESS KEY ID' \
-e PROVIDERS_AWSEC2_SECRETACCESSKEY='YOUR SECRET ACCESS KEY' \
-it -p 8888:8888 -p 8889:8889 fabienvauchelles/scrapoxy
Don’t forget to change the password, the accessKeyId and the secretAccessKey!
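
Once the container is up, a scraper sends its requests through Scrapoxy instead of contacting the target site directly. The sketch below assumes that the first published port (8888) is the proxy endpoint on localhost, which the command above does not state explicitly. The SCRAPOXY_PROXY variable, the scrape_via_scrapoxy helper and the example.com target are illustrative placeholders, not part of Scrapoxy.

```shell
# Assumption: port 8888 (published by the docker command above) is the proxy endpoint.
SCRAPOXY_PROXY="http://localhost:8888"

# Hypothetical helper: fetch a URL through the proxy with curl.
scrape_via_scrapoxy() {
    curl --silent --show-error --max-time 30 --proxy "$SCRAPOXY_PROXY" "$1"
}

# Usage (requires the container to be running and at least one instance ready):
#   scrape_via_scrapoxy http://example.com/
```

If rotation works as described in the features list, repeated calls against a page that echoes the caller's IP address should eventually show different addresses.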

Like the project?

Add a star on GitHub!

Contribute?

You can open an issue on GitHub for any feedback
(bug, question, request, pull request, etc.)
or fork the project.

More info?

Check the documentation!
And join the Slack community.

The Scrapoxy project is developed by Fabien Vauchelles.