Crawlee Integration
This tutorial uses the Crawlee web scraping framework Crawlee.
Step 1: Install the framework and setup a project
shell
npx crawlee create my-crawler
cd my-crawler
Step 2: Retrieve project credentials
- Open Scrapoxy User interface, and go to the project
Settings
; - Remember the project's
Username
; - Remember the project's
Password
.
Step 3: Create a crawler
Replace content of the file src/main.ts
by the following code:
typescript
import {
CheerioCrawler,
ProxyConfiguration
} from 'crawlee';
const proxyConfiguration = new ProxyConfiguration({
proxyUrls: [
'http://USERNAME:PASSWORD@localhost:8888',
]
});
const crawler = new CheerioCrawler({
proxyConfiguration,
ignoreSslErrors: true,
async requestHandler({body}) {
console.log(body.toString());
},
});
(async () => {
await crawler.run(['https://fingerprint.scrapoxy.io']);
})()
.catch(console.error);
Replace USERNAME
and PASSWORD
by the credentials you copied earlier.
Run the crawler:
shell
npm start