Project Description:
I need a skilled Python developer to create a web service that automates CPF (Cadastro de Pessoas Físicas) status queries on the Brazilian Federal Revenue Service (Receita Federal) website (https://servicos.receita.fazenda.gov.br/Servicos/CPF/ConsultaSituacao/ConsultaPublica.asp). The service must integrate a third-party CAPTCHA-solving service (CaptchaSonic, CaptchaAI, or NoCaptchaAI) to bypass the reCAPTCHA v2 used on the site. It will accept a CPF and a date of birth as input parameters, complete each request within 5 seconds, save the resulting HTML page to a file on the server, and return the file path. It must handle at least 100 requests per minute efficiently.
A previous attempt by another developer was too slow and couldn’t meet the performance needs. This solution must be fast, reliable, and optimized for high throughput.
Requirements:
1. Core Functionality:
◦ Build a web service (e.g., using Flask or FastAPI) that accepts HTTP requests with two parameters:
▪ cpf: a string of 11 digits (e.g., “12345678901”).
▪ date_of_birth: a string in the format “dd/mm/yyyy” (e.g., “01/01/1990”).
◦ The service must access the target URL, fill in the CPF and date of birth, solve the reCAPTCHA v2, submit the form, and retrieve the resulting HTML page.
◦ Save the HTML response to a uniquely named file on the server (e.g., named with a timestamp or UUID, like cpf_12345678901_20250407.html).
◦ Return a JSON response with the file path (e.g., {"file_path": "/path/to/saved/file.html"}).
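A minimal sketch of the input validation implied above, assuming the standard public CPF check-digit rule (two mod-11 check digits, computed with weights 10 down to 2 and then 11 down to 2). The function names are hypothetical. Rejecting malformed input locally avoids spending a solver call on a request that cannot succeed. Note that the placeholder 12345678901 used in this posting fails the check-digit test, while 12345678909 passes.

```python
import re
from datetime import date, datetime

# ASCII digits only; \d would also accept other Unicode digit characters.
CPF_PATTERN = re.compile(r"[0-9]{11}")
DOB_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def _check_digit(digits: list) -> int:
    """Compute one CPF check digit; weights run from len(digits) + 1 down to 2."""
    weights = range(len(digits) + 1, 1, -1)
    total = sum(d * w for d, w in zip(digits, weights))
    remainder = total % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cpf(cpf: str) -> bool:
    """Return True if `cpf` is 11 ASCII digits with correct check digits."""
    if not CPF_PATTERN.fullmatch(cpf):
        return False
    digits = [int(c) for c in cpf]
    # Repeated-digit strings such as 11111111111 pass the checksum but are
    # conventionally treated as invalid.
    if len(set(digits)) == 1:
        return False
    first = _check_digit(digits[:9])
    second = _check_digit(digits[:9] + [first])
    return digits[9] == first and digits[10] == second


def parse_birth_date(value: str) -> date:
    """Parse a strict dd/mm/yyyy date; raise ValueError on bad format or impossible dates."""
    if not DOB_PATTERN.fullmatch(value):
        raise ValueError("date_of_birth must use the format dd/mm/yyyy")
    parsed = datetime.strptime(value, "%d/%m/%Y").date()  # rejects e.g. 31/02/1990
    if parsed > date.today():
        raise ValueError("date_of_birth cannot be in the future")
    return parsed
```

The regex runs before `strptime` because `%d` and `%m` also accept single-digit values such as "1/1/1990", which the stated format does not allow.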
2. CAPTCHA Solving Integration:
◦ Integrate one of the following CAPTCHA-solving services (developer’s choice; justify the recommendation):
▪ CaptchaSonic (https://captchasonic.com/extension): preferred if it meets the speed requirements.
▪ CaptchaAI (https://captchaai.com/): supports reCAPTCHA v2 with unlimited solves and thread-based pricing.
▪ NoCaptchaAI (https://dash.nocaptchaai.com/marketplace?plan=daily6k): offers fast API-based solving.
◦ Use the service’s API to send the reCAPTCHA site key and page URL, retrieve the solved token, and submit it with the form.
◦ Handle API errors (e.g., rate limits, timeouts) with retries (at most 2 retries per request).
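One way to honor the "at most 2 retries" rule is a small async retry wrapper with exponential backoff. This sketch is not tied to any provider's API: `with_retries` and `RateLimitedError` are hypothetical names, and which errors count as retryable is an assumption (timeouts, connection errors, and rate-limit responses that the caller converts into `RateLimitedError`). `asyncio.TimeoutError` is listed separately because it only became an alias of the built-in `TimeoutError` in Python 3.11.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error raised by the caller when an API reports a rate limit."""


# asyncio.TimeoutError is a distinct class before Python 3.11, so list both.
RETRYABLE: Tuple[Type[BaseException], ...] = (
    asyncio.TimeoutError,
    TimeoutError,
    ConnectionError,
    RateLimitedError,
)


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    *,
    max_retries: int = 2,
    base_delay: float = 0.25,
    retry_on: Tuple[Type[BaseException], ...] = RETRYABLE,
) -> T:
    """Await `operation()`; on a retryable error, try again up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller log the failure
            # Exponential backoff (0.25 s, then 0.5 s) plus small jitter so that
            # concurrent requests do not all retry at the same instant.
            await asyncio.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.05))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Every backoff pause counts against the 5-second budget, so `base_delay` should stay small; with the defaults above, two retries add roughly 0.75 s of waiting.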
3. Performance:
◦ Each request (including CAPTCHA solving, form submission, and file saving) must complete in 5 seconds or less.
◦ The service must sustain 100 requests per minute using asynchronous processing (e.g., asyncio) or multithreading.
◦ Optimize HTTP requests (e.g., persistent sessions that reuse cookies and connections) to reduce overhead.
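The throughput and latency targets together set a floor on concurrency. By Little's law, the average number of requests in flight is L = λW; with λ = 100/60 ≈ 1.67 requests per second and W = 5 s at the latency ceiling, L ≈ 8.3, so at least 9 queries must run at the same time. The sketch below assumes asyncio and caps concurrency with `asyncio.Semaphore`; `required_concurrency`, `run_bounded`, and `fake_query` are hypothetical names, and `fake_query` only sleeps in place of real work.

```python
import asyncio
import math
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


def required_concurrency(requests_per_minute: float, latency_seconds: float) -> int:
    """Little's law L = lambda * W, rounded up to whole in-flight requests."""
    arrival_rate = requests_per_minute / 60  # requests per second
    return math.ceil(arrival_rate * latency_seconds)


async def run_bounded(jobs: Iterable[Awaitable[T]], limit: int) -> List[T]:
    """Await all jobs, never letting more than `limit` run at the same time.

    Pass coroutine objects, not Tasks: a Task starts running as soon as it is
    created, so the semaphore could not hold it back.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Awaitable[T]) -> T:
        async with semaphore:
            return await job

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_query(i: int) -> int:
    """Stand-in for one end-to-end query."""
    await asyncio.sleep(0.05)
    return i


if __name__ == "__main__":
    limit = required_concurrency(100, 5)  # 9 for this posting's targets
    results = asyncio.run(run_bounded([fake_query(i) for i in range(20)], limit))
    print(limit, len(results))
```

Since L grows linearly with W, any extra per-request delay raises the concurrency needed to hold 100 requests per minute.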
4. Input/Output:
◦ Input: an HTTP GET or POST request with the parameters cpf and date_of_birth (e.g., POST /query?cpf=12345678901&date_of_birth=01/01/1990).
◦ Output:
▪ Save the full HTML response to a file in a server directory (e.g., /output/cpf_responses/).
▪ Return a JSON response: {"file_path": "/output/cpf_responses/cpf_12345678901_20250407.html"}.
◦ Log failed requests (e.g., invalid CPF, CAPTCHA failure) to an error log (e.g., a .log file).
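A minimal sketch of the output side, assuming a local output directory and a CPF that has already been validated as 11 digits. The response body is written as raw bytes, so whatever charset the page declares stays correct. The file name combines the CPF, a UTC timestamp, and a UUID, so repeat queries never overwrite each other, and the write goes through a temporary file plus rename. `save_html`, `build_response`, `mask_cpf`, `configure_logging`, and `log_failure` are hypothetical names. Masking the CPF in the log is a suggested precaution for personal data, not something the posting requires.

```python
import json
import logging
import uuid
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger("cpf_service")  # hypothetical logger name


def configure_logging(log_path: Path) -> None:
    """Append failure records to a .log file."""
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def mask_cpf(cpf: str) -> str:
    """Keep only the first 3 and last 2 digits in logs, e.g. 123******09."""
    return cpf[:3] + "*" * 6 + cpf[-2:] if len(cpf) == 11 else "<malformed>"


def log_failure(cpf: str, reason: str) -> None:
    """Record a failed query (invalid input, solver failure, timeout, ...)."""
    logger.error("query failed cpf=%s reason=%s", mask_cpf(cpf), reason)


def save_html(body: bytes, cpf: str, output_dir: Path) -> Path:
    """Write the raw response body to a unique file and return its path.

    Assumes `cpf` was validated as 11 digits, so it cannot inject path segments.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # The UUID suffix keeps names unique even for repeat queries in the same second.
    path = output_dir / f"cpf_{cpf}_{stamp}_{uuid.uuid4().hex}.html"
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(body)  # raw bytes preserve the page's own encoding
    tmp.replace(path)      # rename into place so readers never see a partial file
    return path


def build_response(path: Path) -> str:
    """JSON body returned to the client."""
    return json.dumps({"file_path": str(path)})
```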
5. Technical Details:
◦ Use Python 3.9+ with libraries such as requests (for HTTP; an async client such as httpx or aiohttp fits better with asyncio), Selenium (if needed for dynamic interaction), and a web framework (e.g., Flask or FastAPI).
◦ Maintain session cookies as the site requires (its documentation notes that the browser must allow cookies).
◦ Include a configuration file (e.g., .env or config.json) for API keys, the output directory, and thread settings.
◦ Ensure the service runs on a local server (e.g., localhost:5000) for testing, and provide deployment instructions.
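A sketch of configuration loading, assuming a config.json file whose values can be overridden by environment variables, a common pattern for deploying without editing files. The key names, environment variable names, and defaults are hypothetical; the default concurrency of 10 is an assumed value slightly above the roughly 9 simultaneous requests that 100 requests per minute at 5 s each implies. Reading a .env file would additionally need a package such as python-dotenv, which is not shown.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class Settings:
    captcha_api_key: str
    output_dir: Path
    max_concurrency: int
    request_timeout_s: float


def load_settings(config_path: Path) -> Settings:
    """Read config.json if present; environment variables take precedence."""
    raw = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}

    def pick(key: str, env_var: str, default: Any = None) -> Any:
        # Environment first, then the file, then the built-in default.
        return os.environ.get(env_var, raw.get(key, default))

    api_key = pick("captcha_api_key", "CAPTCHA_API_KEY")
    if not api_key:
        raise RuntimeError("Set captcha_api_key in config.json or CAPTCHA_API_KEY")
    return Settings(
        captcha_api_key=str(api_key),
        output_dir=Path(pick("output_dir", "OUTPUT_DIR", "output/cpf_responses")),
        max_concurrency=int(pick("max_concurrency", "MAX_CONCURRENCY", 10)),
        request_timeout_s=float(pick("request_timeout_s", "REQUEST_TIMEOUT_S", 5.0)),
    )
```

A matching config.json might look like {"captcha_api_key": "...", "output_dir": "/output/cpf_responses", "max_concurrency": 10}.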
6. Deliverables:
◦ A fully functional Python web service with clear comments and documentation.
◦ A README file with setup instructions (e.g., installing dependencies, configuring API keys, running the service).
◦ A sample HTTP request (e.g., using curl or Postman) and the expected response.
◦ A brief explanation of the chosen CAPTCHA service and why it was selected.
7. Constraints:
◦ Must not exceed 5 seconds per request on average (tested on a stable internet connection).
◦ Must handle CAPTCHA-solving service rate limits and provide a fallback (e.g., pause and retry).
◦ Should avoid triggering anti-bot measures on the target site (e.g., random delays of 1–2 seconds if needed).
◦ Ensure file naming avoids overwrites (e.g., use timestamps or unique IDs).
Preferred Skills:
• Strong experience with Python (web frameworks like Flask/FastAPI, asyncio, or multithreading).
• Familiarity with web scraping and CAPTCHA-solving APIs (e.g., 2Captcha, CaptchaAI, or similar).
• Knowledge of reCAPTCHA v2 mechanics and browser automation (Selenium or Puppeteer, if required).
• Ability to optimize code for speed and scalability.
Budget and Timeline:
• Budget: $150–$250 (negotiable based on experience and proposed solution).
• Timeline: 5–7 days from project start.
How to Apply:
Please submit:
1. A brief proposal explaining your approach, including which CAPTCHA service you’ll use and how you’ll meet the 100 requests/minute target with a 5-second response time.
2. Examples of similar projects (e.g., web services, scraping, or CAPTCHA automation).
3. Your estimated time to completion and your rate.
Delivery term: Not specified