Jiri Karlik e35a5f4ce2 | ||
---|---|---|
jobs_aggregator | ||
jobs_dashboard | ||
.gitignore | ||
LICENSE | ||
Procfile | ||
README.md | ||
manage.py | ||
requirements.txt |
README.md
Jobs_aggregator
Jobs_aggregator is an educational project that demonstrates web scraping using Selenium and BeautifulSoup modules to extract job data from the job portal Jobs.cz. The project also includes a web application built with the Django framework, with dynamic frontend elements implemented using JavaScript.
Project Goals
The primary goals of this project are:
- Learn Django framework by building the first web application
- Gain knowledge and experience in web scraping techniques
- Create a project that can be demonstrated during interviews
Functionality
The current features of Jobs_aggregator include:
- User authentication: Users must log in to access the scraping functionality.
- Customized authentication: Authentication function in Django is adjusted, and BruteBuster module is implemented to protect against brute force attacks.
- Job scraping: Users can select a job title and city to scrape data from Jobs.cz.
- Data storage: Scraped job data is stored in a SQLite3 database using Django models.
- Data rendering: The scraped data is rendered in an HTML table.
- Table filtering: Users can filter the table by text, salary (indicated/not indicated), and junior (junior included in the title).
- Table sorting: Users can sort the table by all headings, except URL, by clicking on the headers (switching between ascending and descending order).
Observations
Throughout the project, there were instances where the direction and scope of the project evolved. The initial idea was to build a web app in Django that could scrape jobs automatically and provide users with rendered data. However, the project took a different path, allowing users to perform the scraping themselves, which required additional adjustments.
For future projects, it is essential to clearly define the project goals, functionalities, and prioritize them accordingly to avoid ambiguity and ensure a smoother development process.
Installation and Usage
To use Jobs_aggregator locally, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/Jobs_aggregator.git
- Install the required dependencies:
pip install -r requirements.txt
- Edit the
decorators.py
file in the BruteBuster module:- Locate the file at
brutebuster/decorators.py
- In line 45, modify the code snippet as follows:
if fa.recent_failure(): if fa.too_many_failures(): fa.failures += 1 fa.save() return False # MODIFY HERE
- Locate the file at
- Set up the database:
python manage.py migrate
- Start the development server:
python manage.py runserver
- Access the web application in your browser at
http://localhost:8000/dashboard
Contributing
Contributions to Jobs_aggregator are welcome! If you would like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your modifications.
- Commit and push your changes to your forked repository.
- Submit a pull request describing your changes.
License
This project is licensed under the MIT License.
Contact
If you have any questions or need further assistance, please feel free to contact the project owner.
Enjoy using Jobs_aggregator!