Transfer to Gitea
parent
cfb2a04328
commit
e35a5f4ce2
|
@ -0,0 +1,21 @@
|
|||
MIT License
|
||||
|
||||
Copyright (c) [2023] [Jiri Karlik]
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
|
@ -0,0 +1,68 @@
|
|||
# Jobs_aggregator
|
||||
|
||||
Jobs_aggregator is an educational project that demonstrates web scraping using Selenium and BeautifulSoup modules to extract job data from the job portal Jobs.cz. The project also includes a web application built with the Django framework, with dynamic frontend elements implemented using JavaScript.
|
||||
|
||||
## Project Goals
|
||||
|
||||
The primary goals of this project are:
|
||||
|
||||
- Learn the Django framework by building a first web application
|
||||
- Gain knowledge and experience in web scraping techniques
|
||||
- Create a project that can be demonstrated during interviews
|
||||
|
||||
## Functionality
|
||||
|
||||
The current features of Jobs_aggregator include:
|
||||
|
||||
- User authentication: Users must log in to access the scraping functionality.
|
||||
- Customized authentication: Django's `authenticate` function is wrapped by the BruteBuster module, which records failed login attempts and protects against brute-force attacks.
|
||||
- Job scraping: Users can select a job title and city to scrape data from Jobs.cz.
|
||||
- Data storage: Scraped job data is stored in a SQLite3 database using Django models.
|
||||
- Data rendering: The scraped data is rendered in an HTML table.
|
||||
- Table filtering: Users can filter the table by text, salary (indicated/not indicated), and junior (junior included in the title).
|
||||
- Table sorting: Users can sort the table by all headings, except URL, by clicking on the headers (switching between ascending and descending order).
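
The text, salary, and junior filters listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's code: `JobRow` and `matches` are hypothetical names, and the sketch assumes that an indicated salary contains `Kč` while a missing one is stored as `N/A`.

```python
from dataclasses import dataclass


@dataclass
class JobRow:
    """Hypothetical stand-in for one scraped row (not the project's Django model)."""
    title: str
    salary: str


def matches(job: JobRow, text: str = "", junior: bool = False, salary: bool = False) -> bool:
    """Return True when the row passes every active filter."""
    title = job.title.lower()
    # Text filter: case-insensitive substring match on the title.
    if text.lower() not in title:
        return False
    # Junior filter: the word "junior" must appear in the title.
    if junior and "junior" not in title:
        return False
    # Salary filter (assumption): indicated salaries contain the currency "Kč".
    if salary and "Kč" not in job.salary:
        return False
    return True


jobs = [
    JobRow("Junior Python Developer", "40 000 - 60 000 Kč"),
    JobRow("Senior Python Developer", "N/A"),
    JobRow("Junior Tester", "N/A"),
]

filtered = [job.title for job in jobs if matches(job, text="python", junior=True, salary=True)]
print(filtered)
```

With all three filters active, only rows that satisfy every condition remain; with no filter set, every row passes.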
|
||||
|
||||
## Observations
|
||||
|
||||
Throughout the project, there were instances where the direction and scope of the project evolved. The initial idea was to build a web app in Django that could scrape jobs automatically and provide users with rendered data. However, the project took a different path, allowing users to perform the scraping themselves, which required additional adjustments.
|
||||
|
||||
For future projects, it is essential to define the goals and functionality clearly up front and to prioritize them, which avoids ambiguity and makes development smoother.
|
||||
|
||||
## Installation and Usage
|
||||
|
||||
To use Jobs_aggregator locally, follow these steps:
|
||||
|
||||
1. Clone the repository: `git clone https://github.com/your-username/Jobs_aggregator.git`
|
||||
2. Install the required dependencies: `pip install -r requirements.txt`
|
||||
3. Edit the `decorators.py` file in the BruteBuster module:
   - Locate the file at `brutebuster/decorators.py`
   - In line 45, modify the code snippet as follows:
   ```python
   if fa.recent_failure():
       if fa.too_many_failures():
           fa.failures += 1
           fa.save()
           return False  # MODIFY HERE
   ```
4. Set up the database: `python manage.py migrate`
|
||||
5. Start the development server: `python manage.py runserver`
|
||||
6. Access the web application in your browser at `http://localhost:8000/`
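
Step 3 matters because the login view needs to tell a blocked user apart from a wrong password. A minimal sketch of that three-way check, assuming the patched `authenticate` returns `False` when the user is blocked, `None` when the credentials are rejected, and a user object otherwise; `classify_login` is a hypothetical helper, not part of Django or BruteBuster:

```python
def classify_login(auth_result):
    """Map the result of the patched authenticate() call to an outcome label."""
    # Identity checks are used so that falsy user-like objects are not
    # mistaken for the "blocked" or "invalid" sentinels.
    if auth_result is False:
        return "blocked"   # too many recent failures
    if auth_result is None:
        return "invalid"   # wrong username or password
    return "ok"            # any other value is treated as an authenticated user


outcomes = [classify_login(value) for value in (False, None, object())]
print(outcomes)
```

Without the edit in step 3, a blocked attempt and a wrong password would look the same to the caller, so no "blocked until" message could be shown.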
|
||||
|
||||
## Contributing
|
||||
|
||||
Contributions to Jobs_aggregator are welcome! If you would like to contribute, please follow these steps:
|
||||
|
||||
1. Fork the repository.
|
||||
2. Create a new branch for your feature or bug fix.
|
||||
3. Make your modifications.
|
||||
4. Commit and push your changes to your forked repository.
|
||||
5. Submit a pull request describing your changes.
|
||||
|
||||
## License
|
||||
|
||||
This project is licensed under the [MIT License](LICENSE).
|
||||
|
||||
## Contact
|
||||
|
||||
If you have any questions or need further assistance, please feel free to contact the project owner.
|
||||
|
||||
Enjoy using Jobs_aggregator!
|
|
@ -0,0 +1,16 @@
|
|||
"""
|
||||
ASGI config for jobs_aggregator project.
|
||||
|
||||
It exposes the ASGI callable as a module-level variable named ``application``.
|
||||
|
||||
For more information on this file, see
|
||||
https://docs.djangoproject.com/en/4.2/howto/deployment/asgi/
|
||||
"""
|
||||
|
||||
import os
|
||||
|
||||
from django.core.asgi import get_asgi_application
|
||||
|
||||
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jobs_aggregator.settings')
|
||||
|
||||
application = get_asgi_application()
|
|
@ -0,0 +1,127 @@
|
|||
"""
|
||||
Django settings for jobs_aggregator project.
|
||||
|
||||
Generated by 'django-admin startproject' using Django 4.2.1.
|
||||
|
||||
For more information on this file, see
|
||||
https://docs.djangoproject.com/en/4.2/topics/settings/
|
||||
|
||||
For the full list of settings and their values, see
|
||||
https://docs.djangoproject.com/en/4.2/ref/settings/
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
# Build paths inside the project like this: BASE_DIR / 'subdir'.
|
||||
BASE_DIR = Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
# Quick-start development settings - unsuitable for production
|
||||
# See https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/
|
||||
|
||||
# SECURITY WARNING: keep the secret key used in production secret!
|
||||
SECRET_KEY = 'django-insecure-fphcl#p=us6e_h!@ggvrwkv!8rd(s)$sv%2c9umb(8(0bu7y#m'
|
||||
|
||||
# SECURITY WARNING: don't run with debug turned on in production!
|
||||
DEBUG = True
|
||||
|
||||
ALLOWED_HOSTS = []
|
||||
|
||||
|
||||
# Application definition
|
||||
|
||||
INSTALLED_APPS = [
|
||||
'django.contrib.admin',
|
||||
'django.contrib.auth',
|
||||
'django.contrib.contenttypes',
|
||||
'django.contrib.sessions',
|
||||
'django.contrib.messages',
|
||||
'django.contrib.staticfiles',
|
||||
'jobs_dashboard',
|
||||
'BruteBuster'
|
||||
]
|
||||
|
||||
MIDDLEWARE = [
|
||||
'django.middleware.security.SecurityMiddleware',
|
||||
'django.contrib.sessions.middleware.SessionMiddleware',
|
||||
'django.middleware.common.CommonMiddleware',
|
||||
'django.middleware.csrf.CsrfViewMiddleware',
|
||||
'django.contrib.auth.middleware.AuthenticationMiddleware',
|
||||
'django.contrib.messages.middleware.MessageMiddleware',
|
||||
'django.middleware.clickjacking.XFrameOptionsMiddleware',
|
||||
'BruteBuster.middleware.RequestMiddleware'
|
||||
]
|
||||
|
||||
ROOT_URLCONF = 'jobs_aggregator.urls'
|
||||
|
||||
TEMPLATES = [
|
||||
{
|
||||
'BACKEND': 'django.template.backends.django.DjangoTemplates',
|
||||
'DIRS': [],
|
||||
'APP_DIRS': True,
|
||||
'OPTIONS': {
|
||||
'context_processors': [
|
||||
'django.template.context_processors.debug',
|
||||
'django.template.context_processors.request',
|
||||
'django.contrib.auth.context_processors.auth',
|
||||
'django.contrib.messages.context_processors.messages',
|
||||
],
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
WSGI_APPLICATION = 'jobs_aggregator.wsgi.application'
|
||||
|
||||
|
||||
# Database
|
||||
# https://docs.djangoproject.com/en/4.2/ref/settings/#databases
|
||||
|
||||
DATABASES = {
|
||||
'default': {
|
||||
'ENGINE': 'django.db.backends.sqlite3',
|
||||
'NAME': BASE_DIR / 'db.sqlite3',
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
# Password validation
|
||||
# https://docs.djangoproject.com/en/4.2/ref/settings/#auth-password-validators
|
||||
|
||||
AUTH_PASSWORD_VALIDATORS = [
|
||||
{
|
||||
'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
|
||||
},
|
||||
{
|
||||
'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
|
||||
},
|
||||
{
|
||||
'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
|
||||
},
|
||||
{
|
||||
'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
# Internationalization
|
||||
# https://docs.djangoproject.com/en/4.2/topics/i18n/
|
||||
|
||||
LANGUAGE_CODE = 'en-us'
|
||||
|
||||
TIME_ZONE = 'UTC'
|
||||
|
||||
USE_I18N = True
|
||||
|
||||
USE_TZ = True
|
||||
|
||||
|
||||
# Static files (CSS, JavaScript, Images)
|
||||
# https://docs.djangoproject.com/en/4.2/howto/static-files/
|
||||
|
||||
STATIC_URL = 'static/'
|
||||
|
||||
# Default primary key field type
|
||||
# https://docs.djangoproject.com/en/4.2/ref/settings/#default-auto-field
|
||||
|
||||
DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'
|
||||
# Overrides the empty ALLOWED_HOSTS list defined earlier in this file.
ALLOWED_HOSTS = ['192.168.0.26', 'localhost', '127.0.0.1']
|
|
@ -0,0 +1,23 @@
|
|||
"""
|
||||
URL configuration for jobs_aggregator project.
|
||||
|
||||
The `urlpatterns` list routes URLs to views. For more information please see:
|
||||
https://docs.djangoproject.com/en/4.2/topics/http/urls/
|
||||
Examples:
|
||||
Function views
|
||||
1. Add an import: from my_app import views
|
||||
2. Add a URL to urlpatterns: path('', views.home, name='home')
|
||||
Class-based views
|
||||
1. Add an import: from other_app.views import Home
|
||||
2. Add a URL to urlpatterns: path('', Home.as_view(), name='home')
|
||||
Including another URLconf
|
||||
1. Import the include() function: from django.urls import include, path
|
||||
2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
|
||||
"""
|
||||
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include("jobs_dashboard.urls")),
]
|
|
@ -0,0 +1,16 @@
|
|||
"""
|
||||
WSGI config for jobs_aggregator project.
|
||||
|
||||
It exposes the WSGI callable as a module-level variable named ``application``.
|
||||
|
||||
For more information on this file, see
|
||||
https://docs.djangoproject.com/en/4.2/howto/deployment/wsgi/
|
||||
"""
|
||||
|
||||
import os
|
||||
|
||||
from django.core.wsgi import get_wsgi_application
|
||||
|
||||
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jobs_aggregator.settings')
|
||||
|
||||
application = get_wsgi_application()
|
|
@ -0,0 +1,3 @@
|
|||
from django.contrib import admin
|
||||
|
||||
# Register your models here.
|
|
@ -0,0 +1,6 @@
|
|||
from django.apps import AppConfig


class JobsDashboardConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'jobs_dashboard'
|
|
@ -0,0 +1,8 @@
|
|||
import os
|
||||
|
||||
from django.core.wsgi import get_wsgi_application
|
||||
|
||||
# Force the settings module; a following setdefault() call would be a no-op.
os.environ['DJANGO_SETTINGS_MODULE'] = 'jobs_aggregator.settings'
|
||||
|
||||
application = get_wsgi_application()
|
|
@ -0,0 +1,11 @@
|
|||
from django import forms
from .models import JudgeDetail


class JudgesForm(forms.ModelForm):
    class Meta:
        model = JudgeDetail  # Model is currently used only to create the form fields
        fields = '__all__'
        widgets = {
            'password': forms.PasswordInput(),
        }
|
|
@ -0,0 +1,125 @@
|
|||
import logging
from datetime import date, timedelta

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

from jobs_dashboard.models import Job
from .support import parse_czech_date

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Chrome options for Selenium WebDriver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
|
||||
|
||||
def save_page_source(page_source, page_num):
    """
    Saves the page source to an HTML file.
    """
    try:
        with open(f"page_{page_num}.html", "w", encoding="utf-8") as file:
            file.write(page_source)
        logger.info(f"Saved page {page_num} as page_{page_num}.html")
    except Exception as e:
        logger.error(f"Failed to save page {page_num}: {e}")
|
||||
|
||||
def scrape_jobsCZ(URL):
    """
    Scrapes data from all URL subpages.
    """
    data_total = []
    page_num = 1

    # Use context manager for WebDriver to ensure it closes properly
    with webdriver.Chrome(options=chrome_options) as driver:
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
        }
        driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": headers["User-Agent"]})

        try:
            while True:
                driver.get(URL + str(page_num))
                page = driver.page_source
                # save_page_source(page, page_num)
                soup = BeautifulSoup(page, "html.parser")
                data_temp = soup.find_all('article', {"class": "SearchResultCard"})

                # The informative alert marks a page past the last page of results
                if soup.find_all('div', {"class": "Alert Alert--informative Alert--center mt-800 mb-600"}):
                    break

                data_total.extend(data_temp)
                page_num += 1

        except Exception as e:
            logger.error(f"An error occurred while scraping data: {e}")

    return data_total
|
||||
|
||||
def scrape_data(city, title, user):
    """
    Scrapes Jobs.cz for the given city and title, then splits each result
    into fields, formats them, and saves them as Job rows for the user.
    """
    logger.info("Scrape called")
    clean_data(user)

    try:
        url = f"https://www.jobs.cz/prace/{city}/?q%5B%5D={title}&locality[radius]=0&page="
        data = scrape_jobsCZ(url)
        today = date.today()

        for item in data:
            # Extract and clean salary data
            salary_data = item.find("span", {"class": "Tag Tag--success Tag--small Tag--subtle"})
            salary_data = salary_data.string if salary_data else "N/A"

            # Extract and format published date
            published = item.find("div", {"class": "SearchResultCard__status SearchResultCard__status--default"})
            if published:
                published = published.string
                if "včera" in published:
                    published = (today - timedelta(days=1)).strftime("%d.%m.")
                elif "Přidáno" in published:
                    published = today.strftime("%d.%m.")
                elif "Aktualizováno" in published:
                    published = "Aktualizováno " + today.strftime("%d.%m.")
                else:
                    published = parse_czech_date(published)
            else:
                published = item.find("div", {"class": "SearchResultCard__status SearchResultCard__status--danger"})
                published = published.string if published else "Unknown"

            # Extract title, company, and link
            # (note: this rebinds the `title` parameter, which is no longer needed here)
            title = item.find("a", {"class": "link-primary SearchResultCard__titleLink"}).string
            company = item.find("li", {"class": "SearchResultCard__footerItem"}).find("span").string
            link = item.find("a", {"class": "link-primary SearchResultCard__titleLink"})["href"]
            unique_id = f"{title.lower().replace(' ', '')}_{company.lower().replace(' ', '')}"

            # Save job to the database
            job = Job(
                title=title,
                company=company,
                link=link,
                salary=salary_data,
                published_date=published,
                unique_id=unique_id,
                user=user
            )
            job.save()

    except Exception as e:
        logger.error(f"An error occurred while processing data: {e}")
|
||||
|
||||
def clean_data(user):
    """
    Cleans up existing job data for the user.
    """
    try:
        Job.objects.filter(user=user).delete()
    except Exception as e:
        logger.error(f"An error occurred while deleting existing job data: {e}")
|
|
@ -0,0 +1,35 @@
|
|||
# Generated by Django 4.2.1 on 2024-08-06 16:53
|
||||
|
||||
from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Job',
            fields=[
                ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('title', models.CharField(default='N/A', max_length=255)),
                ('company', models.CharField(default='N/A', max_length=255)),
                ('link', models.CharField(max_length=255, null=True)),
                ('salary', models.CharField(max_length=255, null=True)),
                ('published_date', models.CharField(max_length=255, null=True)),
                ('unique_id', models.CharField(max_length=255, null=True)),
                ('user', models.CharField(max_length=255, null=True)),
            ],
        ),
        migrations.CreateModel(
            name='JudgeDetail',
            fields=[
                ('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('username', models.CharField(max_length=50)),
                ('password', models.CharField(max_length=50)),
            ],
        ),
    ]
|
|
@ -0,0 +1,17 @@
|
|||
from django.db import models

# Create your models here.

class JudgeDetail(models.Model):
    username = models.CharField(max_length=50)
    password = models.CharField(max_length=50)


class Job(models.Model):
    title = models.CharField(max_length=255, default='N/A')
    company = models.CharField(max_length=255, default='N/A')
    link = models.CharField(max_length=255, null=True)
    salary = models.CharField(max_length=255, null=True)
    published_date = models.CharField(max_length=255, null=True)
    unique_id = models.CharField(max_length=255, null=True)
    user = models.CharField(max_length=255, null=True)
|
||||
|
|
@ -0,0 +1,10 @@
|
|||
def parse_czech_date(text):
    """Convert a Czech date such as '5. srpna' into the short form '5.8.'."""
    months_cs = ["ledna","února","března","dubna","května","června","července","srpna","září","října","listopadu","prosince"]
    text = str(text)
    text = text.replace("\n", "")
    text = text.replace(" ", "")

    for i, month in enumerate(months_cs):
        if month in text:
            # enumerate() is zero-based, so add 1 to get the calendar month number
            return text.split(".")[0] + "." + str(i + 1) + "."
    return "Date not Found"
|
|
@ -0,0 +1,48 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
{% load static %}
|
||||
<head>
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||
<link href="https://fonts.googleapis.com/css2?family=Barlow:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Jobs Aggregator Login</title>
|
||||
<link rel="stylesheet" href="{% static 'styles.css' %}" />
|
||||
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||
<script src="{% static 'support.js' %}"></script>
|
||||
</head>
|
||||
<body onpageshow="loadingCheck()">
|
||||
<div class="wrapper_main">
|
||||
<div class="wrapper">
|
||||
<div class="loader"></div>
|
||||
<div class="desc_wrapper">
|
||||
<h1>Welcome to Jobs Aggregator</h1>
|
||||
<p>This is an educational project created by<br>
|
||||
<strong>Jiri Karlik</strong>.</p>
|
||||
<div class="social">
|
||||
<a href="https://github.com/karlji" target="_blank" class="fa fa-github" title="github"></a>
|
||||
<a href="https://www.linkedin.com/in/jiri-karlik/" target="_blank" class="fa fa-linkedin" title="linkedin"></a>
|
||||
<a href="mailto:karlikjirka@gmail.com" target="_blank" class="fa fa-envelope" title="email" ></a>
|
||||
</div>
|
||||
</div>
|
||||
<div class="login-wrapper">
|
||||
<form method="post" class="main_form">
|
||||
{% csrf_token %}
|
||||
{{ judge_form }}
|
||||
<button type="submit" value="Sign In" onclick="loading()"><span>Sign In</span></button>
|
||||
<!-- <input type="submit" value="Sign In" onclick="loading()"> -->
|
||||
</form>
|
||||
{% if messages %}
|
||||
<ul class="messages">
|
||||
{% for message in messages %}
|
||||
<span class="closebtn" onclick="this.parentElement.style.display='none';">{{ message }}</span>
|
||||
{% endfor %}
|
||||
</ul>
|
||||
{% endif %}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
|
||||
</html>
|
|
@ -0,0 +1,58 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
{% load static %}
|
||||
<head>
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||
<link href="https://fonts.googleapis.com/css2?family=Barlow:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Jobs Aggregator Scraper</title>
|
||||
<link rel="stylesheet" href="{% static 'styles.css' %}" />
|
||||
<script src="{% static 'support.js' %}"></script>
|
||||
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||
</head>
|
||||
<body>
|
||||
<div class="wrapper_main view">
|
||||
<div class="view_wrapper">
|
||||
<div class="loader"></div>
|
||||
<h1>Jobs Scraped</h1>
|
||||
<button onclick="window.location.href='./search';" class="go_back">Go Back</button>
|
||||
<form action="" method="post" class="view_form search">
|
||||
{% csrf_token %}
|
||||
<label for="my_input"><input type="text" name="my_input" id="my_input" placeholder="Title"></label>
|
||||
<label for="junior_check"><input type="checkbox" name="junior_check" id="junior_check"><span class="radio_span">Junior</span></label>
|
||||
<label for="salary_check"><input type="checkbox" name="salary_check" id="salary_check"><span class="radio_span">Salary</span></label>
|
||||
<button type="submit" class="filter">Filter</button>
|
||||
</form>
|
||||
{% if jobs_total|length > 0 %}
|
||||
<div class="table_wrapper">
|
||||
<table id="job_table" lang="cs">
|
||||
<tr lang="en" >
|
||||
<th onclick="sortTable(0)">Title</th>
|
||||
<th onclick="sortTable(1)">Salary</th>
|
||||
<th onclick="sortTable(2)">Company</th>
|
||||
<th onclick="sortTable(3)">Published</th>
|
||||
<th>URL</th>
|
||||
</tr>
|
||||
{% for job in jobs_total %}
|
||||
{% if input_text.lower in job.title.lower and junior_check in job.title.lower and salary_check in job.salary %}
|
||||
<tr>
|
||||
<td>{{ job.title }}</td>
|
||||
<td>{{ job.salary }}</td>
|
||||
<td>{{ job.company }}</td>
|
||||
<td>{{ job.published_date }}</td>
|
||||
<td><a href="{{ job.link }}" target="_blank">URL</a></td>
|
||||
</tr>
|
||||
{% endif %}
|
||||
{% endfor %}
|
||||
</table>
|
||||
</div>
|
||||
{% else %}
|
||||
<h2>No jobs found. Please try again.</h2>
|
||||
{% endif %}
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
|
||||
</html>
|
|
@ -0,0 +1,35 @@
|
|||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
{% load static %}
|
||||
<head>
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||
<link href="https://fonts.googleapis.com/css2?family=Barlow:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Jobs Aggregator Search</title>
|
||||
<link rel="stylesheet" href="{% static 'styles.css' %}" />
|
||||
<script src="{% static 'support.js' %}"></script>
|
||||
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||
</head>
|
||||
<body>
|
||||
<div class="wrapper_main">
|
||||
<div class="loader"></div>
|
||||
<div class="wrapper">
|
||||
<div class="login-wrapper">
|
||||
<h1>Jobs Aggregator</h1>
|
||||
<form action="" method="post" class="search main_form">
|
||||
{% csrf_token %}
|
||||
<label for="my_input"><input type="text" name="my_input" id="my_input" placeholder="Job Title" required></label>
|
||||
<label for="city_check"><input type="radio" id="city_check" name="city_check" value="plzen" checked='checked'><span class="radio_span">Plzeň</span></label>
|
||||
<label for="city_check2"><input type="radio" id="city_check2" name="city_check" value="praha"><span class="radio_span">Praha</span></label>
|
||||
<label for="city_check3"><input type="radio" id="city_check3" name="city_check" value="brno"><span class="radio_span">Brno</span></label>
|
||||
<!-- <input type="submit" value="Scrape Jobs.cz" onclick="loading()"> -->
|
||||
<button type="submit" onclick="loading()" ><span id="scrape_button">Scrape Jobs.cz</span></button>
|
||||
</form>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</body>
|
||||
|
||||
</html>
|
|
@ -0,0 +1,3 @@
|
|||
from django.test import TestCase
|
||||
|
||||
# Create your tests here.
|
|
@ -0,0 +1,8 @@
|
|||
from django.urls import path
from . import views

urlpatterns = [
    path("", views.home_view, name="home_view"),
    path("search", views.search, name="search"),
    path("view", views.index, name="index"),
]
|
|
@ -0,0 +1,97 @@
|
|||
from datetime import timedelta

from django.contrib import messages
from django.contrib.auth import authenticate
from django.http import HttpResponse
from django.shortcuts import redirect, render
from django.template import loader

# BruteBuster's decorators.py must be edited to return False when
# fa.too_many_failures() is true (see the README).
from BruteBuster.models import FailedAttempt

from .forms import JudgesForm
from .jobs_scraper import scrape_data
from .models import Job
|
||||
|
||||
# Create your views here.
def home_view(request):
    # Bind the login form to the POST data when present
    judge_form = JudgesForm(request.POST or None)
    # Reset the session flag; it becomes 'valid' only after a successful login
    request.session['judge_password'] = 'invalid'
    if request.method == 'POST':
        if judge_form.is_valid():
            user = judge_form.cleaned_data['username']
            try:
                # authenticate() is wrapped by the BruteBuster library, which
                # records failed login attempts in the database.
                auth = authenticate(username=user, password=judge_form.cleaned_data['password'])
                if auth is False:  # too many failed attempts: the user is blocked
                    ip_addr = request.META.get('REMOTE_ADDR', None)
                    fa = FailedAttempt.objects.filter(username=user, IP=ip_addr)[0]
                    block_time = (fa.timestamp + timedelta(minutes=3)).strftime('%H:%M:%S')
                    messages.info(request, '%s BLOCKED until %s GMT' % (fa.username, block_time))
                if auth is not None and auth is not False:
                    request.session['judge_password'] = 'valid'
                    request.session['user'] = user
                    return redirect('search')
                else:
                    return redirect('home_view')
            except Exception as exc:
                # Fall through and render the login form again
                print(f"Login failed: {exc}")
    return render(request, "home.html", {'judge_form': judge_form})
|
||||
|
||||
def index(request):
    # Only logged-in users may view the scraped jobs
    try:
        if request.session['judge_password'] != 'valid':
            return redirect('home_view')
    except KeyError:
        return redirect('home_view')
    jobs_total = Job.objects.filter(user=request.session['user']).values()
    template = loader.get_template('index.html')
    input_text = request.POST.get('my_input', None)
    junior_check = request.POST.get('junior_check', None)
    salary_check = request.POST.get('salary_check', None)
    city_check = request.POST.get('city_check', None)

    # Translate the checkbox state into the substring the template looks for
    if junior_check == "on":
        junior_check = "junior"
    else:
        junior_check = ""

    if salary_check == "on":
        salary_check = "Kč"
    else:
        salary_check = ""

    if input_text is None:
        input_text = ""

    context = {
        'jobs_total': jobs_total,
        'input_text': input_text,
        'junior_check': junior_check,
        'salary_check': salary_check,
        'city_check': city_check
    }
    return HttpResponse(template.render(context, request))
|
||||
|
||||
|
||||
def search(request):
    # Only logged-in users may start a scrape
    try:
        if request.session['judge_password'] != 'valid':
            return redirect('home_view')
    except KeyError:
        return redirect('home_view')

    template = loader.get_template('search.html')
    input_text = request.POST.get('my_input', None)
    city_check = request.POST.get('city_check', None)
    context = {
        'city_check': city_check
    }
    if city_check is None:  # On first page load, render search.html
        return HttpResponse(template.render(context, request))
    else:
        scrape_data(city_check, input_text, request.session['user'])
        # return redirect('/dashboard/view')
        return redirect('index')
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
#!/usr/bin/env python
|
||||
"""Django's command-line utility for administrative tasks."""
|
||||
import os
import sys


def main():
    """Run administrative tasks."""
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jobs_aggregator.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)


if __name__ == '__main__':
    main()
|
Binary file not shown.