Transfer to Gitea
parent
cfb2a04328
commit
e35a5f4ce2
|
@ -0,0 +1,21 @@
|
||||||
|
MIT License
|
||||||
|
|
||||||
|
Copyright (c) [2023] [Jiri Karlik]
|
||||||
|
|
||||||
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||||
|
of this software and associated documentation files (the "Software"), to deal
|
||||||
|
in the Software without restriction, including without limitation the rights
|
||||||
|
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||||
|
copies of the Software, and to permit persons to whom the Software is
|
||||||
|
furnished to do so, subject to the following conditions:
|
||||||
|
|
||||||
|
The above copyright notice and this permission notice shall be included in all
|
||||||
|
copies or substantial portions of the Software.
|
||||||
|
|
||||||
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||||
|
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||||
|
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||||
|
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||||
|
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||||
|
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||||
|
SOFTWARE.
|
|
@ -0,0 +1,68 @@
|
||||||
|
# Jobs_aggregator
|
||||||
|
|
||||||
|
Jobs_aggregator is an educational project that demonstrates web scraping using Selenium and BeautifulSoup modules to extract job data from the job portal Jobs.cz. The project also includes a web application built with the Django framework, with dynamic frontend elements implemented using JavaScript.
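As a rough illustration of the scraping entry point, the sketch below builds a paginated Jobs.cz search URL from a city slug and a job title. The query layout mirrors the one used in this project's `jobs_scraper.py`; the helper name `build_search_url` and the percent-encoding step are assumptions added for illustration and are not part of the project.

```python
from urllib.parse import quote


def build_search_url(city: str, title: str, page: int) -> str:
    """Return a Jobs.cz search URL for one results page (hypothetical helper)."""
    # Percent-encode user input so spaces and diacritics are safe inside the URL.
    city_part = quote(city, safe="")
    title_part = quote(title, safe="")
    return (
        f"https://www.jobs.cz/prace/{city_part}/"
        f"?q%5B%5D={title_part}&locality[radius]=0&page={page}"
    )


if __name__ == "__main__":
    print(build_search_url("plzen", "python developer", 1))
```

Encoding the title keeps multi-word searches such as "python developer" from producing a malformed URL; whether Jobs.cz needs any further parameters is not covered here.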
|
||||||
|
|
||||||
|
## Project Goals
|
||||||
|
|
||||||
|
The primary goals of this project are:
|
||||||
|
|
||||||
|
- Learn the Django framework by building a first web application
|
||||||
|
- Gain knowledge and experience in web scraping techniques
|
||||||
|
- Create a project that can be demonstrated during interviews
|
||||||
|
|
||||||
|
## Functionality
|
||||||
|
|
||||||
|
The current features of Jobs_aggregator include:
|
||||||
|
|
||||||
|
- User authentication: Users must log in to access the scraping functionality.
|
||||||
|
- Customized authentication: Django's `authenticate` function is patched by the BruteBuster module, which records failed login attempts to protect against brute-force attacks.
|
||||||
|
- Job scraping: Users can select a job title and city to scrape data from Jobs.cz.
|
||||||
|
- Data storage: Scraped job data is stored in a SQLite3 database using Django models.
|
||||||
|
- Data rendering: The scraped data is rendered in an HTML table.
|
||||||
|
- Table filtering: Users can filter the table by text, salary (indicated/not indicated), and junior (junior included in the title).
|
||||||
|
- Table sorting: Users can sort the table by all headings, except URL, by clicking on the headers (switching between ascending and descending order).
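The filtering and sorting rules listed above can be sketched in plain Python. This is only an illustration of the logic, assuming each job is a dictionary with `title`, `salary`, and `company` keys and that an indicated salary contains the currency marker "Kč"; in the project the filter condition lives in the template and the sorting is done in the browser. The function names `filter_jobs` and `sort_jobs` are hypothetical.

```python
def filter_jobs(jobs, text="", junior_only=False, salary_only=False):
    """Keep jobs that match the text, junior, and salary filters."""
    text = text.lower()
    kept = []
    for job in jobs:
        title = job["title"].lower()
        salary = job["salary"] or ""
        if text not in title:
            continue
        if junior_only and "junior" not in title:
            continue
        # Assumption: an indicated salary contains the currency marker "Kč".
        if salary_only and "Kč" not in salary:
            continue
        kept.append(job)
    return kept


def sort_jobs(jobs, column, descending=False):
    """Sort jobs by one column, ignoring letter case."""
    return sorted(jobs, key=lambda job: (job[column] or "").lower(), reverse=descending)


jobs = [
    {"title": "Junior Python Developer", "salary": "40 000 Kč", "company": "Beta"},
    {"title": "Python Developer", "salary": "N/A", "company": "Alfa"},
    {"title": "Java Developer", "salary": "60 000 Kč", "company": "Gama"},
]

if __name__ == "__main__":
    for job in sort_jobs(filter_jobs(jobs, text="python"), "company"):
        print(job["company"], "-", job["title"])
```

Calling `sort_jobs` a second time with `descending=True` corresponds to clicking the same table header again.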
|
||||||
|
|
||||||
|
## Observations
|
||||||
|
|
||||||
|
Throughout the project, there were instances where the direction and scope of the project evolved. The initial idea was to build a web app in Django that could scrape jobs automatically and provide users with rendered data. However, the project took a different path, allowing users to perform the scraping themselves, which required additional adjustments.
|
||||||
|
|
||||||
|
For future projects, it is essential to define the project goals and functionality clearly and to prioritize them up front, to avoid ambiguity and ensure a smoother development process.
|
||||||
|
|
||||||
|
## Installation and Usage
|
||||||
|
|
||||||
|
To use Jobs_aggregator locally, follow these steps:
|
||||||
|
|
||||||
|
1. Clone the repository: `git clone https://github.com/your-username/Jobs_aggregator.git`
|
||||||
|
2. Install the required dependencies: `pip install -r requirements.txt`
|
||||||
|
3. Edit the `decorators.py` file in the BruteBuster module:
|
||||||
|
- Locate the file at `BruteBuster/decorators.py` (inside the installed BruteBuster package)
|
||||||
|
- In line 45, modify the code snippet as follows:
|
||||||
|
```python
if fa.recent_failure():
    if fa.too_many_failures():
        fa.failures += 1
        fa.save()
        return False  # MODIFY HERE
```
|
||||||
|
4. Set up the database: `python manage.py migrate`
|
||||||
|
5. Start the development server: `python manage.py runserver`
|
||||||
|
6. Access the web application in your browser at `http://localhost:8000/`
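The BruteBuster edit in step 3 lets the login view tell three outcomes apart: `False` when the user is temporarily blocked after too many failures, `None` when the credentials are simply wrong, and a user object on success. A minimal sketch of that branching follows, with a hypothetical helper name `classify_login` and plain Python values standing in for Django objects:

```python
def classify_login(auth_result):
    """Map the result of the patched authenticate() call to a login outcome."""
    # Identity checks keep False (blocked) distinct from None (rejected).
    if auth_result is False:
        return "blocked"
    if auth_result is None:
        return "rejected"
    return "accepted"


if __name__ == "__main__":
    for result in (False, None, object()):
        print(classify_login(result))
```

Using `is` rather than `==` avoids treating other falsy values as a block signal.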
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
|
||||||
|
Contributions to Jobs_aggregator are welcome! If you would like to contribute, please follow these steps:
|
||||||
|
|
||||||
|
1. Fork the repository.
|
||||||
|
2. Create a new branch for your feature or bug fix.
|
||||||
|
3. Make your modifications.
|
||||||
|
4. Commit and push your changes to your forked repository.
|
||||||
|
5. Submit a pull request describing your changes.
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
This project is licensed under the [MIT License](LICENSE).
|
||||||
|
|
||||||
|
## Contact
|
||||||
|
|
||||||
|
If you have any questions or need further assistance, please feel free to contact the project owner.
|
||||||
|
|
||||||
|
Enjoy using Jobs_aggregator!
|
|
@ -0,0 +1,16 @@
|
||||||
|
"""
|
||||||
|
ASGI config for jobs_aggregator project.
|
||||||
|
|
||||||
|
It exposes the ASGI callable as a module-level variable named ``application``.
|
||||||
|
|
||||||
|
For more information on this file, see
|
||||||
|
https://docs.djangoproject.com/en/4.2/howto/deployment/asgi/
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
|
from django.core.asgi import get_asgi_application
|
||||||
|
|
||||||
|
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jobs_aggregator.settings')
|
||||||
|
|
||||||
|
application = get_asgi_application()
|
|
@ -0,0 +1,127 @@
|
||||||
|
"""
|
||||||
|
Django settings for jobs_aggregator project.
|
||||||
|
|
||||||
|
Generated by 'django-admin startproject' using Django 4.2.1.
|
||||||
|
|
||||||
|
For more information on this file, see
|
||||||
|
https://docs.djangoproject.com/en/4.2/topics/settings/
|
||||||
|
|
||||||
|
For the full list of settings and their values, see
|
||||||
|
https://docs.djangoproject.com/en/4.2/ref/settings/
|
||||||
|
"""
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
# Build paths inside the project like this: BASE_DIR / 'subdir'.
|
||||||
|
BASE_DIR = Path(__file__).resolve().parent.parent
|
||||||
|
|
||||||
|
|
||||||
|
# Quick-start development settings - unsuitable for production
|
||||||
|
# See https://docs.djangoproject.com/en/4.2/howto/deployment/checklist/
|
||||||
|
|
||||||
|
# SECURITY WARNING: keep the secret key used in production secret!
|
||||||
|
SECRET_KEY = 'django-insecure-fphcl#p=us6e_h!@ggvrwkv!8rd(s)$sv%2c9umb(8(0bu7y#m'
|
||||||
|
|
||||||
|
# SECURITY WARNING: don't run with debug turned on in production!
|
||||||
|
DEBUG = True
|
||||||
|
|
||||||
|
ALLOWED_HOSTS = []
|
||||||
|
|
||||||
|
|
||||||
|
# Application definition
|
||||||
|
|
||||||
|
INSTALLED_APPS = [
|
||||||
|
'django.contrib.admin',
|
||||||
|
'django.contrib.auth',
|
||||||
|
'django.contrib.contenttypes',
|
||||||
|
'django.contrib.sessions',
|
||||||
|
'django.contrib.messages',
|
||||||
|
'django.contrib.staticfiles',
|
||||||
|
'jobs_dashboard',
|
||||||
|
'BruteBuster'
|
||||||
|
]
|
||||||
|
|
||||||
|
MIDDLEWARE = [
|
||||||
|
'django.middleware.security.SecurityMiddleware',
|
||||||
|
'django.contrib.sessions.middleware.SessionMiddleware',
|
||||||
|
'django.middleware.common.CommonMiddleware',
|
||||||
|
'django.middleware.csrf.CsrfViewMiddleware',
|
||||||
|
'django.contrib.auth.middleware.AuthenticationMiddleware',
|
||||||
|
'django.contrib.messages.middleware.MessageMiddleware',
|
||||||
|
'django.middleware.clickjacking.XFrameOptionsMiddleware',
|
||||||
|
'BruteBuster.middleware.RequestMiddleware'
|
||||||
|
]
|
||||||
|
|
||||||
|
ROOT_URLCONF = 'jobs_aggregator.urls'
|
||||||
|
|
||||||
|
TEMPLATES = [
|
||||||
|
{
|
||||||
|
'BACKEND': 'django.template.backends.django.DjangoTemplates',
|
||||||
|
'DIRS': [],
|
||||||
|
'APP_DIRS': True,
|
||||||
|
'OPTIONS': {
|
||||||
|
'context_processors': [
|
||||||
|
'django.template.context_processors.debug',
|
||||||
|
'django.template.context_processors.request',
|
||||||
|
'django.contrib.auth.context_processors.auth',
|
||||||
|
'django.contrib.messages.context_processors.messages',
|
||||||
|
],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
WSGI_APPLICATION = 'jobs_aggregator.wsgi.application'
|
||||||
|
|
||||||
|
|
||||||
|
# Database
|
||||||
|
# https://docs.djangoproject.com/en/4.2/ref/settings/#databases
|
||||||
|
|
||||||
|
DATABASES = {
|
||||||
|
'default': {
|
||||||
|
'ENGINE': 'django.db.backends.sqlite3',
|
||||||
|
'NAME': BASE_DIR / 'db.sqlite3',
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
# Password validation
|
||||||
|
# https://docs.djangoproject.com/en/4.2/ref/settings/#auth-password-validators
|
||||||
|
|
||||||
|
AUTH_PASSWORD_VALIDATORS = [
|
||||||
|
{
|
||||||
|
'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
|
||||||
|
},
|
||||||
|
{
|
||||||
|
'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
|
||||||
|
# Internationalization
|
||||||
|
# https://docs.djangoproject.com/en/4.2/topics/i18n/
|
||||||
|
|
||||||
|
LANGUAGE_CODE = 'en-us'
|
||||||
|
|
||||||
|
TIME_ZONE = 'UTC'
|
||||||
|
|
||||||
|
USE_I18N = True
|
||||||
|
|
||||||
|
USE_TZ = True
|
||||||
|
|
||||||
|
|
||||||
|
# Static files (CSS, JavaScript, Images)
|
||||||
|
# https://docs.djangoproject.com/en/4.2/howto/static-files/
|
||||||
|
|
||||||
|
STATIC_URL = 'static/'
|
||||||
|
|
||||||
|
# Default primary key field type
|
||||||
|
# https://docs.djangoproject.com/en/4.2/ref/settings/#default-auto-field
|
||||||
|
|
||||||
|
DEFAULT_AUTO_FIELD = 'django.db.models.BigAutoField'
|
||||||
|
# Overrides the empty ALLOWED_HOSTS list defined earlier in this file.
ALLOWED_HOSTS = ['192.168.0.26', 'localhost', '127.0.0.1']
|
|
@ -0,0 +1,23 @@
|
||||||
|
"""
|
||||||
|
URL configuration for jobs_aggregator project.
|
||||||
|
|
||||||
|
The `urlpatterns` list routes URLs to views. For more information please see:
|
||||||
|
https://docs.djangoproject.com/en/4.2/topics/http/urls/
|
||||||
|
Examples:
|
||||||
|
Function views
|
||||||
|
1. Add an import: from my_app import views
|
||||||
|
2. Add a URL to urlpatterns: path('', views.home, name='home')
|
||||||
|
Class-based views
|
||||||
|
1. Add an import: from other_app.views import Home
|
||||||
|
2. Add a URL to urlpatterns: path('', Home.as_view(), name='home')
|
||||||
|
Including another URLconf
|
||||||
|
1. Import the include() function: from django.urls import include, path
|
||||||
|
2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
|
||||||
|
"""
|
||||||
|
from django.contrib import admin
|
||||||
|
from django.urls import include, path
|
||||||
|
|
||||||
|
urlpatterns = [
|
||||||
|
path('admin/', admin.site.urls),
|
||||||
|
path('', include("jobs_dashboard.urls"))
|
||||||
|
]
|
|
@ -0,0 +1,16 @@
|
||||||
|
"""
|
||||||
|
WSGI config for jobs_aggregator project.
|
||||||
|
|
||||||
|
It exposes the WSGI callable as a module-level variable named ``application``.
|
||||||
|
|
||||||
|
For more information on this file, see
|
||||||
|
https://docs.djangoproject.com/en/4.2/howto/deployment/wsgi/
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
|
||||||
|
from django.core.wsgi import get_wsgi_application
|
||||||
|
|
||||||
|
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jobs_aggregator.settings')
|
||||||
|
|
||||||
|
application = get_wsgi_application()
|
|
@ -0,0 +1,3 @@
|
||||||
|
from django.contrib import admin
|
||||||
|
|
||||||
|
# Register your models here.
|
|
@ -0,0 +1,6 @@
|
||||||
|
from django.apps import AppConfig
|
||||||
|
|
||||||
|
|
||||||
|
class JobsDashboardConfig(AppConfig):
|
||||||
|
default_auto_field = 'django.db.models.BigAutoField'
|
||||||
|
name = 'jobs_dashboard'
|
|
@ -0,0 +1,8 @@
|
||||||
|
import os
|
||||||
|
|
||||||
|
from django.core.wsgi import get_wsgi_application
|
||||||
|
|
||||||
|
os.environ['DJANGO_SETTINGS_MODULE'] = 'jobs_aggregator.settings'
|
||||||
|
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "jobs_aggregator.settings")
|
||||||
|
|
||||||
|
application = get_wsgi_application()
|
|
@ -0,0 +1,11 @@
|
||||||
|
from django import forms
|
||||||
|
from .models import JudgeDetail
|
||||||
|
|
||||||
|
class JudgesForm(forms.ModelForm):
|
||||||
|
class Meta:
|
||||||
|
model = JudgeDetail  # Model is currently used only to create the form fields
|
||||||
|
fields = '__all__'
|
||||||
|
widgets = {
|
||||||
|
'password': forms.PasswordInput(),
|
||||||
|
|
||||||
|
}
|
|
@ -0,0 +1,125 @@
|
||||||
|
from bs4 import BeautifulSoup
|
||||||
|
from selenium import webdriver
|
||||||
|
from datetime import date, timedelta
|
||||||
|
from .support import parse_czech_date
|
||||||
|
from jobs_dashboard.models import Job
|
||||||
|
from selenium.webdriver.chrome.service import Service
|
||||||
|
import logging
|
||||||
|
|
||||||
|
# Setup logging
|
||||||
|
logging.basicConfig(level=logging.INFO)
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Chrome options for Selenium WebDriver
|
||||||
|
chrome_options = webdriver.ChromeOptions()
|
||||||
|
chrome_options.add_argument("--no-sandbox")
|
||||||
|
chrome_options.add_argument("--headless")
|
||||||
|
chrome_options.add_argument("--disable-gpu")
|
||||||
|
|
||||||
|
def save_page_source(page_source, page_num):
|
||||||
|
"""
|
||||||
|
Saves the page source to an HTML file.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
with open(f"page_{page_num}.html", "w", encoding="utf-8") as file:
|
||||||
|
file.write(page_source)
|
||||||
|
logger.info(f"Saved page {page_num} as page_{page_num}.html")
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Failed to save page {page_num}: {e}")
|
||||||
|
|
||||||
|
def scrape_jobsCZ(URL):
|
||||||
|
"""
|
||||||
|
Scrapes data from all URL subpages.
|
||||||
|
"""
|
||||||
|
data_total = []
|
||||||
|
page_num = 1
|
||||||
|
|
||||||
|
# Use context manager for WebDriver to ensure it closes properly
|
||||||
|
with webdriver.Chrome(options=chrome_options) as driver:
|
||||||
|
headers = {
|
||||||
|
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
|
||||||
|
}
|
||||||
|
driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": headers["User-Agent"]})
|
||||||
|
|
||||||
|
try:
|
||||||
|
while True:
|
||||||
|
driver.get(URL + str(page_num))
|
||||||
|
page = driver.page_source
|
||||||
|
#save_page_source(page, page_num)
|
||||||
|
soup = BeautifulSoup(page, "html.parser")
|
||||||
|
data_temp = soup.find_all('article', {"class": "SearchResultCard"})
|
||||||
|
|
||||||
|
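# Stop on the "no more results" alert, or when a page yields no job cards (guards against an endless loop).
if not data_temp or soup.find_all('div', {"class": "Alert Alert--informative Alert--center mt-800 mb-600"}):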
|
||||||
|
break
|
||||||
|
|
||||||
|
data_total.extend(data_temp)
|
||||||
|
page_num += 1
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"An error occurred while scraping data: {e}")
|
||||||
|
|
||||||
|
return data_total
|
||||||
|
|
||||||
|
def scrape_data(city, title, user):
|
||||||
|
"""
|
||||||
|
Splits data into variables and formats it.
|
||||||
|
"""
|
||||||
|
logger.info("Scrape called")
|
||||||
|
clean_data(user)
|
||||||
|
|
||||||
|
try:
|
||||||
|
url = f"https://www.jobs.cz/prace/{city}/?q%5B%5D={title}&locality[radius]=0&page="
|
||||||
|
data = scrape_jobsCZ(url)
|
||||||
|
today = date.today()
|
||||||
|
|
||||||
|
for item in data:
|
||||||
|
# Extract and clean salary data
|
||||||
|
salary_data = item.find("span", {"class": "Tag Tag--success Tag--small Tag--subtle"})
|
||||||
|
salary_data = salary_data.string if salary_data else "N/A"
|
||||||
|
|
||||||
|
# Extract and format published date
|
||||||
|
published = item.find("div", {"class": "SearchResultCard__status SearchResultCard__status--default"})
|
||||||
|
if published:
|
||||||
|
published = published.string
|
||||||
|
if "včera" in published:
|
||||||
|
published = (today - timedelta(days=1)).strftime("%d.%m.")
|
||||||
|
elif "Přidáno" in published:
|
||||||
|
published = today.strftime("%d.%m.")
|
||||||
|
elif "Aktualizováno" in published:
|
||||||
|
published = "Aktualizováno " + today.strftime("%d.%m.")
|
||||||
|
else:
|
||||||
|
published = parse_czech_date(published)
|
||||||
|
else:
|
||||||
|
published = item.find("div", {"class": "SearchResultCard__status SearchResultCard__status--danger"})
|
||||||
|
published = published.string if published else "Unknown"
|
||||||
|
|
||||||
|
# Extract title, company, and link
|
||||||
|
title = item.find("a", {"class": "link-primary SearchResultCard__titleLink"}).string
|
||||||
|
company = item.find("li", {"class": "SearchResultCard__footerItem"}).find("span").string
|
||||||
|
link = item.find("a", {"class": "link-primary SearchResultCard__titleLink"})["href"]
|
||||||
|
unique_id = f"{title.lower().replace(' ', '')}_{company.lower().replace(' ', '')}"
|
||||||
|
|
||||||
|
# Save job to the database
|
||||||
|
job = Job(
|
||||||
|
title=title,
|
||||||
|
company=company,
|
||||||
|
link=link,
|
||||||
|
salary=salary_data,
|
||||||
|
published_date=published,
|
||||||
|
unique_id=unique_id,
|
||||||
|
user=user
|
||||||
|
)
|
||||||
|
job.save()
|
||||||
|
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"An error occurred while processing data: {e}")
|
||||||
|
|
||||||
|
def clean_data(user):
|
||||||
|
"""
|
||||||
|
Cleans up existing job data for the user.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
Job.objects.filter(user=user).delete()
|
||||||
|
except Exception as e:
|
||||||
|
logger.info("No data to delete or an error occurred while deleting data.")
|
||||||
|
logger.error(e)
|
|
@ -0,0 +1,35 @@
|
||||||
|
# Generated by Django 4.2.1 on 2024-08-06 16:53
|
||||||
|
|
||||||
|
from django.db import migrations, models
|
||||||
|
|
||||||
|
|
||||||
|
class Migration(migrations.Migration):
|
||||||
|
|
||||||
|
initial = True
|
||||||
|
|
||||||
|
dependencies = [
|
||||||
|
]
|
||||||
|
|
||||||
|
operations = [
|
||||||
|
migrations.CreateModel(
|
||||||
|
name='Job',
|
||||||
|
fields=[
|
||||||
|
('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
|
||||||
|
('title', models.CharField(default='N/A', max_length=255)),
|
||||||
|
('company', models.CharField(default='N/A', max_length=255)),
|
||||||
|
('link', models.CharField(max_length=255, null=True)),
|
||||||
|
('salary', models.CharField(max_length=255, null=True)),
|
||||||
|
('published_date', models.CharField(max_length=255, null=True)),
|
||||||
|
('unique_id', models.CharField(max_length=255, null=True)),
|
||||||
|
('user', models.CharField(max_length=255, null=True)),
|
||||||
|
],
|
||||||
|
),
|
||||||
|
migrations.CreateModel(
|
||||||
|
name='JudgeDetail',
|
||||||
|
fields=[
|
||||||
|
('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
|
||||||
|
('username', models.CharField(max_length=50)),
|
||||||
|
('password', models.CharField(max_length=50)),
|
||||||
|
],
|
||||||
|
),
|
||||||
|
]
|
|
@ -0,0 +1,17 @@
|
||||||
|
from django.db import models
|
||||||
|
|
||||||
|
# Create your models here.
|
||||||
|
|
||||||
|
class JudgeDetail(models.Model):
|
||||||
|
username = models.CharField(max_length=50)
|
||||||
|
password = models.CharField(max_length=50)
|
||||||
|
|
||||||
|
class Job(models.Model):
|
||||||
|
title = models.CharField(max_length=255, default='N/A')
|
||||||
|
company = models.CharField(max_length=255, default='N/A')
|
||||||
|
link = models.CharField(max_length=255, null=True)
|
||||||
|
salary = models.CharField(max_length=255, null=True)
|
||||||
|
published_date = models.CharField(max_length=255, null=True)
|
||||||
|
unique_id = models.CharField(max_length=255, null=True)
|
||||||
|
user = models.CharField(max_length=255, null=True)
|
||||||
|
|
|
@ -0,0 +1,10 @@
|
||||||
|
def parse_czech_date(text):
    """Convert a Czech date such as '12. srpna' to the numeric form '12.8.'."""
    months_cs = ["ledna","února","března","dubna","května","června","července","srpna","září","října","listopadu","prosince"]
    # Normalize: drop newlines and spaces so '12. srpna' becomes '12.srpna'.
    text = str(text)
    text = text.replace("\n", "")
    text = text.replace(" ", "")

    # Months are numbered from 1, so the enumeration starts there.
    for i, month in enumerate(months_cs, start=1):
        if month in text:
            return text.split(".")[0] + "." + str(i) + "."
    return "Date not Found"
|
|
@ -0,0 +1,48 @@
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="en">
|
||||||
|
{% load static %}
|
||||||
|
<head>
|
||||||
|
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||||
|
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||||
|
<link href="https://fonts.googleapis.com/css2?family=Barlow:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
||||||
|
<meta charset="UTF-8" />
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||||
|
<title>Jobs Aggregator Login</title>
|
||||||
|
<link rel="stylesheet" href="{% static 'styles.css' %}" />
|
||||||
|
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||||
|
<script src="{% static 'support.js' %}"></script>
|
||||||
|
</head>
|
||||||
|
<body onpageshow="loadingCheck()">
|
||||||
|
<div class="wrapper_main">
|
||||||
|
<div class="wrapper">
|
||||||
|
<div class="loader"></div>
|
||||||
|
<div class="desc_wrapper">
|
||||||
|
<h1>Welcome to Jobs Aggregator</h1>
|
||||||
|
<p>This is an educational project created by<br>
|
||||||
|
<strong>Jiri Karlik</strong>.</p>
|
||||||
|
<div class="social">
|
||||||
|
<a href="https://github.com/karlji" target="_blank" class="fa fa-github" title="github"></a>
|
||||||
|
<a href="https://www.linkedin.com/in/jiri-karlik/" target="_blank" class="fa fa-linkedin" title="linkedin"></a>
|
||||||
|
<a href="mailto:karlikjirka@gmail.com" target="_blank" class="fa fa-envelope" title="email" ></a>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<div class="login-wrapper">
|
||||||
|
<form method="post" class="main_form">
|
||||||
|
{% csrf_token %}
|
||||||
|
{{ judge_form }}
|
||||||
|
<button type="submit" value="Sign In" onclick="loading()"><span>Sign In</span></button>
|
||||||
|
<!-- <input type="submit" value="Sign In" onclick="loading()"> -->
|
||||||
|
</form>
|
||||||
|
{% if messages %}
|
||||||
|
<ul class="messages">
|
||||||
|
{% for message in messages %}
|
||||||
|
<span class="closebtn" onclick="this.parentElement.style.display='none';">{{ message }}</span>
|
||||||
|
{% endfor %}
|
||||||
|
</ul>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
|
||||||
|
</html>
|
|
@ -0,0 +1,58 @@
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="en">
|
||||||
|
{% load static %}
|
||||||
|
<head>
|
||||||
|
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||||
|
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||||
|
<link href="https://fonts.googleapis.com/css2?family=Barlow:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
||||||
|
<meta charset="UTF-8" />
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||||
|
<title>Jobs Aggregator Scraper</title>
|
||||||
|
<link rel="stylesheet" href="{% static 'styles.css' %}" />
|
||||||
|
<script src="{% static 'support.js' %}"></script>
|
||||||
|
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="wrapper_main view">
|
||||||
|
<div class="view_wrapper">
|
||||||
|
<div class="loader"></div>
|
||||||
|
<h1>Jobs Scraped</h1>
|
||||||
|
<button onclick="window.location.href='./search';" class="go_back">Go Back</button>
|
||||||
|
<form action="" method="post" class="view_form search">
|
||||||
|
{% csrf_token %}
|
||||||
|
<label for="my_input"><input type="text" id="my_input" name="my_input" placeholder="Title"></label>
|
||||||
|
<label for="junior_check"><input type="checkbox" name="junior_check" id="junior_check"><span class="radio_span">Junior</span></label>
|
||||||
|
<label for="salary_check"><input type="checkbox" name="salary_check" id="salary_check"><span class="radio_span">Salary</span></label>
|
||||||
|
<button type="submit" class="filter">Filter</button>
|
||||||
|
</form>
|
||||||
|
{% if jobs_total|length > 0 %}
|
||||||
|
<div class="table_wrapper">
|
||||||
|
<table id="job_table" lang="cs">
|
||||||
|
<tr lang="en" >
|
||||||
|
<th onclick="sortTable(0)">Title</th>
|
||||||
|
<th onclick="sortTable(1)">Salary</th>
|
||||||
|
<th onclick="sortTable(2)">Company</th>
|
||||||
|
<th onclick="sortTable(3)">Published</th>
|
||||||
|
<th>URL</th>
|
||||||
|
</tr>
|
||||||
|
{% for job in jobs_total %}
|
||||||
|
{% if input_text.lower in job.title.lower and junior_check in job.title.lower and salary_check in job.salary %}
|
||||||
|
<tr>
|
||||||
|
<td>{{ job.title }}</td>
|
||||||
|
<td>{{ job.salary }}</td>
|
||||||
|
<td>{{ job.company }}</td>
|
||||||
|
<td>{{ job.published_date }}</td>
|
||||||
|
<td><a href="{{ job.link }}" target="_blank">URL</a></td>
|
||||||
|
</tr>
|
||||||
|
{% endif %}
|
||||||
|
{% endfor %}
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
{% else %}
|
||||||
|
<h2>No jobs found. Please try again.</h2>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
|
||||||
|
</html>
|
|
@ -0,0 +1,35 @@
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="en">
|
||||||
|
{% load static %}
|
||||||
|
<head>
|
||||||
|
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||||
|
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||||
|
<link href="https://fonts.googleapis.com/css2?family=Barlow:wght@300;400;500;600;700&display=swap" rel="stylesheet">
|
||||||
|
<meta charset="UTF-8" />
|
||||||
|
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||||
|
<title>Jobs Aggregator Search</title>
|
||||||
|
<link rel="stylesheet" href="{% static 'styles.css' %}" />
|
||||||
|
<script src="{% static 'support.js' %}"></script>
|
||||||
|
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="wrapper_main">
|
||||||
|
<div class="loader"></div>
|
||||||
|
<div class="wrapper">
|
||||||
|
<div class="login-wrapper">
|
||||||
|
<h1>Jobs Aggregator</h1>
|
||||||
|
<form action="" method="post" class="search main_form">
|
||||||
|
{% csrf_token %}
|
||||||
|
<label for="my_input"><input type="text" id="my_input" name="my_input" placeholder="Job Title" required></label>
|
||||||
|
<label for="city_check"><input type="radio" id="city_check" name="city_check" value="plzen" checked='checked'><span class="radio_span">Plzeň</span></label>
|
||||||
|
<label for="city_check2"><input type="radio" id="city_check2" name="city_check" value="praha"><span class="radio_span">Praha</span></label>
|
||||||
|
<label for="city_check3"><input type="radio" id="city_check3" name="city_check" value="brno"><span class="radio_span">Brno</span></label>
|
||||||
|
<!-- <input type="submit" value="Scrape Jobs.cz" onclick="loading()"> -->
|
||||||
|
<button type="submit" onclick="loading()" ><span id="scrape_button">Scrape Jobs.cz</span></button>
|
||||||
|
</form>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
|
||||||
|
</html>
|
|
@ -0,0 +1,3 @@
|
||||||
|
from django.test import TestCase
|
||||||
|
|
||||||
|
# Create your tests here.
|
|
@ -0,0 +1,8 @@
|
||||||
|
from django.urls import path
|
||||||
|
from . import views
|
||||||
|
|
||||||
|
urlpatterns = [
|
||||||
|
path("", views.home_view, name="home_view"),
|
||||||
|
path("search", views.search, name="search"),
|
||||||
|
path("view", views.index, name="index"),
|
||||||
|
]
|
|
@ -0,0 +1,97 @@
|
||||||
|
from django.shortcuts import render
|
||||||
|
from django.http import HttpResponse
|
||||||
|
from django.template import loader
|
||||||
|
from .models import Job
|
||||||
|
from .jobs_scraper import scrape_data
|
||||||
|
from django.shortcuts import redirect
|
||||||
|
from .forms import JudgesForm
|
||||||
|
from django.contrib.auth import authenticate
|
||||||
|
from django.contrib import messages
|
||||||
|
from BruteBuster.models import FailedAttempt  # BruteBuster's decorators.py must be edited to return False when fa.too_many_failures() is true (see README)
|
||||||
|
from datetime import timedelta
|
||||||
|
|
||||||
|
# Create your views here.
|
||||||
|
def home_view(request):
    # Bind the login form to POST data when it is present.
    judge_form = JudgesForm(request.POST or None)
    # Reset the session flag; it becomes 'valid' only after a successful login.
    request.session['judge_password'] = 'invalid'
    if request.method == 'POST':
        if judge_form.is_valid():
            user = judge_form.cleaned_data['username']
            try:
                # authenticate() is patched by BruteBuster, which records failed login attempts in the DB.
                auth = authenticate(username=user, password=judge_form.cleaned_data['password'])
                if auth is False:  # too many failed attempts: the user is blocked
                    ip_addr = request.META.get('REMOTE_ADDR', None)
                    fa = FailedAttempt.objects.filter(username=user, IP=ip_addr)[0]
                    block_time = (fa.timestamp + timedelta(minutes=3)).strftime('%H:%M:%S')
                    messages.info(request, '%s BLOCKED until %s GMT' % (fa.username, block_time))
                if auth is not None and auth is not False:
                    request.session['judge_password'] = 'valid'
                    request.session['user'] = user
                    return redirect('search')
                else:
                    return redirect('home_view')
            except Exception:
                # Any failure falls through and re-renders the login page.
                print("Failed")
    return render(request, "home.html", {'judge_form': judge_form})
|
||||||
|
|
||||||
|
def index(request):
|
||||||
|
try:
|
||||||
|
if (request.session['judge_password'] != 'valid'):
|
||||||
|
return redirect('home_view')
|
||||||
|
except:
|
||||||
|
return redirect('home_view')
|
||||||
|
jobs_total = Job.objects.filter(user=request.session['user']).values()
|
||||||
|
template = loader.get_template('index.html')
|
||||||
|
input_text = request.POST.get('my_input', None)
|
||||||
|
junior_check = request.POST.get('junior_check', None)
|
||||||
|
salary_check = request.POST.get('salary_check', None)
|
||||||
|
city_check = request.POST.get('city_check', None)
|
||||||
|
|
||||||
|
if junior_check == "on":
|
||||||
|
junior_check = "junior"
|
||||||
|
else:
|
||||||
|
junior_check = ""
|
||||||
|
|
||||||
|
if salary_check == "on":
|
||||||
|
salary_check = "Kč"
|
||||||
|
else:
|
||||||
|
salary_check = ""
|
||||||
|
|
||||||
|
if input_text is None:
|
||||||
|
input_text = ""
|
||||||
|
|
||||||
|
context = {
|
||||||
|
'jobs_total': jobs_total,
|
||||||
|
'input_text': input_text,
|
||||||
|
'junior_check': junior_check,
|
||||||
|
'salary_check': salary_check,
|
||||||
|
'city_check': city_check
|
||||||
|
|
||||||
|
}
|
||||||
|
return HttpResponse(template.render(context, request))
|
||||||
|
|
||||||
|
|
||||||
|
def search(request):
|
||||||
|
try:
|
||||||
|
if (request.session['judge_password'] != 'valid'):
|
||||||
|
return redirect('home_view')
|
||||||
|
except:
|
||||||
|
return redirect('home_view')
|
||||||
|
|
||||||
|
template = loader.get_template('search.html')
|
||||||
|
input_text = request.POST.get('my_input', None)
|
||||||
|
city_check = request.POST.get('city_check', None)
|
||||||
|
context = {
|
||||||
|
'city_check': city_check
|
||||||
|
|
||||||
|
}
|
||||||
|
if city_check is None:  # On first page load, render search.html
|
||||||
|
return HttpResponse(template.render(context, request))
|
||||||
|
else:
|
||||||
|
scrape_data(city_check,input_text,request.session['user'])
|
||||||
|
#return redirect('/dashboard/view')
|
||||||
|
return redirect('index')
|
||||||
|
|
|
@ -0,0 +1,22 @@
|
||||||
|
#!/usr/bin/env python
|
||||||
|
"""Django's command-line utility for administrative tasks."""
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
"""Run administrative tasks."""
|
||||||
|
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'jobs_aggregator.settings')
|
||||||
|
try:
|
||||||
|
from django.core.management import execute_from_command_line
|
||||||
|
except ImportError as exc:
|
||||||
|
raise ImportError(
|
||||||
|
"Couldn't import Django. Are you sure it's installed and "
|
||||||
|
"available on your PYTHONPATH environment variable? Did you "
|
||||||
|
"forget to activate a virtual environment?"
|
||||||
|
) from exc
|
||||||
|
execute_from_command_line(sys.argv)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
main()
|
Binary file not shown.