Django: upload a PDF, then run a script to scrape the PDF and output results

I am trying to build a Django web app that lets the user upload a PDF, then runs a script that scrapes it and outputs and saves certain text from it.

I found some code that handles the file upload, and I already have the script that scrapes the PDF. I'm not sure how to tie them together.

views.py

from django.shortcuts import redirect, render
from .models import Document
from .forms import DocumentForm

def my_view(request):
    print(f"Great! You're using Python 3.6+. If you fail here, use the right version.")
    message = 'Upload PDF'
    # Handle file upload
    if request.method == 'POST':
        form = DocumentForm(request.POST, request.FILES)
        if form.is_valid():
            newdoc = Document(docfile=request.FILES['docfile'])
            newdoc.save()

            # Redirect to the document list after POST
            return redirect('my-view')
        else:
            message = 'The form is not valid. Fix the following error:'
    else:
        form = DocumentForm()  # An empty, unbound form

    # Load documents for the list page
    documents = Document.objects.all()

    # Render list page with the documents and the form
    context = {'documents': documents, 'form': form, 'message': message}
    return render(request, 'list.html', context)

forms.py

from django import forms

class DocumentForm(forms.Form):
    docfile = forms.FileField(label='Select a file')

models.py

from django.db import models

class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%m/%d')

list.html

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>webpage</title>
    </head>
<body>
    <!-- Upload form. Note enctype attribute! -->
    <form action="{% url "my-view" %}" method="post" enctype="multipart/form-data">
        {% csrf_token %}
        {{ message }}
        <p>{{ form.non_field_errors }}</p>

        <!-- Select a file: text -->
        <p>{{ form.docfile.label_tag }} {{ form.docfile.help_text }}</p>

        <!-- choose file button -->
        <p>
            {{ form.docfile.errors }}
            {{ form.docfile }}
        </p>

        <!-- Upload button -->

        <p><input type="submit" value="Upload"/></p>
    </form>
</body>
</html>

Edit: added urls.py.
urls.py

from django.urls import path
from .views import my_view

urlpatterns = [
    path('', my_view, name='my-view')
]

Scrape.py
I want to output and save Plan_Name.

import os
import re

import pdfplumber

directory = r'C:User/Ant_Esc/Desktop'

for filename in os.listdir(directory):
    if filename.endswith('.pdf'):
        fullpath = os.path.join(directory, filename)
        all_text = ""
        with pdfplumber.open(fullpath) as pdf:
            for page in pdf.pages:
                # extract_text() can return None for a page without text
                text = page.extract_text() or ''
                all_text += ' ' + text
        all_text = all_text.replace('\n', '')
        # Match everything between the two labels
        pattern = 'Plan Title/Name  .*? Program/Discipline'
        matches = re.findall(pattern, all_text, re.DOTALL)
        for match in matches:
            # removesuffix/removeprefix need Python 3.9+
            Plan_Name = match.removesuffix('Program/Discipline')
            Plan_Name = Plan_Name.removeprefix('Plan Title/Name  ')


Solution 1:[1]

I've gone through your code. Can you confirm the three points below? I think these are missing.

  1. Are you getting any error with the above code?
  2. Has the URL entry been added in urls.py?
  3. Where are you calling Scrape.py from?

My suggestion: call Scrape.py from views.py right after the file has been saved successfully with newdoc.save(), or call it from the model by overriding save() and using super().
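As a rough sketch of the first option, the scraping logic could be wrapped in a function that takes a single file path instead of looping over a directory, so the view can call it once the upload is saved. The names `extract_plan_name` and `scrape_plan_name` are hypothetical. The pattern is also a loosened version of the one in Scrape.py (`\s+` instead of fixed spaces), which assumes the labels are separated by whitespace in your PDFs.

```python
import re

# Same two labels as in Scrape.py; the capture group keeps only the text
# between them. Using \s+ here is an assumption about the PDF layout.
PLAN_NAME_RE = re.compile(
    r'Plan Title/Name\s+(.*?)\s+Program/Discipline', re.DOTALL
)


def extract_plan_name(all_text):
    """Return the first plan name found in the text, or None if absent."""
    match = PLAN_NAME_RE.search(all_text)
    return match.group(1).strip() if match else None


def scrape_plan_name(pdf_path):
    """Read every page of one PDF and return its plan name, or None."""
    # Imported here so extract_plan_name() can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for a page without text
        pages = [page.extract_text() or '' for page in pdf.pages]
    # Join pages and flatten line breaks into spaces before matching
    return extract_plan_name(' '.join(pages).replace('\n', ' '))
```

In `my_view`, this could then be called as `plan_name = scrape_plan_name(newdoc.docfile.path)` right after `newdoc.save()`, assuming the default file-system storage. Saving the result would need an extra field on `Document`, which is not part of the code in the question.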

Let me know if you need more help on this.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vishal Bulbule