How to create PDF files with Python and Weasyprint

In this tutorial you’ll learn how to create beautiful PDF files with Python and Weasyprint, a Python library for converting HTML to PDF.


Requirements

To follow along with the tutorial you’ll need:

  • Python 3.x installed on your computer
  • A code editor (VS Code or PyCharm CE)
  • A basic understanding of HTML, CSS, and Python

What will we build?

In this guide we’ll create a multi-page PDF file with Python. The PDF will be an exam with questions and answers.

I actually use this Python script in the real world. Every time I run a workshop I give quizzes to the attendees. But every person gets a different version of the quiz, with randomized questions and answers!

And now hands on!

Setting up the project

To start off, create a new folder and move into it:

mkdir create_pdf_python && cd $_

Once inside, create a new Python virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate

Next, create a new folder called src inside the project folder:

mkdir src

Still in the project root, install Weasyprint:

pip install weasyprint

Before using the library, you’ll need Pango and a couple of other system packages (older releases of Weasyprint also required cairo). Check out the installation instructions in the Weasyprint documentation.

And you’re good to go!

How to create PDF files with Python and Weasyprint: what is Weasyprint?

Weasyprint is a Python library for creating PDF files starting from HTML and CSS.

There are many alternatives to Weasyprint, such as xhtml2pdf, ReportLab, and pydf, but I found Weasyprint the easiest to use and the cleanest PDF library for Python.

Hold tight: in the next section we’ll lay out our project.

How to create PDF files with Python and Weasyprint: questions and answers

With the project open in your code editor, create a new file called questions.py in src. This file will hold a bunch of questions and answers.

Each question lives in its own Python dictionary, and all the dictionaries go into a list called questions. The answers to each question are stored in a list inside its dictionary:

# src/questions.py

questions = [
    {
        "question": "What is Python?",
        "answers": [
            "it is just an animal",
            "it is a beautiful programming language",
            "it is a comedy",
        ],
    },
    {
        "question": "Python is:",
        "answers": [
            "static, compiled, not portable",
            "dynamic, object oriented, interpreted",
        ],
    },
    {
        "question": "What immutable data types are there in Python?",
        "answers": [
            "string, float, integer, and dictionary",
            "string, float, integer, boolean, None, and tuple",
            "list, dictionary, set, and tuple",
        ],
    },
]
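Everything downstream assumes this exact shape, so a quick check can catch typos early. This is a minimal sketch, not part of the original script: the hypothetical validate_questions function verifies that each entry has a non-empty question and a non-empty answers list, and that no question has more than four answers, because the letters used later in the tutorial only go from A to D.

```python
MAX_ANSWERS = 4  # the letters list used later in generate_html is A to D


def validate_questions(questions):
    """Raise ValueError if an entry doesn't match the expected shape."""
    for index, entry in enumerate(questions, start=1):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index}: expected a dict")
        question = entry.get("question")
        if not isinstance(question, str) or not question:
            raise ValueError(f"entry {index}: missing or empty 'question'")
        answers = entry.get("answers")
        if not isinstance(answers, list) or not answers:
            raise ValueError(f"entry {index}: 'answers' must be a non-empty list")
        if len(answers) > MAX_ANSWERS:
            raise ValueError(
                f"entry {index}: {len(answers)} answers, "
                f"but only {MAX_ANSWERS} letters are available"
            )
```

You could call validate_questions(questions) once at the top of generate_pdf.py, right after the import.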

Now our goal is to create a PDF file with Python where every page holds a randomized version of both the questions and the answers. It’s a bit evil, isn’t it?

How to create PDF files with Python and Weasyprint: importing stuff and a bit of CSS

With the project open in your code editor, create a new file called generate_pdf.py in src. This file will contain all the logic for shuffling the questions and answers and for creating the multi-page PDF.

What should go in the file? First things first: let’s import the HTML class from Weasyprint, the shuffle function from the random module, and our questions list:

# src/generate_pdf.py

from weasyprint import HTML
from random import shuffle
from questions import questions

While we’re at it, let’s also define a variable named style with a few CSS rules for styling our PDF:

# src/generate_pdf.py

from weasyprint import HTML
from random import shuffle
from questions import questions

style = """<style>
h1, h2 { margin-bottom: 30px; }
.question { display: block; margin-bottom: 20px; }
table { display: block; }
.question--multiple { display: block; margin-bottom: 50px; }
.question__id { display: block; margin-right: 50px; }
.answer__letter { display: block; 
                  margin-left: 35px; 
                  margin-right: 20px;
                  border: 1px solid;
                  padding: 3px;
                 }
</style>
"""

If you’re wondering why there are CSS rules for styling a table: as a “design” decision, I want to generate an HTML table for every question where:

  • the thead contains a question id and the actual question text
  • the tbody holds one table row for every answer

Good, now let’s talk about shuffling!

How to create PDF files with Python and Weasyprint: random questions and answers

Let’s mix things up a little by shuffling the order of questions and answers. This way, lazy students won’t be able (in theory) to copy from each other!

Still in generate_pdf.py, create a new function named shuffle_question_and_answers. It takes our question list and rearranges the order of the questions and their related answers:

from weasyprint import HTML
from random import shuffle
from questions import questions

style = """<style>
h1, h2 { margin-bottom: 30px; }
.question { display: block; margin-bottom: 20px; }
table { display: block; }
.question--multiple { display: block; margin-bottom: 50px; }
.question__id { display: block; margin-right: 50px; }
.answer__letter { display: block; 
                  margin-left: 35px; 
                  margin-right: 20px;
                  border: 1px solid;
                  padding: 3px;
                 }
</style>
"""


def shuffle_question_and_answers(src):
    shuffle(src)
    for question in src:
        answers = question.get("answers")
        if answers:
            shuffle(answers)

Note that shuffle works in place on the list and returns None. With this logic in place we can now start to assemble the HTML in the next section.
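Because shuffle mutates the list it receives, shuffle_question_and_answers reorders the imported questions list itself. If you’d rather keep the original order intact, or be able to regenerate one specific variation later, something like the following could help. It’s a sketch under my own assumptions, not part of the original script: the hypothetical shuffled_copy function deep-copies the data, and its seed parameter feeds a dedicated random.Random instance, so (on the same Python version) the same seed gives the same variation.

```python
import copy
import random


def shuffled_copy(src, seed=None):
    """Return a shuffled deep copy of src, leaving src untouched."""
    rng = random.Random(seed)  # same seed -> same sequence of shuffles
    result = copy.deepcopy(src)  # copy the nested answer lists too
    rng.shuffle(result)
    for question in result:
        answers = question.get("answers")
        if answers:
            rng.shuffle(answers)
    return result


original = [
    {"question": "Q1", "answers": ["a", "b", "c"]},
    {"question": "Q2", "answers": ["d", "e"]},
]
variation = shuffled_copy(original, seed=42)
print(original[0])  # {'question': 'Q1', 'answers': ['a', 'b', 'c']}
```

Printing the seed next to each variation would let you rebuild a student’s exact quiz later.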

How to create PDF files with Python and Weasyprint: assembling the HTML

Before creating the actual PDF file, we need to build the HTML: Weasyprint takes HTML and CSS and produces a PDF.

In generate_pdf.py, create a new function named generate_html. It takes our question list (already rearranged) and a title for the page.

Inside this function we can start building our HTML, as a string:

# src/generate_pdf.py

# rest omitted for brevity

def generate_html(src, title):
    html = f"""
    <html>
    <head>
        <meta charset="UTF-8">
        {style}
    </head>
    <body>
    <h1>{title}</h1>
    <h2>Class:      Date: <br>Full name:</h2>
    """

As you can see, we use the module-level variable style to apply CSS to our page.

Keep in mind that this is just a quick-start tutorial. Feel free to use a different approach if mixing Python and HTML like that gives you the shudders.
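Whichever approach you pick, one detail is worth handling: quiz text goes into the HTML verbatim, so a question like “Is a < b True?” would be read as markup. Here’s one alternative, sketched as an assumption rather than the tutorial’s method: collect the fragments in a list, join them once at the end, and pass all quiz text through html.escape from the standard library. The function name build_html and its simplified markup (no ids or letters) are hypothetical.

```python
from html import escape


def build_html(src, title, style=""):
    """Return one quiz page as a string, escaping all quiz text."""
    parts = [
        '<html><head><meta charset="UTF-8">',
        style,  # e.g. the <style> block defined earlier
        "</head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for question_dict in src:
        # escape() turns <, >, & and quotes into HTML entities
        question = escape(question_dict["question"])
        parts.append('<table class="question--multiple">')
        parts.append(f'<thead class="question"><tr><td><strong>{question}</strong></td></tr></thead>')
        parts.append("<tbody>")
        for answer in question_dict.get("answers", []):
            parts.append(f"<tr><td>{escape(answer)}</td></tr>")
        parts.append("</tbody></table>")
    parts.append("</body></html>")
    # A single join at the end instead of repeated "html = html + ..."
    return "".join(parts)
```

Usage would look like build_html(questions, "Python quiz", style).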

With this basic HTML structure in place, it’s time to add questions to the page. In generate_html we can iterate over every question, and while doing that we can:

  • build a new table with a thead
  • insert the question into the thead

After the for loop, we’ll also need to close the HTML document. Here’s the idea:

# src/generate_pdf.py

# rest omitted for brevity

def generate_html(src, title):
    html = f"""
    <html>
    <head>
        <meta charset="UTF-8">
        {style}
    </head>
    <body>
    <h1>{title}</h1>
    <h2>Class:      Date: <br>Full name:</h2>
    """

    for question_dict in src:
        question = question_dict.get("question")
        question_type = "question--multiple"
        html = (
            html
            + f"""
            <table class={question_type}>
                <thead class="question">
                <tr>
                    <td class="question__id"><strong>question_id</strong></td>
                    <td><strong>{question}</strong></td>
                </tr>
                </thead>
                <tbody>
            """
        )

    html = html + "</body></html>"
    return html

This function gives you broken HTML because the tables are never closed, but browsers will automagically close each table for you. We’re halfway through the task!

Let’s add answers for every question in the next section.

How to create PDF files with Python and Weasyprint: adding answers

Let’s expand generate_html with a nested for loop that adds a new row to the question’s tbody for each answer, and then closes the table.

As a bonus, we’ll also add a letter to every answer. Here’s the new version of the function:

# src/generate_pdf.py

# rest omitted for brevity

def generate_html(src, title):
    html = f"""
    <html>
    <head>
        <meta charset="UTF-8">
        {style}
    </head>
    <body>
    <h1>{title}</h1>
    <h2>Class:      Date: <br>Full name:</h2>
    """

    for question_dict in src:
        question = question_dict.get("question")
        question_type = "question--multiple"
        html = (
            html
            + f"""
            <table class={question_type}>
                <thead class="question">
                <tr>
                    <td class="question__id"><strong>question_id</strong></td>
                    <td><strong>{question}</strong></td>
                </tr>
                </thead>
                <tbody>
            """
        )

        answers = question_dict.get("answers")
        letters = ["A", "B", "C", "D"]
        for letter, answer in zip(letters, answers):
            html = (
                html
                + f"""
                    <tr>
                        <td class="answer__letter">{letter}</td>
                        <td>{answer}</td>
                    </tr>
                """
            )
        html = html + "</tbody></table>"

    html = html + "</body></html>"
    return html
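Two limitations remain in this version: zip stops at the shorter of its inputs, so with the fixed letters list a question with five answers would silently lose its fifth answer, and the question_id cell is still a literal placeholder. A sketch of one way to handle both, assuming plain uppercase letters are fine: enumerate numbers the questions, and string.ascii_uppercase supplies up to 26 answer letters. The helper name label_items is hypothetical.

```python
from string import ascii_uppercase


def label_items(src):
    """Yield (question_number, question, [(letter, answer), ...]) tuples."""
    for number, question_dict in enumerate(src, start=1):
        answers = question_dict.get("answers") or []
        if len(answers) > len(ascii_uppercase):
            raise ValueError("more answers than letters A-Z")
        # zip can't drop anything here: ascii_uppercase is at least as long
        lettered = list(zip(ascii_uppercase, answers))
        yield number, question_dict.get("question"), lettered


sample = [{"question": "Pick one", "answers": ["a", "b", "c", "d", "e"]}]
for number, question, lettered in label_items(sample):
    print(number, question, lettered[-1])  # 1 Pick one ('E', 'e')
```

Inside generate_html you would loop over label_items(src) instead of src, put number in the question__id cell, and emit one row per letter and answer pair.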

If you want to test your code, add a main() function to generate_pdf.py that writes the HTML to a file:

# src/generate_pdf.py

# rest omitted for brevity

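def main():
    shuffle_question_and_answers(questions)
    # "with" closes the file even if something goes wrong while writing
    with open("test.html", "w", encoding="utf-8") as f:
        f.write(generate_html(src=questions, title="testme"))


if __name__ == "__main__":
    main()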

Run the script (make sure the virtual environment is active) with:

(venv) your_prompt$ python src/generate_pdf.py

You should now see a new file called test.html in your project folder. Open it in a browser to check that every question shows up with its answers.


Great job! And now let’s get to the original task: creating the PDF with Python.

How to create PDF files with Python and Weasyprint: the PDF generator

It’s been a lot of work so far, but you can be proud of yourself. Just a couple more lines of code and you’ll have your PDF!

Delete the body of main() in generate_pdf.py, and let’s begin by asking how many random copies of the quiz we want to generate, plus a title for the quiz:

# src/generate_pdf.py

# rest omitted for brevity

def main():
    times = int(
        input("Hello teacher! How many random variations do you want to generate?\n")
    )
    title = (
        input('Please provide a title too. Defaults to: "Python quiz"\n')
        or "Python quiz"
    )
    full_html = ""
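As written, int() raises ValueError and the script crashes if the teacher types something that isn’t a whole number. Below is a minimal sketch of a more forgiving prompt, assuming that asking again is the behavior you want. ask_positive_int is a hypothetical helper, and its read parameter exists only so the function can be exercised without a keyboard.

```python
def ask_positive_int(prompt, read=input):
    """Keep asking until the answer parses as an integer greater than zero."""
    while True:
        raw = read(prompt).strip()
        try:
            value = int(raw)
        except ValueError:
            print(f"{raw!r} is not a whole number, please try again.")
            continue
        if value > 0:
            return value
        print("Please enter a number greater than zero.")
```

In main() it would replace the int(input(...)) call: times = ask_positive_int("Hello teacher! How many random variations do you want to generate?\n").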

Now we’ll use the times variable to create a new HTML page on each iteration, so our PDF will be a multi-page document with times random permutations of the quiz.

To do so, import repeat from itertools:

# src/generate_pdf.py

from itertools import repeat

# rest omitted for brevity

and use it for shuffling and generating a new HTML page on every iteration:

# src/generate_pdf.py

from itertools import repeat

# rest omitted for brevity

def main():
    times = int(
        input("Hello teacher! How many random variations do you want to generate?\n")
    )
    title = (
        input('Please provide a title too. Defaults to: "Python quiz"\n')
        or "Python quiz"
    )
    full_html = ""

    for _ in repeat(None, times):
        shuffle_question_and_answers(questions)
        h = generate_html(src=questions, title=title)
        full_html = full_html + h

Finally we can use the HTML class from Weasyprint to create the PDF:

# src/generate_pdf.py

from itertools import repeat

# rest omitted for brevity

def main():
    times = int(
        input("Hello teacher! How many random variations do you want to generate?\n")
    )
    title = (
        input('Please provide a title too. Defaults to: "Python quiz"\n')
        or "Python quiz"
    )
    full_html = ""

    for _ in repeat(None, times):
        shuffle_question_and_answers(questions)
        h = generate_html(src=questions, title=title)
        full_html = full_html + h

    filename = title.lower().replace(" ", "_") + ".pdf"
    HTML(string=full_html).write_pdf(target=filename)
    print(f"{filename} created!")


main()
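The filename comes straight from whatever title the teacher types, so a title such as “Quiz 1/2: Python?” would produce a path containing a slash, which file systems treat as a directory separator (and Windows also rejects characters like ? and :). A conservative sketch, assuming ASCII titles are enough for you: keep only lowercase letters, digits, underscores, and hyphens. The helper name title_to_filename is hypothetical.

```python
import re


def title_to_filename(title, default="python_quiz"):
    """Turn a free-form title into a safe .pdf filename."""
    slug = title.strip().lower().replace(" ", "_")
    slug = re.sub(r"[^a-z0-9_-]+", "", slug)  # drop everything else
    return (slug or default) + ".pdf"


print(title_to_filename("Python quiz"))  # python_quiz.pdf
print(title_to_filename("Quiz 1/2: Python?"))  # quiz_12_python.pdf
```

For plain titles it produces the same name as the original one-liner, so you could swap it in as filename = title_to_filename(title).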

Take your time to go through the code. In the next section we’ll recap the entire script.

How to create PDF files with Python and Weasyprint: using the script

If you followed along, you should have the following code:

from weasyprint import HTML
from random import shuffle
from itertools import repeat
from questions import questions

style = """<style>
h1, h2 { margin-bottom: 30px; }
.question { display: block; margin-bottom: 20px; }
table { display: block; }
.question--multiple { display: block; margin-bottom: 50px; }
.question__id { display: block; margin-right: 50px; }
.answer__letter { display: block; 
                  margin-left: 35px; 
                  margin-right: 20px;
                  border: 1px solid;
                  padding: 3px;
                 }
</style>
"""


def shuffle_question_and_answers(src):
    shuffle(src)
    for question in src:
        answers = question.get("answers")
        if answers:
            shuffle(answers)


def generate_html(src, title):
    html = f"""
    <html>
    <head>
        <meta charset="UTF-8">
        {style}
    </head>
    <body>
    <h1>{title}</h1>
    <h2>Class:      Date: <br>Full name:</h2>
    """

    question_counter = 0

    for question_dict in src:
        question_counter = question_counter + 1
        question = question_dict.get("question")
        question_type = "question--multiple"
        html = (
            html
            + f"""
            <table class={question_type}>
                <thead class="question">
                <tr>
                    <td class="question__id"><strong>{question_counter}</strong></td>
                    <td><strong>{question}</strong></td>
                </tr>
                </thead>
                <tbody>
            """
        )

        answers = question_dict.get("answers")
        letters = ["A", "B", "C", "D"]
        for letter, answer in zip(letters, answers):
            html = (
                html
                + f"""
                    <tr>
                        <td class="answer__letter">{letter}</td>
                        <td>{answer}</td>
                    </tr>
                """
            )
        html = html + "</tbody></table>"

    html = html + '<p style="break-before: always;"></p></body></html>'
    return html


def main():
    times = int(
        input("Hello teacher! How many random variations do you want to generate?\n")
    )
    title = (
        input('Please provide a title too. Defaults to: "Python quiz"\n')
        or "Python quiz"
    )
    full_html = ""

    for _ in repeat(None, times):
        shuffle_question_and_answers(questions)
        h = generate_html(src=questions, title=title)
        full_html = full_html + h

    filename = title.lower().replace(" ", "_") + ".pdf"
    HTML(string=full_html).write_pdf(target=filename)
    print(f"{filename} created!")


if __name__ == "__main__":
    main()

I’ve added two nice features. First, the question id is now generated on the fly by the question_counter variable, which is incremented at each step.

Second, I’ve added a page break at the end of every HTML page, so each variation of the quiz starts on a new page:

html = html + '<p style="break-before: always;"></p></body></html>'
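Under the hood, full_html is several complete HTML documents glued together, so the result relies on the HTML parser merging them into one, and the empty paragraph after the last variation may leave a blank page at the end of the PDF. If either bothers you, here’s a sketch of an alternative. It is not what the script above does, and it assumes you change generate_html to return only the content of one variation (the hypothetical variation_bodies argument below). Each variation goes into its own section element, and the standard CSS rule break-before: page starts every section after the first on a new page.

```python
def combine_variations(variation_bodies, style=""):
    """Wrap each variation in its own <section> inside one HTML document."""
    # "section + section" matches every section that follows another one,
    # so no break is added before the first page or after the last one
    page_break_css = "<style>section + section { break-before: page; }</style>"
    sections = "".join(f"<section>{body}</section>" for body in variation_bodies)
    return (
        '<html><head><meta charset="UTF-8">'
        + style
        + page_break_css
        + "</head><body>"
        + sections
        + "</body></html>"
    )


full_html = combine_variations(["<h1>Variation 1</h1>", "<h1>Variation 2</h1>"])
# Then, as before: HTML(string=full_html).write_pdf(target=filename)
```

Passing the style variable as the second argument keeps the CSS defined earlier in the tutorial.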

Now run the script (make sure the virtual environment is active) with:

(venv) your_prompt$ python src/generate_pdf.py

and you’ll be asked:

  • “How many random variations do you want to generate?”
  • a title for the quiz

Respond to both and hit enter. You’ll see your PDF file appear in the project folder.

Well done!

Wrapping up

In this tutorial you learned how to create PDF files with Python and Weasyprint: you built a multi-page PDF file starting from HTML pages.

We created the HTML from a list of questions and answers, randomized with shuffle.

Now you can apply these concepts to create any PDF file you want!

Weasyprint is not the only library for creating PDF files with Python: there are also xhtml2pdf, ReportLab, and pydf. But Weasyprint is a pleasure to work with.

Check out the docs to learn more.

Thanks for reading and stay tuned!

Resources

The source code for this project.

The Weasyprint docs.
