Merge PDF with Python

May 8, 2025 10 min read

The need to merge PDF files arises frequently in both personal and professional contexts. Whether it's combining multiple reports into a single document or assembling different parts of a book, merging PDFs streamlines document management and improves organization. However, using online tools for such tasks can present limitations and privacy concerns, particularly when dealing with sensitive information. These tools often have file size restrictions, require internet connectivity, and may store your documents on their servers.

Python offers a powerful and flexible solution to these challenges. By leveraging Python's robust ecosystem, you can merge PDFs locally, without relying on external services. This approach ensures greater control over your data and eliminates concerns about privacy breaches. Furthermore, Python allows for extensive customization, enabling you to tailor the merging process to your specific needs.

For those who prefer a more user-friendly approach without coding, consider using BreezePDF. This online tool offers a simple drag-and-drop interface to merge PDFs, ensuring your documents are processed privately within your browser. It provides a quick and convenient alternative to coding while maintaining your data's security.

Why Use Python for Merging PDFs?

Using Python for merging PDFs provides several key advantages. First and foremost, it offers offline functionality, allowing you to merge documents without an internet connection. This is particularly beneficial when working with sensitive data or in environments with limited connectivity. Additionally, Python is a free and open-source language, eliminating any licensing costs associated with proprietary PDF editing software.

The customization options available with Python are another significant benefit. You can tailor the merging process to your specific requirements, such as merging specific page ranges or handling encrypted PDFs. Furthermore, using Python ensures your data remains private, as the entire process occurs locally on your machine, mitigating the risks associated with uploading documents to third-party services. Libraries like pypdf and PyMuPDF are available for PDF manipulation and come with extensive documentation to get started.

Python Libraries for PDF Merging

Several powerful Python libraries can be used for PDF manipulation, each with its own strengths and weaknesses. Among the most popular is pypdf (formerly PyPDF2), which is known for its ease of use and comprehensive features. PyMuPDF (Fitz) offers excellent performance and supports a wide range of PDF operations.

Other notable libraries include pdfrw, which is valued for its speed and efficiency, particularly when dealing with large PDF files. Lastly, pikepdf provides advanced capabilities for manipulating PDF files, including encryption and decryption. Choosing the right library depends on the specific requirements of your project, with factors like performance, features, and ease of use playing a significant role.

Getting Started: Installing the Necessary Libraries

To begin merging PDFs with Python, you first need to install the necessary libraries. The easiest way to do this is using pip, the Python package installer. Open your terminal or command prompt and run the following commands to install the libraries mentioned earlier.

  • pip install pypdf
  • pip install pymupdf
  • pip install pdfrw
  • pip install pikepdf

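If you are using Conda, activate your environment first (for example, conda activate py_envs, where py_envs stands for the name of your environment) and then use conda install followed by the package name. Some of these packages may only be available from the conda-forge channel, in which case add -c conda-forge to the command.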
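After installing, a quick check like the following can confirm which libraries Python can actually find. It is only a sketch: the LIBRARIES mapping and the check_installed name are made up for this example, and it assumes PyMuPDF is imported as fitz, as in the rest of this article.

```python
from importlib.util import find_spec

# pip package name -> module name used in an import statement.
# PyMuPDF is installed as "pymupdf" but imported as "fitz" in this article.
LIBRARIES = {
    "pypdf": "pypdf",
    "pymupdf": "fitz",
    "pdfrw": "pdfrw",
    "pikepdf": "pikepdf",
}


def check_installed(libraries=LIBRARIES):
    """Return {package_name: bool} telling whether each module can be located."""
    return {pkg: find_spec(module) is not None for pkg, module in libraries.items()}


if __name__ == "__main__":
    for pkg, ok in check_installed().items():
        print(f"{pkg}: {'installed' if ok else 'missing'}")
```

You only need the libraries you plan to use; a "missing" entry for the others is not a problem.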

Merging PDFs with pypdf (PdfWriter)

pypdf offers a straightforward approach to merging PDFs using the PdfWriter class. This method allows you to concatenate entire PDF files or specific page ranges with ease. This approach is particularly useful for creating comprehensive documents from multiple sources.

Basic File Concatenation

To merge entire PDF files, start by importing the PdfWriter class: from pypdf import PdfWriter. Next, create a PdfWriter object: merger = PdfWriter(). Then, append each PDF file to the PdfWriter object: merger.append(pdf_file). Finally, write the merged PDF to a new file: merger.write("merged.pdf"), and close the PdfWriter object: merger.close(). This process efficiently combines the content of multiple PDF files into a single document.

Merging PDFs in a Directory

To merge all PDF files within a directory, import the necessary modules: os and PdfWriter. Get a list of all PDF files in the directory using os.listdir() and filtering with .endswith(".pdf"). Because os.listdir() returns names in no guaranteed order, sort the list if the order of the merged pages matters. Iterate through the list of PDF files, appending each file to the PdfWriter object. Write the merged PDF to a new file: merger.write("merged.pdf") and close the PdfWriter object: merger.close(). This is an efficient way to automate the merging of multiple PDFs stored in a specific folder.

Merging Specific Page Ranges

To merge specific page ranges from PDF files, pass a pages argument to merger.append(): merger.append(pdf, pages=(start, stop)). Page indices are zero-based and stop is exclusive, as with Python's range(). For example, to merge the first three pages of a PDF, use merger.append(pdf, pages=(0, 3)). An optional third value sets the step, so to merge pages 1, 3, and 5, use merger.append(pdf, pages=(0, 6, 2)). This allows for precise control over which pages are included in the merged document.

Merging PDFs with PyMuPDF

PyMuPDF, also known as Fitz, offers another robust solution for merging PDFs, providing both command-line and Python code options. This library excels in performance and supports a wide range of PDF manipulation tasks. You can choose the method that best suits your workflow.

Command Line Usage

PyMuPDF provides a convenient command-line interface for merging PDFs. Open your terminal or command prompt and use the following command: python -m fitz join -o result.pdf file1.pdf file2.pdf file3.pdf. This command merges file1.pdf, file2.pdf, and file3.pdf into a new file named result.pdf. The command-line tool offers a quick and efficient way to merge PDF files without writing any code.

Using Python Code

To merge PDFs using Python code with PyMuPDF, start by importing the library: import fitz. Create a new, empty PDF object: result = fitz.open(). Iterate through a list of PDF files: for pdf in ['file1.pdf', 'file2.pdf', 'file3.pdf']:. Inside the loop, open each PDF and insert it into the result: with fitz.open(pdf) as mfile: result.insert_pdf(mfile). Finally, save the merged PDF: result.save("result.pdf"). This provides a programmatic approach to merging PDFs, allowing for greater flexibility and control.

Maintaining Table of Contents

When merging PDFs, maintaining the table of contents (TOC) is crucial for navigation. First, extract the TOC from each source document using toc = doc.get_toc(). Then shift the page numbers in each entry by the number of pages already in the merged document, so they point to the new page positions. Finally, apply the combined TOC to the output document with set_toc(). This ensures that the merged PDF retains its original table of contents, providing a seamless user experience.

Advanced Techniques and Considerations

When working with PDFs, you may encounter advanced scenarios that require specific techniques. These include handling encrypted PDFs, excluding specific pages, and addressing deprecation warnings. Being aware of these considerations ensures a smooth and efficient PDF merging process.

Handling Encrypted PDFs

Encrypted PDFs require special handling to ensure they can be merged correctly. If a PDF is password-protected, you may need to provide the correct password to decrypt it before merging. Libraries like pikepdf offer methods for decrypting PDFs programmatically. Ensure you have the necessary permissions to decrypt and merge encrypted PDFs to avoid legal issues.

Excluding Specific Pages

Sometimes, you may need to exclude specific pages from the merging process. This can be achieved by slicing the page list in Python. For example, with pdfrw (whose PdfWriter and PdfReader classes are separate from pypdf's classes of the same names), you can exclude the last page of a PDF with writer.addpages(PdfReader(inpfn).pages[:-1]), where inpfn is the input file name. This allows you to selectively merge only the desired pages from a PDF file.

Addressing Deprecation Warnings

Older code written for PyPDF2 used PdfFileMerger, which was later renamed PdfMerger. Both are deprecated in favor of PdfWriter, and recent pypdf releases no longer support them. If you encounter deprecation warnings or errors mentioning these classes, update your code to use PdfWriter instead. This ensures that your code is compatible with the latest versions of the library and takes advantage of its improved features and performance.

Performance Considerations

When merging large PDF files, performance becomes a critical factor. Different Python libraries offer varying levels of performance. pdfrw is often reported to be fast for basic merging tasks, and PyMuPDF often outperforms pypdf in terms of speed, making it a good candidate for performance-sensitive applications. Results vary with file content and library version, so benchmark with your own documents before selecting a library for your PDF merging needs.
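Because relative speed depends on your files, a small timing helper can be more informative than general rankings. The sketch below is library-agnostic: time_merge is a hypothetical helper that accepts any function taking (input_paths, output_path), so it makes no claim about which library will come out ahead.

```python
import time


def time_merge(merge_func, input_paths, output_path, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        merge_func(input_paths, output_path)
        timings.append(time.perf_counter() - start)
    # The minimum is less affected by background noise than the mean.
    return min(timings)
```

Pass in one merge function per library, each taking a list of input paths and an output path, and compare the returned times on the same set of files.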

BreezePDF: A Simpler Solution

While Python offers powerful capabilities for merging PDFs, it may not be the most accessible solution for everyone. For users seeking a simpler, more intuitive approach, BreezePDF provides an excellent alternative. This online tool allows you to merge PDFs quickly and easily without writing any code, much like creating fillable PDFs, which is covered elsewhere on our blog.

BreezePDF simplifies the PDF merging process with its user-friendly interface and drag-and-drop functionality. Simply add your PDF files, reorder them as needed, and click the merge button. The tool handles the rest, providing you with a single, merged PDF file in seconds. This eliminates the need for complex coding and ensures a seamless user experience, while keeping all processing local to your browser for privacy.

Moreover, BreezePDF offers additional features such as reordering pages and deleting unwanted pages before merging, giving you even greater control over the final document. This makes it a versatile tool for various PDF merging tasks, whether you're combining reports, assembling documents, or creating a single PDF from multiple sources. Why not try it out?

Conclusion

Merging PDFs with Python offers a flexible and powerful solution for those comfortable with coding. Libraries like pypdf and PyMuPDF provide extensive features for concatenating, manipulating, and customizing PDF files. However, for users seeking a simpler, more user-friendly approach, BreezePDF offers a convenient alternative. You can also try creating a fillable PDF for free on BreezePDF.

Whether you choose to merge PDFs with Python or use BreezePDF, the ability to combine multiple documents into a single file streamlines document management and enhances productivity. Evaluate your needs, technical skills, and privacy considerations to determine the best approach for your specific use case. Both methods provide valuable tools for managing and manipulating PDF files effectively.