How to quickly convert file formats
published 2023-08-08 (last changed on 2024-03-14) by
This is a collection of a few commands I use all the time for converting files between formats.
# Audio
# Audio → FLAC
Lossless when converting from an uncompressed source such as WAV. The output is written to input.flac.
$ flac --best input.wav
# Audio → Opus
Check Opus Recommended Settings for the ideal bitrate for your use case.
24 kb/s is a good baseline for very small files that still sound good for voice recordings. Only use --downmix-mono if merging the stereo channels into mono does not lose information (e.g. when both channels carry the same signal).
$ opusenc --bitrate 24 --downmix-mono input.wav output.opus
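Unlike flac, opusenc needs an explicit output file name, so converting a whole folder takes a small loop. A minimal sketch, assuming a POSIX shell with opusenc on the PATH; the function name wav_to_opus_all is hypothetical, and the 24 kb/s default just mirrors the command above:

```shell
# Hypothetical helper: encode every *.wav file in the current directory
# to Opus, e.g. talk.wav -> talk.opus. Assumes opusenc is on the PATH.
# Usage: wav_to_opus_all [bitrate_kbps]   (defaults to 24, as above)
wav_to_opus_all() {
  bitrate="${1:-24}"
  for f in *.wav; do
    # With no matches the glob stays the literal "*.wav"; skip it.
    [ -e "$f" ] || continue
    # "${f%.wav}" strips the extension before appending .opus
    opusenc --bitrate "$bitrate" "$f" "${f%.wav}.opus"
  done
}
```

For example, wav_to_opus_all 32 encodes every WAV file at 32 kb/s. --downmix-mono is left out on purpose, since it is only safe for files where merging the channels loses nothing.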
# PDFs
# PDF → Extracted Images
$ mkdir tmp
$ pdfimages input.pdf -all tmp/name
This will extract all images contained in the input PDF to tmp/name-000.png, tmp/name-001.jpg, etc.
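Reusing the same prefix for several PDFs would overwrite earlier images, so when extracting from many files it helps to give each its own directory. A minimal sketch, assuming a POSIX shell with pdfimages (from poppler-utils) on the PATH; the function name extract_pdf_images and the <name>-images/ layout are hypothetical choices:

```shell
# Hypothetical helper: extract the images of each PDF given as an argument
# into its own directory, e.g. report.pdf -> report-images/img-000.jpg.
# Assumes pdfimages (poppler-utils) is on the PATH.
extract_pdf_images() {
  for pdf in "$@"; do
    dir="${pdf%.pdf}-images"
    mkdir -p "$dir"
    # -all keeps JPEG (and similar) images in their native format;
    # most other images are written as PNG.
    pdfimages -all "$pdf" "$dir/img"
  done
}
```

For example, extract_pdf_images *.pdf handles every PDF in the current directory.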
# PDF → PNG
$ pdftoppm input.pdf slides -png -scale-to 1080 -progress
This will generate an image such as slides-1.png for every page of the PDF, scaled so that the longer side of each page (the width, for landscape slides) is 1080 pixels. With e.g. -scale-to 3840, one can quickly convert a PDF of presentation slides to images for a high-quality 4K video.
# PDF → compressed PDF
Sometimes I have a PDF that is far too large (hundreds of MB for a simple document) because it was generated inefficiently. Ghostscript can often reduce the file size dramatically with only a small loss in quality.
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dPrinted=false -dNOPAUSE -dBATCH -sOutputFile=small.pdf input.pdf
Depending on the required quality, /ebook can be replaced with /printer, /prepress or /default (or /screen for the lowest quality). See the documentation for more information.
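Since the right preset depends on the document, it can be quicker to try them all and compare the sizes. A minimal sketch, assuming a POSIX shell with gs on the PATH; the function name compare_pdf_presets and the input-<preset>.pdf naming are hypothetical:

```shell
# Hypothetical helper: write one copy of a PDF per -dPDFSETTINGS preset,
# e.g. input-ebook.pdf, and print each resulting file size in bytes.
# Assumes Ghostscript (gs) is on the PATH.
compare_pdf_presets() {
  src="$1"
  for preset in screen ebook printer prepress default; do
    out="${src%.pdf}-$preset.pdf"
    # Same options as above; -q only suppresses Ghostscript's startup messages.
    gs -q -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS="/$preset" \
       -dPrinted=false -dNOPAUSE -dBATCH -sOutputFile="$out" "$src"
    # Reading from stdin makes wc print only the number; tr strips padding.
    printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
  done
}
```

Running compare_pdf_presets input.pdf and opening the smallest copy that still looks acceptable is a quick way to pick a preset.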
# PDF → PDF with OCR
$ ocrmypdf -cdr --force-ocr input.pdf ocr.pdf -l deu
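Here -c cleans the pages before OCR (this requires unpaper), -d deskews them, -r rotates pages to the correct orientation, --force-ocr rasterizes any existing text and runs OCR on every page, and -l deu selects German as the OCR language. Check the documentation for more information.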
# PDF → Plaintext → Spellcheck
$ pdftotext main.pdf - | pylanguagetool
The - makes pdftotext write to standard output, which is piped into pylanguagetool, my own command-line interface to LanguageTool.