Python 2.6 Text Processing Beginner's Guide

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Getting Started

Categorizing types of text data

Ensuring you have Python installed

Implementing a simple cipher

Time for action – implementing a ROT13 encoder

Time for action – processing as a filter

Time for action – skipping over markup tags

Supporting third-party modules

Time for action – installing setuptools

Running a virtual environment

Time for action – configuring a virtual environment

Where to get help?

Summary

Working with the IO System

Parsing web server logs

Time for action – generating transfer statistics

Using objects interchangeably

Time for action – introducing a new log format

Accessing files directly

Time for action – accessing files directly

Time for action – handling compressed files

Accessing multiple files

Time for action – spell-checking HTML content

Accessing remote files

Time for action – spell-checking live HTML pages

Time for action – handling urllib2 errors

Handling string IO instances

Understanding IO in Python 3

Summary

Python String Services

Understanding the basics of string objects

Time for action – employee management

String formatting

Time for action – customizing log processor output

Time for action – adding status code data

Creating templates

Time for action – displaying warnings on malformed lines

Calling string object methods

Time for action – simple manipulation with string methods

Summary

Text Processing Using the Standard Library

Reading CSV data

Time for action – processing Excel formats

Time for action – CSV and formulas

Time for action – processing custom CSV formats

Writing CSV data

Time for action – creating a spreadsheet of UNIX users

Modifying application configuration files

Time for action – adding basic configuration read support

Time for action – relying on configuration value interpolation

Time for action – configuration defaults

Writing configuration data

Time for action – generating a configuration file

Reconfiguring our source

Time for action – creating an egg-based package

Working with JSON

Time for action – writing JSON data

Summary

Regular Expressions

Simple string matching

Time for action – testing an HTTP URL

Advanced pattern matching

Time for action – regular expression grouping

Implementing Python-specific elements

Time for action – reading DNS records

Summary

Structured Markup

XML data

SAX processing

Time for action – event-driven processing

Time for action – driving incremental processing

Time for action – creating a dungeon adventure game

The Document Object Model

Time for action – updating our game to use DOM processing

XPath

Time for action – using XPath in our adventure

Reading HTML

Time for action – displaying links in an HTML page

Summary

Creating Templates

Time for action – installing Mako

Basic Mako usage

Time for action – loading a simple Mako template

Time for action – reformatting the date with Python code

Time for action – defining Mako def tags

Time for action – converting the mail message to use namespaces

Inheriting from base templates

Time for action – updating the base template

Time for action – adding another inheritance layer

Customizing

Time for action – creating custom Mako tags

Surveying alternative approaches

Summary

Understanding Encodings and i18n

Understanding basic character encodings

Unicode

Encodings in Python

Time for action – manually decoding

Time for action – copying Unicode data

Time for action – fixing our copy application

The codecs module

Time for action – changing encodings

Adopting good practices

Internationalization and localization

Time for action – preparing for multiple languages

Time for action – providing translations

Summary

Advanced Output Formats

Dealing with PDF files using PLATYPUS

Time for action – installing ReportLab

Time for action – writing PDF with basic layout and style

Writing native Excel data

Time for action – installing xlwt

Time for action – generating XLS data

Working with OpenDocument files

Time for action – installing ODFPy

Time for action – generating ODT data

Summary

Advanced Parsing and Grammars

Defining a language syntax

PyParsing

Time for action – installing PyParsing

Time for action – implementing a calculator

Time for action – handling type translations

Time for action – suppressing portions of a match

Processing data using the Natural Language Toolkit

Time for action – installing NLTK

Summary

Searching and Indexing

Understanding search complexity

Time for action – implementing a linear search

Text indexing

Time for action – installing Nucular

Time for action – full text indexing

Time for action – measuring index benefit

Time for action – field-qualified indexes

Time for action – performing advanced Nucular queries

Indexing and searching other data

Time for action – indexing Open Office documents

Other index systems

Summary

Looking for Additional Resources

Pop Quiz Answers

Index