Commit 42325485 authored by S Anand

WIP re-structuring of YAML config

parent dfb8c80e
Pipeline #30767 failed with stage
in 13 seconds
How to use PPTXHandler
=======================
Step 1: `pip install python-pptx`.
Step 2: Clone the pptgen repository: `https://code.gramener.com/sanjay.yadav/pptgen`.
Step 3: Navigate to the cloned repository and run `pip install -e .`
Step 4: Install gramex's latest version from dev branch `pip install https://code.gramener.com/s.anand/gramex/repository/archive.tar.bz2?ref=dev`
PPTXHandler(PPTGEN as Handler):
====================================================
url:
download/ppt:
pattern: /$YAMLURL/pptdownload
handler: PPTXHandler
kwargs:
source: input.pptx # Input presentation.
data:
chart_data: {format: csv, path: charts-data.csv} # You can pass a function as well
change-title: # This section defines the first change.
Title 1: # Take the shape named "Title 1"
text: New Title # Replace its text with "New Title"
bar_chart: # Bar chart config
chart:
data: data['chart_data']
x: Cities # X-Axis data `column` or function
size: Size # Size of bar `column` or function
color: Colors # Bar Color `column` or function
PPTGEN as API
==================
import pptgen
prs = pptgen.Presentation('input.pptx') # Loading input presentation
slides = prs.slides() # Returns the list of slides
for slide in slides:
    shapes = slide.shapes() # Access the shapes in a slide
    # ... continue working with the shapes as needed
Pass configuration to PPTGEN API
================================
import pptgen
from orderedattrdict import AttrDict # Assumption: AttrDict comes from the orderedattrdict package
config = AttrDict({
'source': 'input.pptx',
'target': 'output.pptx',
'data':{
'chart_data': {'format': 'csv', 'path': 'charts-data.csv'}
},
'change-title':{
'Title 1':{
'text': 'New Title'
}
}
})
pptgen.pptgen(**config) # Will write the output at target location
PPTGen
======
PPTGen is a library and tool to make PowerPoint presentations from data.
Installation
------------
pip install pptgen
Usage
-----
# PPTGen
PPTGen lets you modify the contents of a PowerPoint presentation based on data.
For example, you can:
......@@ -73,83 +6,184 @@ For example, you can:
- Update charts, images and text from data
- Create a series of slides using a template from spreadsheets or database
**Examples are in `tests/`.** You can run the commands below in the `tests/` directory.
It is a command line utility and a Python function packaged with Gramex.
It forms the basis of PPTXHandler.
PPTGen takes a YAML configuration file as input.
## Command line usage
pptgen config.yaml
PPTGen uses a configuration to modify files. The configuration defines the
`source`, `target`, `data` and any number of rules.
The YAML configuration defines the `source`, `target`, `data` and any number of
rules. Here is a simple configuration that just copies a source PPTX to a
target PPTX without any changes.
On the command line, it accepts a YAML file as input. For example, this
`text.yaml` copies input.pptx to output.pptx and changes the title to "New Title":
```yaml
source: input.pptx # Default input
target: out-unchanged.pptx # Default output
source: input.pptx # optional path to source. Defaults to blank PPT with 1 slide
target: output.pptx # required path to save output as
change:
Title 1: # Take the shape named "Title 1"
text: New Title # Replace its text with "New Title"
```
You can override these parameters using `--source <path>` and `--target <path>`.
This can be run as:
pptgen config.yaml --source input.pptx --target out-cmdline.pptx
pptgen text.yaml
### Shapes
You can override parameters from the command line like this:
In PowerPoint, all shapes have names. To see shape names, select Home tab >
Drawing group > Arrange drop-down > Selection pane. Or press ALT + F10.
pptgen config.yaml --target new.pptx "--change.Title 1.text" "Updated title"
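The override above uses a dotted key (`--change.Title 1.text`) to reach a nested setting. As a rough sketch of that idea, the dotted key can be split on `.` and walked into a nested dictionary. The helper `apply_override` is hypothetical and only illustrates the mapping; it is not PPTGen's actual command line parser.

```python
def apply_override(config, dotted_key, value):
    """Set a nested value from a dotted key such as '--change.Title 1.text'.

    Hypothetical helper: illustrates the mapping, not PPTGen's own parser.
    """
    keys = dotted_key.lstrip('-').split('.')
    node = config
    for key in keys[:-1]:
        # Create intermediate levels when they do not exist yet
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


config = {
    'source': 'input.pptx',
    'target': 'output.pptx',
    'change': {'Title 1': {'text': 'New Title'}},
}
apply_override(config, '--target', 'new.pptx')
apply_override(config, '--change.Title 1.text', 'Updated title')
print(config)
```

Note that a shape name containing a `.` would need different handling under this scheme.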
![Selection pane](help/selection-pane.png)
## API usage
To change the shape names, double-click on the name in the selection pane.
As an API, it accepts keyword arguments as the configuration. For example:
### Text
```python
from pptgen import pptgen
pptgen(
source='input.pptx', # optional path to source. Defaults to blank PPT with 1 slide
target='output.pptx', # optional target. Otherwise, returns the pptx.Presentation()
change={ # Configurations are same as when loading from the YAML file
'Title 1': { # Take the shape named "Title 1"
'text': 'New Title' # Replace its text with "New Title"
}
}
)
```
# Configuration
The configuration accepts the following top-level keys:
- `source`: optional. Path to input Presentation to be used as the source.
Defaults to a blank presentation with 1 slide.
- `target`: required for the command line, and is where the output PPTX is saved.
It is optional for the API. If None, `pptgen` returns the Presentation object.
- `data`: optional dataset or a dictionary. This is described below.
- All other keys are treated as rules that are described below.
To change the title on the input slide to "New title", use this configuration `config-text.yaml`:
## Data
PPTGen can change presentations with data from various sources. It uses
`gramex.data.filter`. It supports these keys:
- `url`: Pandas DataFrame, SQLAlchemy URL or file name
- `ext`: file extension (if url is a file). Defaults to the url's extension
- `args`: optional filters to apply to the dataset. Passed as a dict of lists
- `table`: table name (if url is an SQLAlchemy URL)
- `query`: optional SQL query to execute (if url is a database)
- `transform`: optional in-memory transform. Takes a DataFrame and returns a DataFrame
- Any additional keys are passed to `gramex.cache.open` or `sqlalchemy.create_engine`
```yaml
source: input.pptx
target: out-text.pptx
change-title: # This section defines the first change.
# (You can replace "change-title" with anything.)
data:
cities: {url: cities.csv} # Load cities.csv into "cities" key
sales: {url: sales.xlsx, sheet: Sheet1} # Load Sheet1 from sales.xlsx into "sales" key
tweets: {url: tweets.json} # Load JSON data into "tweets" key
sample: {url: mysql://server/db, table: sample} # Load sample data from MySQL
filter:
url: cities.csv # Load cities.csv
args: # Filter results
Country: [Egypt, Sudan] # WHERE column Country is Egypt or Sudan
Population>: 100000 # AND column Population is 100,000+
```
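To make the `args` filter in the last dataset concrete, here is a minimal pure-Python sketch of the semantics described in the comments: a plain key keeps rows whose value is in the list, and a key ending in `>` keeps rows above the given value. It assumes `>` means strictly greater than. The function `filter_rows` and the sample rows are made up for illustration; the real filtering is done by `gramex.data.filter`.

```python
def filter_rows(rows, args):
    """Keep rows that satisfy every filter in args (a dict of lists).

    Illustrative only: mimics the filter described above on a list of dicts.
    """
    for key, values in args.items():
        if key.endswith('>'):
            # 'Population>': [100000] -> Population greater than 100000 (assumed strict)
            column, limit = key[:-1], values[0]
            rows = [row for row in rows if row[column] > limit]
        else:
            # 'Country': ['Egypt', 'Sudan'] -> Country is Egypt or Sudan
            rows = [row for row in rows if row[key] in values]
    return rows


# Made-up sample rows, standing in for cities.csv
cities = [
    {'City': 'A', 'Country': 'Egypt', 'Population': 9000000},
    {'City': 'B', 'Country': 'Egypt', 'Population': 30000},
    {'City': 'C', 'Country': 'Sudan', 'Population': 5000000},
    {'City': 'D', 'Country': 'Kenya', 'Population': 4000000},
]
selected = filter_rows(cities, {'Country': ['Egypt', 'Sudan'], 'Population>': [100000]})
print([row['City'] for row in selected])
```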
## Rules
The configuration can define any number of rules. Rules have:
- one or more [shape names](#shapes), and the list of [commands](#commands) to
apply to those shapes.
- an optional [slide selector](#slides) that restricts which slide they apply to.
By default, rules apply to all slides.
In the example below, there is 1 rule called `change`. It has no slide selector,
so it applies to all slides. It has 1 shape: `Title 1` with 1 command: `text`:
```yaml
source: input.pptx # optional path to source. Defaults to blank PPT with 1 slide
target: output.pptx # required path to save output as
change:
Title 1: # Take the shape named "Title 1"
text: New Title # Replace its text with "New Title"
```
To *substitute* text instead of replacing the full title -- for example, to just
replace "title" with "heading", and "Old" with "New" use `config-replace.yaml`:
### Slides
By default, changes are applied to all slides. To restrict changes to a specific
slide, use:
1. `slide-number`: the slide number (the first slide is slide 1).
1. `slide-title`: a regular expression that matches the slide title.
1. `slide-range`: a slide number range, given as a list of 2 slide numbers.
```yaml
source: input.pptx
target: out-replace.pptx
replace-title: # The change section can be called anything
Title 1: # Take the shape named "Title 1"
replace: # Replace these keywords
"Old": "New" # Old -> New
"Title": "Heading" # Title -> Heading
target: output.pptx
rule-1: # rule-1 applies to all slides
...
rule-2:
slide-number: 1 # rule-2 applies only to the first slide of the source
...
rule-3:
slide-title: Hello # rule-3 applies to slides with the title "Hello" (regex)
...
rule-4:
slide-range: [3, 6] # rule-4 applies to slides 3, 4, 5, 6 (inclusive)
...
```
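The selectors in this configuration can be read as a small matching function over slide numbers and titles. The sketch below is one possible interpretation, not PPTGen's code: `select_slides` is a hypothetical helper, and it assumes `slide-title` matches anywhere in the title (`re.search`) and that `slide-range` is inclusive at both ends.

```python
import re


def select_slides(titles, rule):
    """Return the 1-based slide numbers that a rule's slide selector matches.

    titles: slide titles in presentation order.
    Hypothetical helper; a rule without a selector matches every slide.
    """
    numbers = range(1, len(titles) + 1)
    if 'slide-number' in rule:
        return [n for n in numbers if n == rule['slide-number']]
    if 'slide-title' in rule:
        pattern = re.compile(rule['slide-title'])
        return [n for n in numbers if pattern.search(titles[n - 1])]
    if 'slide-range' in rule:
        first, last = rule['slide-range']
        return [n for n in numbers if first <= n <= last]
    return list(numbers)


titles = ['Hello', 'Sales', 'Hello again', 'Costs', 'Profit', 'Summary', 'Thanks']
print(select_slides(titles, {}))                        # rule-1: no selector
print(select_slides(titles, {'slide-number': 1}))       # rule-2
print(select_slides(titles, {'slide-title': 'Hello'}))  # rule-3
print(select_slides(titles, {'slide-range': [3, 6]}))   # rule-4
```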
Replacement only works for words that have the same formatting. For example, in
some_where_, "where" is underlined. You cannot replace "somewhere". But you can
replace "some" and "where" independently.
To create multiple slides from data, add `data:` to the change. For example:
### Images
```yaml
source: input.pptx
target: out-slides.pptx
data:
sales: {xlsx: sales.xlsx}
change-title:
data: sales # For each row in the sales dataset (defined above)...
slide-number: 1 # ... copy slide 1 and apply this change
Title 1:
text: "Region {{ region }} has sales of ${{ sales }}"
```
To change the picture on an image, use `config-image.yaml`:
The `data:` here is the name of the dataset defined in the root `data:` section.
It can be used with
- `slide-number` to repeat individual slides
- `slide-title` to repeat individual slides or multiple single slides
- `slide-range` to repeat groups of slides. For example `slide-range: [1,2]` will
copy slides 1 & 2 as many times as there are rows of data
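As a rough illustration of the repetition described above, the selected slides can be paired with each data row to get the list of slide copies to generate. `expand_slides` is a hypothetical helper, and the ordering (the whole group repeated once per row) is an assumption about how the copies are arranged.

```python
def expand_slides(slide_numbers, rows):
    """Pair every selected slide with every data row.

    Returns (slide_number, row) tuples: one copy of the slide group per row.
    Hypothetical helper; the output order is an assumption.
    """
    return [(number, row) for row in rows for number in slide_numbers]


# Made-up rows standing in for the sales dataset
sales = [{'region': 'North', 'sales': 100}, {'region': 'South', 'sales': 250}]
plan = expand_slides([1, 2], sales)   # slide-range: [1, 2]
for number, row in plan:
    print(number, row['region'])
```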
### Shapes
In PowerPoint, all shapes have names. To see shape names, select Home tab >
Drawing group > Arrange drop-down > Selection pane. Or press ALT + F10.
![Selection pane](help/selection-pane.png)
To change the shape names, double-click on the name in the selection pane.
You can specify changes to one or more shapes in a [rule](#rules). For example:
```yaml
source: input.pptx
target: out-image.pptx
change-image:
Picture 1: # Take the shape named "Picture 1"
image: sample.png # Replace the image with sample.png
rule-1:
Title 1:
text: New title
background-color: red
Text 1:
text: New text
color: green
```
The image can be a URL or a file path.
... changes 2 shapes named `Title 1` and `Text 1`.
Shape names may refer to native elements or [groups](#groups).
### Groups
To change groups' contents, use a nested configuration. For example, if the group
named "Group 1" has text named "Caption" and an image named "Picture", this
`config-group.yaml` replaces those:
Shape names may refer to groups. To change groups' contents, use a nested
configuration. For example, if "Group 1" has "Caption" and "Picture" inside it,
this `config-group.yaml` replaces those:
```yaml
source: input.pptx
......@@ -162,133 +196,124 @@ change-image:
image: sample.png # Replace the image with sample.png
```
### Data
## Commands
Shapes can be changed using 1 or more commands. These commands can change the
shape's style and content, or add new content (like charts).
### CSS
The following CSS-like commands change the shape's display attributes:
- `font-size`: sets the font size
- `font-family`: sets the font family
- `color`: sets the text / foreground color
- `fill`: sets the shape's background color
- `opacity`: sets the shape's opacity level
- `stroke`: sets the shape outline color
- `stroke-width`: sets the shape outline width
- `width`: sets the shape width
- `height`: sets the shape height
- `left`: sets the shape X position
- `top`: sets the shape Y position
PPTGen can change presentations with data from various sources. This example
shows all ways of loading data:
Values support [templates](#templates).
### Text
To change the title on the input slide to "New title", use this configuration:
```yaml
source: input.pptx
target: out-unchanged.pptx
data: # The data section
cities: {csv: cities.csv} # Load CSV data into "cities" key
sales: {xlsx: sales.xlsx, sheet: Sheet1} # Load Excel sheet into "sales" key
tweets: {json: tweets.json} # Load JSON data into "tweets" key
sample: {yaml: sample.yaml} # Load YAML data into "sample" key
direct: {values: {x: 1, y: 2}} # The "direct" key takes values directly
Title 1: # Take the shape named "Title 1"
text: New Title # Replace its text with "New Title"
```
### Templates
`text:` values support [templates](#templates).
You can use values from the data anywhere, as a template. See
`config-template.yaml`:
### Replace
To *substitute* text instead of [replacing the full text](#text), use:
```yaml
source: input.pptx
target: out-template.pptx
data:
tweets: {json: tweets.json}
text-template:
Title 1:
text: "Tweet from @{{ tweets[0]['user']['screen_name'] }}"
Picture 1:
image: "{{ tweets[0]['user']['profile_image_url'] }}"
Title 1: # Take the shape named "Title 1"
replace: # Replace these keywords
"Old": "New" # Old -> New
"Title": "Heading" # Title -> Heading
```
The values inside `{{ ... }}` are evaluated as Python expressions in the context
of `data`.
Replacement only works for words that have the same formatting. For example, in
some_where_, "where" is underlined. You cannot replace "somewhere". But you can
replace "some" and "where" independently.
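The formatting restriction follows from how PowerPoint stores text: a paragraph is split into runs, each with its own formatting. Assuming replacement is applied to each run separately, a keyword that spans two runs is never seen as a whole. The sketch below models runs as plain strings; `replace_in_runs` is a hypothetical helper, not PPTGen's implementation.

```python
def replace_in_runs(runs, replacements):
    """Apply each replacement inside every run separately.

    runs: list of strings, one per same-formatted stretch of text.
    Matches never span two runs, which is the limitation described above.
    """
    result = []
    for run in runs:
        for old, new in replacements.items():
            run = run.replace(old, new)
        result.append(run)
    return result


runs = ['some', 'where']   # "where" is underlined, so it is a separate run
print(replace_in_runs(runs, {'somewhere': 'anywhere'}))   # no run contains "somewhere"
print(replace_in_runs(runs, {'some': 'any'}))             # "some" sits inside one run
```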
### Charts
`replace:` values support [templates](#templates).
TBD
### Image
To change the picture on an image, use:
```yaml
Picture 1: # Take the shape named "Picture 1"
image: sample.png # Replace the image with sample.png
```
### Tables
`image:` values support [templates](#templates), and can be a URL or file path.
TBD
### Deprecated commands
### Slides
- `rectangle`: use [CSS](#css) commands instead
- `oval`: use [CSS](#css) commands instead
By default, changes are applied to all slides. To restrict changes to a specific
slide, use one of these:
### To be documented
1. `slide-number` indicates the slide number starting with slide 1.
2. `slide-title` is a regular expression that matches the slide title.
- `chart`
- `table`
- `sankey`
- `bullet`
- `treemap`
- `heatgrid`
- `calendarmap`
- `custom_table`
For example, this applies the `change-title` rule only on slide 1.
### Templates
```yaml
source: input.pptx
target: out-text.pptx
change-title:
slide-number: 1 # Apply this change only on slide 1
Title 1:
text: New title
```
For commands that support templates, values inside `{{ ... }}` are evaluated as
Python expressions in the context of `data`.
To create multiple slides from data, add `data:` to the change. For example:
For example:
```yaml
source: input.pptx
target: out-slides.pptx
data:
sales: {xlsx: sales.xlsx}
change-title:
slide-number: 1 # Apply this change only on slide 1
data: sales # For each row in sales.xlsx, create a new slide
tweets: tweets.json
change:
Title 1:
text: "Region {{ region }} has sales of ${{ sales }}"
text: "Tweet from @{{ tweets[0]['user']['screen_name'] }}"
```
### Layouts
... will replace the contents inside `{{ ... }}` with the value of
`tweets[0]['user']['screen_name']` evaluated in Python. The variable `tweets` is
the result of loading `tweets.json`.
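The evaluation described above can be sketched in a few lines: find each `{{ ... }}`, evaluate its contents as a Python expression with the datasets as variables, and substitute the result. This is only an illustration of the behaviour; `render` is a hypothetical function, the sample data is made up, and PPTGen's real template engine may differ. `eval` should never be used on untrusted templates.

```python
import re


def render(template, data):
    """Replace every {{ expression }} with the expression's value.

    The expression is evaluated as Python with the keys of data as variables.
    Hypothetical sketch; unsafe for untrusted input because it uses eval.
    """
    def evaluate(match):
        return str(eval(match.group(1), {}, data))

    return re.sub(r'\{\{\s*(.*?)\s*\}\}', evaluate, template)


# Made-up stand-in for the contents of tweets.json
data = {'tweets': [{'user': {'screen_name': 'example_user'}}]}
title = render("Tweet from @{{ tweets[0]['user']['screen_name'] }}", data)
print(title)
```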
### Layout
To create multiple shapes using data, use `layout:` and `data:`. For example:
```yaml
source: input.pptx
target: out-layout.pptx
data:
sales: {xlsx: sales.xlsx}
multiple-objects:
Picture 1: # Take the Picture 1 shape
data: sales # Duplicate it for each row in sales
layout: horizontal # Lay the images out horizontally
image: "{{ region }}.png" # Change the picture using this template
Picture 1: # Take the Picture 1 shape
data: sales # Duplicate it for each row in sales
layout: horizontal # Lay the images out horizontally to the right
padding-right: 10 # With a padding of 10 units
image: "{{ region }}.png" # Change the picture using this template
```
Currently, `layout:` supports `horizontal` and `vertical`. We may extend this to
`grid: [<columns>, <rows>]` and `wrap: <items>`.
Development
-----------
To set up the development environment, clone this repo. Then run:
pip uninstall pptgen
pip install -e .
Create a branch for local development using `git checkout -b <branch>`.
Test your changes by running `nosetests`.
Commit your branch and send a merge request.
Release
-------
When releasing a new version of pptgen:
1. Check [build errors](http://code.gramener.com/sanjay.yadav/pptgen/pipelines).
2. Run `nosetests` on Python 2.7 and on 3.x
3. Update version number in `pptgen/release.json`
4. Push `dev` branch to the server. Ensure that there are no build errors.
5. Merge with master, create an annotated tag and push the code:
git checkout master
git merge dev
git tag -a v1.x.x # Annotate with a one-line summary of features
git push --follow-tags
git checkout dev # Switch back to dev
The `data:` here is the name of the dataset defined in the root `data:` section.
For each row in `data`, the shape is duplicated and laid out based on `layout:`.
6. Release to PyPI
`layout:` supports:
python setup.py sdist bdist_wheel --universal
twine upload dist/*
- `horizontal` copies the element right with an optional `padding-right` (default: 0)
- `vertical` copies the element below with an optional `padding-bottom` (default: 0)
- `wrap` copies the element to the right `cols` times, and then moves 1 row
below. It supports `padding-right` and `padding-bottom`
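The three layouts differ only in how a copy's index maps to a column and a row. The sketch below computes the top-left position of each copy from the original shape's position and size. `layout_positions` is a hypothetical helper: it assumes `wrap` places `cols` copies per row, and that padding is added between neighbouring copies in the same units as the shape size.

```python
def layout_positions(left, top, width, height, count, layout,
                     padding_right=0, padding_bottom=0, cols=1):
    """Return the (left, top) position of each of `count` copies of a shape.

    Hypothetical helper illustrating horizontal, vertical and wrap layouts.
    """
    positions = []
    for index in range(count):
        if layout == 'horizontal':
            col, row = index, 0
        elif layout == 'vertical':
            col, row = 0, index
        elif layout == 'wrap':
            # Fill a row with `cols` copies, then move one row down (assumed)
            col, row = index % cols, index // cols
        else:
            raise ValueError('Unknown layout: %s' % layout)
        positions.append((left + col * (width + padding_right),
                          top + row * (height + padding_bottom)))
    return positions


print(layout_positions(0, 0, 100, 50, 3, 'horizontal', padding_right=10))
print(layout_positions(0, 0, 100, 50, 3, 'vertical', padding_bottom=5))
print(layout_positions(0, 0, 100, 50, 3, 'wrap', padding_right=10, padding_bottom=5, cols=2))
```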