Unless otherwise stated by the user or existing template
A user may ask you to create, edit, or analyze the contents of an .xlsx file. You have different tools and workflows available for different tasks.
LibreOffice Required for Formula Recalculation: You can assume LibreOffice is installed for recalculating formula values using the scripts/recalc.py script. The script automatically configures LibreOffice on first run, including in sandboxed environments where Unix sockets are restricted (handled by scripts/office/soffice.py)
CRITICAL: Always use utf-8-sig (UTF-8 with BOM) when writing CSV files that may be opened in Excel (any platform).
Excel identifies a CSV file's encoding by checking for a BOM at the start of the file. Without a BOM, Excel falls back to the system locale encoding (GBK on Chinese Windows), which causes Chinese characters to display as garbled text.
df.to_csv('output.csv') # Uses system locale — unreliable cross-platform
df.to_csv('output.csv', encoding='utf-8') # No BOM — Excel opens as GBK, Chinese becomes garbage
df.to_csv('output.csv', encoding='utf-8-sig') # BOM included — Excel recognises UTF-8 correctly
utf-8-sig works correctly with Excel for Mac and LibreOffice Calc on macOS, but has known issues with other macOS tools:
 prefix in the first cell of the first row. If the output CSV is intended for Numbers, use plain utf-8 instead and instruct the user to open it via File > Open to select encoding manually.
cat, awk, grep, sort, etc.): The 3-byte BOM (\xef\xbb\xbf) will appear at the start of the first line, breaking pattern matches and column splits that target column 1.
read_csv with encoding='utf-8': The BOM appears as \ufeff prepended to the first column name, causing column-name mismatches. Always read back with encoding='utf-8-sig' to strip the BOM automatically.
Decision guide — which encoding to use when writing CSV:
| Target consumer | Encoding |
|----------------|---------|
| Excel (Windows or Mac) | utf-8-sig |
| Excel for Mac + LibreOffice Calc | utf-8-sig |
| macOS Numbers.app | utf-8 (no BOM) |
| macOS / Linux command-line tools or scripts | utf-8 (no BOM) |
| Unknown / general purpose | utf-8-sig (safest for human users) |
If the target is unknown, prefer utf-8-sig — it is transparent to most modern applications and essential for Windows Excel.
When reading a CSV that may have been created on a Chinese Windows system, detect the encoding first:
import chardet
with open('input.csv', 'rb') as f:
raw = f.read()
encoding = chardet.detect(raw)['encoding'] or 'utf-8'
df = pd.read_csv('input.csv', encoding=encoding)
When reading a CSV that was written with utf-8-sig, use the matching encoding to strip the BOM:
df = pd.read_csv('input.csv', encoding='utf-8-sig') # BOM stripped automatically
| Scenario | Required encoding |
|----------|------------------|
| Writing CSV for Excel (any platform) | utf-8-sig |
| Writing CSV for Numbers.app (macOS) | utf-8 (no BOM) |
| Writing CSV for command-line / scripts | utf-8 (no BOM) |
| Writing CSV for unknown / general use | utf-8-sig |
| Reading CSV from unknown source | detect with chardet, fallback to utf-8 |
| Reading CSV known to be UTF-8 with BOM | utf-8-sig |
| Reading CSV from Chinese Windows | gbk or gb18030 |
For data analysis, visualization, and basic operations, use pandas which provides powerful data manipulation capabilities:
import pandas as pd
# Read Excel
df = pd.read_excel('file.xlsx') # Default: first sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # All sheets as dict
# Analyze
df.head() # Preview data
df.info() # Column info
df.describe() # Statistics
# Write Excel
df.to_excel('output.xlsx', index=False)
Always use Excel formulas instead of calculating values in Python and hardcoding them. This ensures the spreadsheet remains dynamic and updateable.
# Bad: Calculating in Python and hardcoding result
total = df['Sales'].sum()
sheet['B10'] = total # Hardcodes 5000
# Bad: Computing growth rate in Python
growth = (df.iloc[-1]['Revenue'] - df.iloc[0]['Revenue']) / df.iloc[0]['Revenue']
sheet['C5'] = growth # Hardcodes 0.15
# Bad: Python calculation for average
avg = sum(values) / len(values)
sheet['D20'] = avg # Hardcodes 42.5
# Good: Let Excel calculate the sum
sheet['B10'] = '=SUM(B2:B9)'
# Good: Growth rate as Excel formula
sheet['C5'] = '=(C4-C2)/C2'
# Good: Average using Excel function
sheet['D20'] = '=AVERAGE(D2:D19)'
This applies to ALL calculations - totals, percentages, ratios, differences, etc. The spreadsheet should be able to recalculate when source data changes.
```bash
python scripts/recalc.py output.xlsx
```
status is errors_found, check error_summary for specific error types and locations
#REF!: Invalid cell references
#DIV/0!: Division by zero
#VALUE!: Wrong data type in formula
#NAME?: Unrecognized formula name
# Using openpyxl for formulas and formatting
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
wb = Workbook()
sheet = wb.active
# Add data
sheet['A1'] = 'Hello'
sheet['B1'] = 'World'
sheet.append(['Row', 'of', 'data'])
# Add formula
sheet['B2'] = '=SUM(A1:A10)'
# Formatting
sheet['A1'].font = Font(bold=True, color='FF0000')
sheet['A1'].fill = PatternFill('solid', start_color='FFFF00')
sheet['A1'].alignment = Alignment(horizontal='center')
# Column width
sheet.column_dimensions['A'].width = 20
wb.save('output.xlsx')
# Using openpyxl to preserve formulas and formatting
from openpyxl import load_workbook
# Load existing file
wb = load_workbook('existing.xlsx')
sheet = wb.active # or wb['SheetName'] for specific sheet
# Working with multiple sheets
for sheet_name in wb.sheetnames:
sheet = wb[sheet_name]
print(f"Sheet: {sheet_name}")
# Modify cells
sheet['A1'] = 'New Value'
sheet.insert_rows(2) # Insert row at position 2
sheet.delete_cols(3) # Delete column 3
# Add new sheet
new_sheet = wb.create_sheet('NewSheet')
new_sheet['A1'] = 'Data'
wb.save('modified.xlsx')
Excel files created or modified by openpyxl contain formulas as strings but not calculated values. Use the provided scripts/recalc.py script to recalculate formulas:
python scripts/recalc.py <excel_file> [timeout_seconds]
Example:
python scripts/recalc.py output.xlsx 30
The script:
Quick checks to ensure formulas work correctly:
pd.notna()
/ in formulas (#DIV/0!)
The script returns JSON with error details:
{
"status": "success", // or "errors_found"
"total_errors": 0, // Total error count
"total_formulas": 42, // Number of formulas in file
"error_summary": { // Only present if errors found
"#REF!": {
"count": 2,
"locations": ["Sheet1!B5", "Sheet1!C10"]
}
}
}
data_only=True to read calculated values: load_workbook('file.xlsx', data_only=True)
data_only=True and saved, formulas are replaced with values and permanently lost
read_only=True for reading or write_only=True for writing
pd.read_excel('file.xlsx', dtype={'id': str})
pd.read_excel('file.xlsx', usecols=['A', 'C', 'E'])
pd.read_excel('file.xlsx', parse_dates=['date_column'])
IMPORTANT: When generating Python code for Excel operations:
For Excel files themselves:
共 1 个版本