1st draft with ChatGPT.

Chris Wong 2023-06-25 23:50:17 +08:00
parent 47769ae07e
commit 31530815ed
16 changed files with 74 additions and 1 deletions

4
.gitignore vendored

@@ -1,3 +1,7 @@
# data
*.pdf
*.csv
# ---> JupyterNotebooks
# gitignore template for Jupyter Notebooks
# website: http://jupyter.org/


@@ -1,2 +1,41 @@
# personal-finance-database
# Personal Finance Database
This project manages personal finance data in a Google Sheets-based database. It provides tools to import data from PDF statements, perform basic personal financial analysis, and visualize the results.
## Features
- **Data Ingestion:** Import data from Google Sheets and PDF statements.
- **Data Processing:** Clean and preprocess the data for further analysis.
- **Data Analysis:** Conduct basic personal financial analysis.
- **Data Visualization:** Create simple and understandable visualizations of financial data.
## Installation
Clone this repository to your local machine.
```bash
git clone https://github.com/your-github-username/personal-finance-database.git
```
Navigate to the project directory.
```bash
cd personal-finance-database
```
Install the necessary packages.
```bash
pip install -r requirements.txt
```
## Usage
[Provide instructions on how to use the project. This should include code examples and explanations of the different components.]
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## License
[Choose an open source license and mention it here.]
## Contact
[Your Name] - [Your Email] - [Your LinkedIn/GitHub/Twitter etc.]
Remember to replace the placeholders with your actual details. You should also include a more detailed explanation in the "Usage" section once you have more functionality built out.
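
The Usage section above is still a placeholder, and this commit contains no analysis code. As a rough sketch of what the "basic personal financial analysis" feature could look like, the helper below sums amounts per calendar month. The function name `monthly_totals` and the `(date, amount)` input shape are assumptions made for illustration; neither appears in the repository.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def monthly_totals(transactions):
    """Sum amounts per calendar month, keyed as 'YYYY-MM'.

    `transactions` is assumed to be an iterable of (date, Decimal) pairs;
    this input shape is hypothetical, not taken from the project.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each month at 0
    for posted, amount in transactions:
        totals[posted.strftime("%Y-%m")] += amount
    # Return a plain dict ordered by month for stable output
    return dict(sorted(totals.items()))


sample = [
    (date(2023, 6, 1), Decimal("-4.50")),
    (date(2023, 6, 2), Decimal("1200.00")),
    (date(2023, 5, 30), Decimal("-20.00")),
]
totals = monthly_totals(sample)
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding errors.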

0
requirements.txt Normal file

0
src/__init__.py Normal file


@@ -0,0 +1,14 @@
from pdfminer.high_level import extract_text
import tabula


class PdfParser:
    def __init__(self, file_path):
        self.file_path = file_path

    def extract_text(self):
        # Calls the module-level pdfminer function, not this method.
        text = extract_text(self.file_path)
        return text

    def extract_table(self):
        # tabula returns a list of DataFrames, one per detected table.
        tables = tabula.read_pdf(self.file_path, pages='all')
        return tables
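
`PdfParser.extract_text` returns one raw string, so something still has to turn it into rows before the data is useful. The sketch below shows one way that step might look, using only the standard library. The line layout (`YYYY-MM-DD  description  amount`), the `Transaction` type, and `parse_statement_text` are all assumptions for illustration; real statements will need their own pattern.

```python
import re
from datetime import date, datetime
from decimal import Decimal
from typing import NamedTuple


class Transaction(NamedTuple):
    posted: date
    description: str
    amount: Decimal


# Hypothetical line layout: "2023-06-01  COFFEE SHOP  -4.50"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<desc>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"
)


def parse_statement_text(text: str) -> list[Transaction]:
    """Return one Transaction per line matching LINE_RE; skip everything else."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # headers, footers, and blank lines do not match
        rows.append(
            Transaction(
                posted=datetime.strptime(match["date"], "%Y-%m-%d").date(),
                description=match["desc"],
                # Drop thousands separators before converting
                amount=Decimal(match["amount"].replace(",", "")),
            )
        )
    return rows


sample = (
    "Statement for June\n"
    "2023-06-01  COFFEE SHOP  -4.50\n"
    "2023-06-02  SALARY  1,200.00\n"
    "\n"
    "Page 1 of 1\n"
)
rows = parse_statement_text(sample)
```

In practice the `text` argument would come from `PdfParser(...).extract_text()`; the sample string stands in for it so the sketch runs without a PDF.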


0
tests/__init__.py Normal file


@@ -0,0 +1,16 @@
import pandas as pd  # needed for the DataFrame check below
import pytest
from pdf_parser import PdfParser


def test_pdf_parser_text_extraction():
    # 'path_to_test_pdf' is a placeholder; point it at a real fixture PDF.
    pdf_parser = PdfParser('path_to_test_pdf')
    text = pdf_parser.extract_text()
    assert isinstance(text, str)
    assert len(text) > 0


def test_pdf_parser_table_extraction():
    pdf_parser = PdfParser('path_to_test_pdf')
    tables = pdf_parser.extract_table()
    assert isinstance(tables, list)
    assert all(isinstance(table, pd.DataFrame) for table in tables)
