11. Troubleshooting

This section covers common issues you might encounter when using VectorETL and how to resolve them.

Common Issues and Solutions

  1. Configuration File Not Found

    Error: FileNotFoundError: [Errno 2] No such file or directory: 'config.yaml'

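    Solution: Check that the path to your configuration file is correct and that the file exists. A relative path such as 'config.yaml' is resolved against the directory you run the command from, so either run from that directory or pass an absolute path.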
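
A quick way to see which file is actually being looked up is to resolve the path yourself before starting the pipeline. The following is a minimal sketch using only the standard library; `resolve_config_path` is a hypothetical helper name, not part of VectorETL.

```python
from pathlib import Path


def resolve_config_path(path_str: str) -> Path:
    """Return the absolute path to the config file, or raise with a helpful message.

    Hypothetical helper for illustration; not a VectorETL function.
    """
    # expanduser() handles "~", resolve() turns a relative path into an absolute one
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Config file not found at {path} "
            f"(current working directory: {Path.cwd()})"
        )
    return path
```

The error message includes the current working directory, which is usually enough to spot a relative path that is being resolved from the wrong place.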

  2. API Key Issues

    Error: AuthenticationError: Incorrect API key provided

    Solution: Double-check the API keys in your configuration file. Make sure each key is copied in full, contains no stray spaces or quote characters, and is still active with the provider.
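
Copy-and-paste mistakes are a frequent cause of rejected keys. The sketch below checks a key string for a few such formatting problems. It cannot tell whether the key is valid, since only the provider can confirm that. `api_key_problems` is a hypothetical helper name used for illustration.

```python
def api_key_problems(key):
    """Return a list of likely formatting problems with an API key string.

    An empty list means no formatting problem was found; it does not mean
    the provider will accept the key. Hypothetical helper for illustration.
    """
    if key is None or key == "":
        return ["key is missing or empty"]
    problems = []
    stripped = key.strip()
    if key != stripped:
        problems.append("key has leading or trailing whitespace")
    if stripped.strip("'\"") != stripped:
        problems.append("key is wrapped in quote characters")
    if any(ch.isspace() for ch in stripped):
        problems.append("key contains whitespace inside the value")
    return problems
```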

  3. Memory Issues

    Error: MemoryError, or the process is killed by the operating system

    Solution: Reduce the batch size in your configuration so that fewer records are held in memory at once. If using large language models, consider using a machine with more RAM.
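
To illustrate why a smaller batch size lowers peak memory, here is a minimal sketch of batched iteration: only one batch is materialized at a time, so the batch size bounds how many records are processed together. `iter_batches` is a hypothetical name and does not reflect how VectorETL batches data internally.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # Only this slice is copied; earlier batches can be garbage-collected
        yield list(items[start:start + batch_size])


# Usage: five records in batches of two
for batch in iter_batches(["a", "b", "c", "d", "e"], 2):
    print(len(batch), batch)
```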

  4. Database Connection Issues

    Error: OperationalError: Unable to connect to database

    Solution: Verify your database credentials, host, and port, and ensure the database is reachable from your network (for example, that no firewall or allow-list is blocking the connection).
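
Before suspecting credentials, it can help to confirm that the database host and port are reachable at all. The sketch below opens a plain TCP connection with the standard library. It checks network reachability only, not authentication. `can_reach` is a hypothetical helper name, and the host and port in the usage comment are placeholders.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False


# Usage (placeholder values): can_reach("db.example.com", 5432)
```

If this returns False, the problem is likely in networking (wrong host or port, firewall, VPN) rather than in the username or password.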

  5. Embedding Model Errors

    Error: InvalidRequestError: The model does not exist

    Solution: Check that you’ve specified a valid model name in your configuration.

Debugging Tips

  1. Enable Verbose Logging: Set the logging level to DEBUG for more detailed output:

    import logging
    logging.basicConfig(level=logging.DEBUG)
    
  2. Use Print Statements: Temporarily add print statements to track the flow of your ETL process.

  3. Check Intermediate Data: Save intermediate DataFrames to CSV files to inspect the data at different stages of the pipeline.

  4. Isolate Components: Test each component (source, embedding, target) separately to identify where issues are occurring.

  5. Version Check: Ensure you’re using compatible versions of VectorETL and its dependencies.
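
For the version check in tip 5, the installed versions can be read with the standard library's `importlib.metadata`. This is a minimal sketch; `installed_versions` is a hypothetical helper, and the package names in the usage comment are assumptions that should be replaced with the distributions you actually installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Usage (package names are assumptions):
# print(installed_versions(["vector-etl", "pandas"]))
```

Including this output in a bug report saves a round of questions about your environment.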

Performance Optimization

If your ETL process is running slowly:

  1. Increase the batch size if memory allows.

  2. Use a more powerful machine for computation-heavy tasks like embedding.

  3. Optimize your database queries if using a database source.

  4. Consider using a faster embedding model if accuracy requirements allow.

  5. Implement parallel processing for independent tasks.
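
As one possible shape for the parallel processing mentioned in item 5, the sketch below runs a worker function over independent batches with a thread pool. Threads suit I/O-bound work such as calls to a remote embedding API; CPU-bound Python code generally needs processes instead. `process_in_parallel` and `worker` are hypothetical names, not part of VectorETL.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(batches, worker, max_workers=4):
    """Apply `worker` to every batch concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises the first worker exception
        return list(pool.map(worker, batches))


# Usage: `len` stands in for a real worker such as an embedding call
print(process_in_parallel([[1, 2], [3], []], len))  # prints [2, 1, 0]
```

Keep `max_workers` modest when the worker calls a rate-limited service, since more threads can simply produce more throttling errors.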

Getting Help

If you’re still stuck:

  1. Check the project’s GitHub Issues page for similar problems and solutions.

  2. Re-read the documentation sections that cover the component that is failing.

  3. Reach out to the community forums or discussion boards.

  4. If you believe you’ve found a bug, report it on the GitHub Issues page with a detailed description and steps to reproduce.

When asking for help, include as much relevant information as possible: your configuration (with sensitive information removed), the full error message, and the steps to reproduce the issue.
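
Removing sensitive information by hand is easy to get wrong, so a small script can help. The sketch below masks values whose key names suggest a secret in a configuration that has already been loaded into Python dicts and lists. `redact` and the marker list are hypothetical, and the field names in the usage example are illustrative rather than VectorETL's actual schema.

```python
SENSITIVE_MARKERS = ("key", "password", "secret", "token")


def redact(config):
    """Return a copy of `config` with values under sensitive-looking keys masked."""
    if isinstance(config, dict):
        return {
            k: "***REDACTED***"
            if isinstance(k, str) and any(m in k.lower() for m in SENSITIVE_MARKERS)
            else redact(v)
            for k, v in config.items()
        }
    if isinstance(config, list):
        return [redact(item) for item in config]
    return config


# Usage with illustrative field names
print(redact({"embedding": {"api_key": "sk-123", "model_name": "example"}}))
```

The substring match is deliberately broad, so it may also mask harmless fields such as a `primary_key` column name. Review the output before posting it.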