Development
Contributing to MRRA and development guidelines
This guide covers development workflows, contribution guidelines, and best practices for contributing to MRRA.
Development Environment Setup
Prerequisites
- Python 3.10+ (Python 3.11 recommended)
- Git for version control
- Virtual environment management (venv, conda, or poetry)
Clone and Setup
# Clone the repository
git clone https://github.com/your-org/mrra.git
cd mrra
# Create virtual environment
python -m venv mrra-dev
source mrra-dev/bin/activate # On Windows: mrra-dev\Scripts\activate
# Install in development mode
pip install -e ".[dev]"  # quotes avoid shell globbing (e.g., in zsh)
# Install pre-commit hooks
pre-commit install
# Verify installation
python -c "import mrra; print(f'MRRA version: {mrra.__version__}')"
pytest tests/ -v
Development Dependencies
The development installation includes additional tools:
# Development dependencies (included with pip install -e .[dev])
dev_dependencies = [
# Testing
'pytest>=7.0.0',
'pytest-cov>=4.0.0',
'pytest-asyncio>=0.21.0',
'pytest-mock>=3.10.0',
# Code quality
'black>=23.0.0',
'isort>=5.12.0',
'flake8>=6.0.0',
'mypy>=1.0.0',
# Documentation
'sphinx>=6.0.0',
'sphinx-rtd-theme>=1.2.0',
'myst-parser>=1.0.0',
# Development tools
'pre-commit>=3.0.0',
'jupyter>=1.0.0',
'ipdb>=0.13.0',
]
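In pyproject.toml these would typically live in an optional-dependency group so that pip install -e ".[dev]" picks them up; a sketch (group name and exact pins are assumptions, check the actual project file):
[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "pytest-cov>=4.0.0",
    "black>=23.0.0",
    "isort>=5.12.0",
    "flake8>=6.0.0",
    "mypy>=1.0.0",
    # ... remaining entries from the list above
]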
Project Structure
Understanding the MRRA project structure:
mrra/
├── src/mrra/ # Main package
│ ├── core/ # Core types and interfaces
│ │ ├── types.py # Data types and protocols
│ │ ├── config.py # Configuration classes
│ │ └── exceptions.py # Custom exceptions
│ ├── data/ # Data processing modules
│ │ ├── trajectory.py # TrajectoryBatch class
│ │ └── activity.py # Activity extraction
│ ├── analysis/ # Analysis modules
│ │ └── activity_purpose.py # Purpose assignment
│ ├── graph/ # Graph-related modules
│ │ ├── mobility_graph.py # MobilityGraph class
│ │ └── pattern.py # Pattern generation
│ ├── retriever/ # Retrieval systems
│ │ └── graph_rag.py # GraphRAG implementation
│ ├── agents/ # Agent systems
│ │ ├── builder.py # Agent builder
│ │ └── subagents.py # Sub-agent implementations
│ ├── persist/ # Persistence and caching
│ │ └── cache.py # Cache manager
│ └── tools/ # MCP tools and utilities
│ ├── weather.py # Weather tools
│ └── maps.py # Maps tools
├── tests/ # Test suite
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ └── fixtures/ # Test fixtures
├── docs/ # Documentation
├── scripts/ # Utility scripts
├── examples/ # Example code
└── pyproject.toml # Project configuration
Code Style and Standards
Code Formatting
MRRA uses standardized code formatting tools:
Black for code formatting:
# Format all code
black src/ tests/ scripts/
# Check formatting without changing
black --check src/ tests/ scripts/
Configuration in pyproject.toml:
[tool.black]
line-length = 88
target-version = ['py310']
include = '\.pyi?$'
extend-exclude = '''
/(
\.eggs
| \.git
| \.hg
| \.mypy_cache
| \.tox
| \.venv
| _build
| buck-out
| build
| dist
)/
'''
isort for import sorting:
# Sort imports
isort src/ tests/ scripts/
# Check import sorting
isort --check-only src/ tests/ scripts/
Configuration in pyproject.toml:
[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88
known_first_party = ["mrra"]
flake8 for linting:
# Run linting
flake8 src/ tests/ scripts/
Configuration in .flake8:
[flake8]
max-line-length = 88
extend-ignore = E203, W503, E501
exclude =
.git,
__pycache__,
.venv,
build,
dist
mypy for type checking:
# Type check
mypy src/mrra/
Configuration in pyproject.toml:
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
warn_unreachable = true
strict_equality = true
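With disallow_untyped_defs and no_implicit_optional enabled, every function needs a complete signature and explicit Optional types. A minimal sketch of code that passes this configuration (the function and its names are illustrative, not an MRRA API):
from typing import Optional


def grid_cell_id(lat: float, lon: float, precision: Optional[int] = None) -> str:
    """Return a grid-cell identifier for a coordinate pair."""
    # no_implicit_optional: a default of None requires an explicit Optional[int].
    # disallow_untyped_defs: every parameter and the return type must be annotated.
    p = 4 if precision is None else precision
    return f"g_{round(lat * 10**p)}_{round(lon * 10**p)}"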
Pre-commit Hooks
Pre-commit hooks ensure code quality:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- id: check-merge-conflict
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
- repo: https://github.com/pycqa/flake8
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.3.0
hooks:
- id: mypy
        additional_dependencies: [types-requests, types-PyYAML]
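After pre-commit install (see the setup section), these hooks run on every commit; they can also be run on demand across the whole repository:
# Run all hooks against all files
pre-commit run --all-files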
Testing Framework
Test Structure
MRRA uses pytest with comprehensive test coverage:
# tests/unit/test_trajectory.py
import pytest
import pandas as pd
from mrra.data.trajectory import TrajectoryBatch
from mrra.core.exceptions import ValidationError
class TestTrajectoryBatch:
"""Unit tests for TrajectoryBatch class"""
@pytest.fixture
def sample_data(self):
"""Sample trajectory data for testing"""
return pd.DataFrame({
'user_id': ['user_1', 'user_1', 'user_2', 'user_2'],
'timestamp': [
'2024-01-01 08:00:00', '2024-01-01 12:00:00',
'2024-01-01 09:00:00', '2024-01-01 13:00:00'
],
'latitude': [31.2304, 31.2404, 31.2354, 31.2454],
'longitude': [121.4737, 121.4837, 121.4787, 121.4887]
})
def test_init_valid_data(self, sample_data):
"""Test TrajectoryBatch initialization with valid data"""
tb = TrajectoryBatch(sample_data)
assert len(tb.df) == 4
assert len(tb.users()) == 2
assert 'timestamp_local' in tb.df.columns
assert 'hour' in tb.df.columns
assert 'dow' in tb.df.columns
def test_init_missing_columns(self):
"""Test TrajectoryBatch with missing required columns"""
invalid_data = pd.DataFrame({
'user_id': ['user_1'],
'latitude': [31.2304],
# Missing 'longitude' and 'timestamp'
})
with pytest.raises(ValidationError, match="Missing required columns"):
TrajectoryBatch(invalid_data)
def test_for_user(self, sample_data):
"""Test user-specific data filtering"""
tb = TrajectoryBatch(sample_data)
user1_data = tb.for_user('user_1')
assert len(user1_data) == 2
assert all(user1_data['user_id'] == 'user_1')
# Test non-existent user
empty_data = tb.for_user('non_existent')
assert len(empty_data) == 0
@pytest.mark.parametrize("user_count,expected", [
(1, 1), (2, 2), (5, 5)
])
def test_users_count(self, user_count, expected):
"""Test user counting with parameterized data"""
data = pd.DataFrame({
'user_id': [f'user_{i}' for i in range(user_count)] * 2,
'timestamp': ['2024-01-01 08:00:00'] * (user_count * 2),
'latitude': [31.2304] * (user_count * 2),
'longitude': [121.4737] * (user_count * 2)
})
tb = TrajectoryBatch(data)
        assert len(tb.users()) == expected
Integration Tests
# tests/integration/test_workflow.py
import pytest
import os
import pandas as pd
from unittest.mock import Mock, patch
from mrra.data.trajectory import TrajectoryBatch
from mrra.data.activity import ActivityExtractor
from mrra.analysis.activity_purpose import ActivityPurposeAssigner
@pytest.mark.integration
class TestMRRAWorkflow:
"""Integration tests for complete MRRA workflows"""
@pytest.fixture
def integration_data(self):
"""Larger dataset for integration testing"""
# Generate more comprehensive test data
import numpy as np
n_users = 3
points_per_user = 50
data = []
base_time = pd.Timestamp('2024-01-01 00:00:00')
for user_i in range(n_users):
for point_i in range(points_per_user):
# Simulate realistic movement patterns
time_offset = pd.Timedelta(hours=point_i * 0.5)
# Base locations with some variation
base_lat = 31.2304 + user_i * 0.01
base_lon = 121.4737 + user_i * 0.01
# Add random variation
lat_var = np.random.normal(0, 0.002)
lon_var = np.random.normal(0, 0.002)
data.append({
'user_id': f'test_user_{user_i}',
'timestamp': base_time + time_offset,
'latitude': base_lat + lat_var,
'longitude': base_lon + lon_var
})
return pd.DataFrame(data)
@patch('mrra.agents.subagents.make_llm')
def test_complete_workflow_with_mock(self, mock_make_llm, integration_data):
"""Test complete workflow with mocked LLM"""
# Mock LLM responses
mock_llm = Mock()
mock_llm.invoke.side_effect = lambda x: "work" if "morning" in x else "home"
mock_make_llm.return_value = mock_llm
# Run workflow
tb = TrajectoryBatch(integration_data)
acts = ActivityExtractor(tb, radius_m=300, min_dwell_minutes=20).extract()
# Should extract activities
assert len(acts) > 0
# Mock purpose assignment
llm_cfg = {"provider": "mock", "model": "test"}
acts = ActivityPurposeAssigner(tb, llm=mock_llm).assign(acts)
# Verify purposes assigned
purposes = [getattr(act, 'purpose', None) for act in acts]
assert all(purpose is not None for purpose in purposes)
assert mock_llm.invoke.called
@pytest.mark.skipif(
not os.getenv('MRRA_INTEGRATION_WITH_LLM'),
reason="Requires MRRA_INTEGRATION_WITH_LLM=1 and API keys"
)
def test_workflow_with_real_llm(self, integration_data):
"""Integration test with real LLM (requires API key)"""
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
pytest.skip("OpenAI API key required")
llm_cfg = {
'provider': 'openai',
'model': 'gpt-4o-mini',
'api_key': api_key,
'temperature': 0.2
}
# This test runs the complete workflow with real LLM
# Keep it simple to avoid excessive API costs
small_data = integration_data.head(20) # Use smaller dataset
tb = TrajectoryBatch(small_data)
acts = ActivityExtractor(tb, radius_m=500, min_dwell_minutes=30).extract()
if len(acts) > 0:
from mrra.agents.subagents import make_llm
llm = make_llm(**llm_cfg)
acts = ActivityPurposeAssigner(tb, llm=llm, concurrency=1).assign(acts)
# Verify real purposes (not just "Other")
purposes = [getattr(act, 'purpose', 'Other') for act in acts]
non_other_purposes = [p for p in purposes if p != 'Other']
# At least some activities should have specific purposes
            assert len(non_other_purposes) > 0, f"Expected specific purposes, got: {purposes}"
Running Tests
# Run all tests
pytest
# Run specific test categories
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests only
# Run with coverage
pytest --cov=mrra --cov-report=html
# Run integration tests with LLM (requires API keys)
MRRA_INTEGRATION_WITH_LLM=1 OPENAI_API_KEY=your_key pytest tests/integration/
# Run specific test
pytest tests/unit/test_trajectory.py::TestTrajectoryBatch::test_init_valid_data
# Run tests with verbose output
pytest -v -s
# Run tests in parallel (with pytest-xdist)
pip install pytest-xdist
pytest -n auto
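The @pytest.mark.integration marker used above should be registered so pytest does not emit unknown-marker warnings; a minimal conftest.py sketch, assuming the marker is not already registered in pyproject.toml:
# tests/conftest.py
def pytest_configure(config):
    """Register the custom marker used by tests/integration/."""
    config.addinivalue_line(
        "markers", "integration: marks tests as integration tests"
    )
Once registered, integration tests can also be deselected with pytest -m "not integration".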
Contributing Guidelines
Development Workflow
1. Fork and Clone
git clone https://github.com/yourusername/mrra.git
cd mrra
git remote add upstream https://github.com/original-org/mrra.git
2. Create Feature Branch
git checkout -b feature/your-feature-name
3. Develop and Test
# Make changes
# Add tests
pytest tests/
# Check code quality
black src/ tests/
isort src/ tests/
flake8 src/ tests/
mypy src/mrra/
4. Commit Changes
git add .
git commit -m "feat: add new mobility prediction algorithm

- Implement new algorithm for next position prediction
- Add comprehensive tests and documentation
- Improve accuracy by 15% on test dataset"
5. Submit Pull Request
git push origin feature/your-feature-name
# Create PR through GitHub interface
Commit Message Convention
Use conventional commit format:
type(scope): description
body (optional)
footer (optional)
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- style: Code style changes (formatting, etc.)
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
Examples:
feat(agents): add multi-round reflection capability
fix(cache): resolve cache invalidation issue
docs(api): update docstrings for TrajectoryBatch
test(integration): add comprehensive workflow tests
refactor(graph): optimize graph construction performance
Code Review Process
Review Checklist (the automatable items are sketched as a script after the list):
- Code follows style guidelines (black, isort, flake8)
- All tests pass (unit and integration)
- New features include comprehensive tests
- Documentation is updated for API changes
- Type hints are provided for public APIs
- Performance impact is considered
- Backward compatibility is maintained
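A small helper script (hypothetical, not shipped with MRRA) that runs the automatable checklist items locally before opening a PR:
# scripts/check_all.py
"""Run the automated code-quality checks from the review checklist."""
import subprocess
import sys

CHECKS = [
    ["black", "--check", "src/", "tests/"],
    ["isort", "--check-only", "src/", "tests/"],
    ["flake8", "src/", "tests/"],
    ["mypy", "src/mrra/"],
    ["pytest", "tests/unit/"],
]


def main() -> int:
    failed = []
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    if failed:
        print(f"Failed checks: {', '.join(failed)}")
        return 1
    print("All checks passed.")
    return 0


if __name__ == "__main__":
    sys.exit(main())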
Documentation Standards
Docstring Format
Use Google-style docstrings:
def predict_next_location(
user_id: str,
current_time: datetime,
context: Optional[Dict[str, Any]] = None
) -> PredictionResult:
"""Predict the next location for a user.
Args:
user_id: Unique identifier for the user
current_time: Timestamp for prediction context
context: Additional context for prediction (optional)
Returns:
PredictionResult containing location prediction and metadata
Raises:
ValidationError: If user_id is invalid
PredictionError: If prediction fails
Example:
>>> predictor = MobilityPredictor(agent)
>>> result = predictor.predict_next_location(
... user_id="user_123",
... current_time=datetime.now()
... )
>>> print(result.location)
'g_1234_5678'
"""
    pass
API Documentation
For new modules, include comprehensive module docstrings:
"""Mobility prediction agents and reflection systems.
This module provides the core agent framework for MRRA, including:
- Multi-agent reflection mechanisms
- Sub-agent specializations
- Aggregation strategies
- MCP tool integration
The main entry point is the `build_mrra_agent` function which constructs
configured agents ready for mobility prediction tasks.
Example:
Basic agent creation:
>>> from mrra.agents.builder import build_mrra_agent
>>> agent = build_mrra_agent(
... llm=llm_config,
... retriever=graph_retriever,
... reflection=reflection_config
... )
>>> result = agent.invoke({"task": "next_position", "user_id": "user_123"})
Classes:
MRRAAgent: Main agent class for predictions
ReflectionAgent: Multi-agent reflection coordinator
SubAgent: Individual specialized sub-agent
Functions:
build_mrra_agent: Main agent factory function
"""Adding New Features
Feature Development Template
When adding new features, follow this template:
# src/mrra/new_module/feature.py
"""New feature implementation.
This module provides [description of feature].
"""
from typing import Any, Dict, List, Optional, Protocol
import logging
from dataclasses import dataclass
from mrra.core.types import TrajectoryBatch
from mrra.core.exceptions import MRRAError
logger = logging.getLogger(__name__)
@dataclass
class FeatureConfig:
"""Configuration for new feature.
Attributes:
param1: Description of parameter 1
param2: Description of parameter 2
"""
param1: str
param2: int = 10
enable_advanced: bool = False
class FeatureError(MRRAError):
"""Raised when feature operations fail."""
pass
class NewFeature:
"""Implementation of new feature.
This class provides [detailed description].
Example:
>>> feature = NewFeature(config)
>>> result = feature.process(data)
"""
def __init__(self, config: FeatureConfig):
"""Initialize feature with configuration.
Args:
config: Feature configuration object
"""
self.config = config
self._validate_config()
def _validate_config(self) -> None:
"""Validate configuration parameters."""
if self.config.param2 <= 0:
raise FeatureError("param2 must be positive")
def process(self, data: TrajectoryBatch) -> Dict[str, Any]:
"""Process trajectory data with new feature.
Args:
data: Trajectory data to process
Returns:
Dictionary containing processing results
Raises:
FeatureError: If processing fails
"""
try:
logger.info(f"Processing data with {len(data.df)} points")
# Implementation here
results = self._internal_process(data)
logger.info("Processing completed successfully")
return results
except Exception as e:
logger.error(f"Processing failed: {e}")
raise FeatureError(f"Feature processing failed: {e}") from e
def _internal_process(self, data: TrajectoryBatch) -> Dict[str, Any]:
"""Internal processing logic."""
# Implementation details
return {"processed": True}Adding Tests for New Features
# tests/unit/test_new_feature.py
import pytest
import pandas as pd
from mrra.new_module.feature import NewFeature, FeatureConfig, FeatureError
from mrra.data.trajectory import TrajectoryBatch
class TestNewFeature:
"""Test suite for NewFeature class."""
@pytest.fixture
def sample_data(self):
"""Sample data for testing."""
return pd.DataFrame({
'user_id': ['user_1'] * 4,
'timestamp': ['2024-01-01 08:00:00', '2024-01-01 09:00:00',
'2024-01-01 10:00:00', '2024-01-01 11:00:00'],
'latitude': [31.2304, 31.2354, 31.2404, 31.2454],
'longitude': [121.4737, 121.4787, 121.4837, 121.4887]
})
@pytest.fixture
def default_config(self):
"""Default configuration for testing."""
return FeatureConfig(param1="test", param2=5)
def test_init_valid_config(self, default_config):
"""Test initialization with valid configuration."""
feature = NewFeature(default_config)
assert feature.config.param1 == "test"
assert feature.config.param2 == 5
def test_init_invalid_config(self):
"""Test initialization with invalid configuration."""
invalid_config = FeatureConfig(param1="test", param2=-1)
with pytest.raises(FeatureError, match="param2 must be positive"):
NewFeature(invalid_config)
def test_process_valid_data(self, default_config, sample_data):
"""Test processing with valid data."""
feature = NewFeature(default_config)
tb = TrajectoryBatch(sample_data)
result = feature.process(tb)
assert isinstance(result, dict)
assert result.get("processed") is True
def test_process_empty_data(self, default_config):
"""Test processing with empty data."""
feature = NewFeature(default_config)
empty_df = pd.DataFrame(columns=['user_id', 'timestamp', 'latitude', 'longitude'])
tb = TrajectoryBatch(empty_df)
# Should handle gracefully or raise appropriate error
with pytest.raises(FeatureError):
feature.process(tb)
@pytest.mark.parametrize("param1,param2,expected", [
("test1", 5, True),
("test2", 10, True),
("test3", 1, True),
])
def test_process_parameterized(self, param1, param2, expected, sample_data):
"""Test processing with different parameter combinations."""
config = FeatureConfig(param1=param1, param2=param2)
feature = NewFeature(config)
tb = TrajectoryBatch(sample_data)
result = feature.process(tb)
        assert result.get("processed") == expected
Performance Optimization
Profiling and Benchmarking
# scripts/benchmark.py
"""Benchmarking script for MRRA performance."""
import time
import cProfile
import pstats
from functools import wraps
from typing import Callable, Any
def profile_function(func: Callable) -> Callable:
"""Decorator to profile function performance."""
@wraps(func)
def wrapper(*args, **kwargs) -> Any:
profiler = cProfile.Profile()
start_time = time.time()
profiler.enable()
try:
result = func(*args, **kwargs)
finally:
profiler.disable()
end_time = time.time()
# Print timing info
print(f"{func.__name__} took {end_time - start_time:.2f} seconds")
# Print profiling stats
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10) # Top 10 functions
return result
return wrapper
@profile_function
def benchmark_activity_extraction(tb, config):
"""Benchmark activity extraction performance."""
from mrra.data.activity import ActivityExtractor
return ActivityExtractor(tb, **config).extract()
@profile_function
def benchmark_graph_construction(tb, acts, config):
"""Benchmark graph construction performance."""
from mrra.graph.mobility_graph import MobilityGraph, GraphConfig
cfg = GraphConfig(**config)
return MobilityGraph(tb, cfg, activities=acts, assume_purposes_assigned=True)
# Usage
if __name__ == "__main__":
# Run benchmarks
    pass
Memory Optimization
# Memory optimization utilities
import psutil
import gc
class MemoryMonitor:
"""Monitor memory usage during operations."""
def __init__(self):
self.process = psutil.Process()
self.initial_memory = self.get_memory_mb()
def get_memory_mb(self) -> float:
"""Get current memory usage in MB."""
return self.process.memory_info().rss / 1024 / 1024
def report_memory(self, operation: str = "") -> None:
"""Report current memory usage."""
current = self.get_memory_mb()
delta = current - self.initial_memory
print(f"Memory {operation}: {current:.1f}MB (Δ{delta:+.1f}MB)")
def cleanup(self) -> None:
"""Force garbage collection and report memory."""
before = self.get_memory_mb()
gc.collect()
after = self.get_memory_mb()
freed = before - after
print(f"Memory cleanup freed {freed:.1f}MB")
# Usage in development
monitor = MemoryMonitor()
monitor.report_memory("start")
# ... operations ...
monitor.report_memory("after processing")
monitor.cleanup()
Performance Guidelines:
- Profile before optimizing - measure actual bottlenecks
- Use caching extensively for expensive operations (see the sketch after this list)
- Consider memory usage with large datasets
- Optimize LLM calls with batching and concurrency
- Monitor API costs during development
- Use appropriate data structures (pandas vs lists vs sets)
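As an illustration of the caching guideline, pure and frequently repeated helpers can be memoized with the standard library; a sketch (the haversine helper is an example, not an existing MRRA function, and lru_cache only suits pure functions with hashable arguments):
from functools import lru_cache
from math import asin, cos, radians, sin, sqrt


@lru_cache(maxsize=4096)
def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km; cached because trajectories revisit point pairs."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))
For results that must survive process restarts (e.g., LLM responses), prefer the cache manager in mrra.persist over in-memory memoization.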
Release Process
Version Management
MRRA uses semantic versioning (semver):
# src/mrra/__init__.py
__version__ = "0.2.1"
# Major.Minor.Patch
# Major: Breaking changes
# Minor: New features, backward compatible
# Patch: Bug fixes, backward compatible
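To avoid the version in __init__.py drifting from the one in pyproject.toml, it can instead be read from installed package metadata; a sketch using the standard library (assumes the distribution is named mrra):
# src/mrra/__init__.py (alternative single-source approach)
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("mrra")
except PackageNotFoundError:
    # Not installed, e.g. running from a plain source checkout
    __version__ = "0.0.0.dev0"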
Release Checklist
1. Pre-release Testing
# Run complete test suite
pytest tests/ -v
# Run integration tests
MRRA_INTEGRATION_WITH_LLM=1 pytest tests/integration/
# Check code quality
black --check src/ tests/
flake8 src/ tests/
mypy src/mrra/
2. Documentation Updates
# Update CHANGELOG.md
# Update version in __init__.py
# Update README.md if needed
# Build documentation
cd docs/
make html
3. Release Commit
git add .
git commit -m "chore: release v0.2.1"
git tag -a v0.2.1 -m "Release v0.2.1"
git push origin main
git push origin v0.2.1
4. Package Build
# Clean previous builds
rm -rf dist/ build/
# Build packages
python -m build
# Upload to PyPI (maintainers only)
twine upload dist/*
Next Steps
- Review Examples for development patterns
- Check Configuration for customization options
- Explore the GitHub repository for latest development activity
- Join the community discussions for questions and contributions