Development

Contributing to MRRA and development guidelines

This guide covers development workflows, contribution guidelines, and best practices for contributing to MRRA.

Development Environment Setup

Prerequisites

  • Python 3.10+ (Python 3.11 recommended)
  • Git for version control
  • Virtual environment management (venv, conda, or poetry)

Clone and Setup

# Clone the repository
git clone https://github.com/your-org/mrra.git
cd mrra

# Create virtual environment
python -m venv mrra-dev
source mrra-dev/bin/activate  # On Windows: mrra-dev\Scripts\activate

# Install in development mode
pip install -e .[dev]

# Install pre-commit hooks
pre-commit install

# Verify installation
python -c "import mrra; print(f'MRRA version: {mrra.__version__}')"
pytest tests/ -v

Development Dependencies

The development installation includes additional tools:

# Development dependencies (included with pip install -e .[dev])
dev_dependencies = [
    # Testing
    'pytest>=7.0.0',
    'pytest-cov>=4.0.0',
    'pytest-asyncio>=0.21.0',
    'pytest-mock>=3.10.0',
    
    # Code quality
    'black>=23.0.0',
    'isort>=5.12.0',
    'flake8>=6.0.0',
    'mypy>=1.0.0',
    
    # Documentation
    'sphinx>=6.0.0',
    'sphinx-rtd-theme>=1.2.0',
    'myst-parser>=1.0.0',
    
    # Development tools
    'pre-commit>=3.0.0',
    'jupyter>=1.0.0',
    'ipdb>=0.13.0',
]
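
In pyproject.toml these tools are typically declared as an optional dependency group so that pip install -e .[dev] picks them up. A minimal sketch, assuming the extra is named dev; check the project's pyproject.toml for the actual layout:

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=23.0.0",
    "isort>=5.12.0",
    "mypy>=1.0.0",
    "pre-commit>=3.0.0",
    # ... remaining tools from the list above
]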

Project Structure

Understanding the MRRA project structure:

mrra/
├── src/mrra/                    # Main package
│   ├── core/                    # Core types and interfaces
│   │   ├── types.py            # Data types and protocols
│   │   ├── config.py           # Configuration classes
│   │   └── exceptions.py       # Custom exceptions
│   ├── data/                   # Data processing modules
│   │   ├── trajectory.py       # TrajectoryBatch class
│   │   └── activity.py         # Activity extraction
│   ├── analysis/               # Analysis modules
│   │   └── activity_purpose.py # Purpose assignment
│   ├── graph/                  # Graph-related modules
│   │   ├── mobility_graph.py   # MobilityGraph class
│   │   └── pattern.py          # Pattern generation
│   ├── retriever/              # Retrieval systems
│   │   └── graph_rag.py        # GraphRAG implementation
│   ├── agents/                 # Agent systems
│   │   ├── builder.py          # Agent builder
│   │   └── subagents.py        # Sub-agent implementations
│   ├── persist/                # Persistence and caching
│   │   └── cache.py            # Cache manager
│   └── tools/                  # MCP tools and utilities
│       ├── weather.py          # Weather tools
│       └── maps.py             # Maps tools
├── tests/                      # Test suite
│   ├── unit/                   # Unit tests
│   ├── integration/            # Integration tests
│   └── fixtures/               # Test fixtures
├── docs/                       # Documentation
├── scripts/                    # Utility scripts
├── examples/                   # Example code
└── pyproject.toml             # Project configuration

Code Style and Standards

Code Formatting

MRRA uses standardized code formatting tools:

Black for code formatting:

# Format all code
black src/ tests/ scripts/

# Check formatting without changing
black --check src/ tests/ scripts/

Configuration in pyproject.toml:

[tool.black]
line-length = 88
target-version = ['py310']
include = '\.pyi?$'
extend-exclude = '''
/(
    \.eggs
  | \.git
  | \.hg
  | \.mypy_cache
  | \.tox
  | \.venv
  | _build
  | buck-out
  | build
  | dist
)/
'''

isort for import sorting:

# Sort imports
isort src/ tests/ scripts/

# Check import sorting
isort --check-only src/ tests/ scripts/

Configuration in pyproject.toml:

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88
known_first_party = ["mrra"]

flake8 for linting:

# Run linting
flake8 src/ tests/ scripts/

Configuration in .flake8:

[flake8]
max-line-length = 88
extend-ignore = E203, W503, E501
exclude = 
    .git,
    __pycache__,
    .venv,
    build,
    dist

mypy for type checking:

# Type check
mypy src/mrra/

Configuration in pyproject.toml:

[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
warn_unreachable = true
strict_equality = true

Pre-commit Hooks

Pre-commit hooks ensure code quality:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.3.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests, types-PyYAML]
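
After installing the hooks, you can also run them against the entire repository, which is useful before a large refactor or after bumping hook versions:

# Run every hook on all files, not just staged changes
pre-commit run --all-files

# Update the pinned hook versions in .pre-commit-config.yaml
pre-commit autoupdate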

Testing Framework

Test Structure

MRRA uses pytest with comprehensive test coverage:

# tests/unit/test_trajectory.py
import pytest
import pandas as pd
from mrra.data.trajectory import TrajectoryBatch
from mrra.core.exceptions import ValidationError

class TestTrajectoryBatch:
    """Unit tests for TrajectoryBatch class"""
    
    @pytest.fixture
    def sample_data(self):
        """Sample trajectory data for testing"""
        return pd.DataFrame({
            'user_id': ['user_1', 'user_1', 'user_2', 'user_2'],
            'timestamp': [
                '2024-01-01 08:00:00', '2024-01-01 12:00:00',
                '2024-01-01 09:00:00', '2024-01-01 13:00:00'
            ],
            'latitude': [31.2304, 31.2404, 31.2354, 31.2454],
            'longitude': [121.4737, 121.4837, 121.4787, 121.4887]
        })
    
    def test_init_valid_data(self, sample_data):
        """Test TrajectoryBatch initialization with valid data"""
        tb = TrajectoryBatch(sample_data)
        
        assert len(tb.df) == 4
        assert len(tb.users()) == 2
        assert 'timestamp_local' in tb.df.columns
        assert 'hour' in tb.df.columns
        assert 'dow' in tb.df.columns
    
    def test_init_missing_columns(self):
        """Test TrajectoryBatch with missing required columns"""
        invalid_data = pd.DataFrame({
            'user_id': ['user_1'],
            'latitude': [31.2304],
            # Missing 'longitude' and 'timestamp'
        })
        
        with pytest.raises(ValidationError, match="Missing required columns"):
            TrajectoryBatch(invalid_data)
    
    def test_for_user(self, sample_data):
        """Test user-specific data filtering"""
        tb = TrajectoryBatch(sample_data)
        
        user1_data = tb.for_user('user_1')
        assert len(user1_data) == 2
        assert all(user1_data['user_id'] == 'user_1')
        
        # Test non-existent user
        empty_data = tb.for_user('non_existent')
        assert len(empty_data) == 0
    
    @pytest.mark.parametrize("user_count,expected", [
        (1, 1), (2, 2), (5, 5)
    ])
    def test_users_count(self, user_count, expected):
        """Test user counting with parameterized data"""
        data = pd.DataFrame({
            'user_id': [f'user_{i}' for i in range(user_count)] * 2,
            'timestamp': ['2024-01-01 08:00:00'] * (user_count * 2),
            'latitude': [31.2304] * (user_count * 2),
            'longitude': [121.4737] * (user_count * 2)
        })
        
        tb = TrajectoryBatch(data)
        assert len(tb.users()) == expected
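
Fixtures needed by several test modules (such as the sample trajectory above) usually live in a shared tests/conftest.py so they are not duplicated. A minimal sketch; the fixture name here is illustrative, not an existing MRRA fixture:

# tests/conftest.py
import pandas as pd
import pytest


@pytest.fixture
def sample_trajectory_df() -> pd.DataFrame:
    """Small trajectory DataFrame shared across test modules."""
    return pd.DataFrame({
        'user_id': ['user_1', 'user_1', 'user_2', 'user_2'],
        'timestamp': [
            '2024-01-01 08:00:00', '2024-01-01 12:00:00',
            '2024-01-01 09:00:00', '2024-01-01 13:00:00'
        ],
        'latitude': [31.2304, 31.2404, 31.2354, 31.2454],
        'longitude': [121.4737, 121.4837, 121.4787, 121.4887]
    })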

Integration Tests

# tests/integration/test_workflow.py
import os
from unittest.mock import Mock, patch

import pandas as pd
import pytest

from mrra.data.trajectory import TrajectoryBatch
from mrra.data.activity import ActivityExtractor
from mrra.analysis.activity_purpose import ActivityPurposeAssigner

@pytest.mark.integration
class TestMRRAWorkflow:
    """Integration tests for complete MRRA workflows"""
    
    @pytest.fixture
    def integration_data(self):
        """Larger dataset for integration testing"""
        # Generate more comprehensive test data
        import numpy as np
        
        n_users = 3
        points_per_user = 50
        
        data = []
        base_time = pd.Timestamp('2024-01-01 00:00:00')
        
        for user_i in range(n_users):
            for point_i in range(points_per_user):
                # Simulate realistic movement patterns
                time_offset = pd.Timedelta(hours=point_i * 0.5)
                
                # Base locations with some variation
                base_lat = 31.2304 + user_i * 0.01
                base_lon = 121.4737 + user_i * 0.01
                
                # Add random variation
                lat_var = np.random.normal(0, 0.002)
                lon_var = np.random.normal(0, 0.002)
                
                data.append({
                    'user_id': f'test_user_{user_i}',
                    'timestamp': base_time + time_offset,
                    'latitude': base_lat + lat_var,
                    'longitude': base_lon + lon_var
                })
        
        return pd.DataFrame(data)
    
    @patch('mrra.agents.subagents.make_llm')
    def test_complete_workflow_with_mock(self, mock_make_llm, integration_data):
        """Test complete workflow with mocked LLM"""
        
        # Mock LLM responses
        mock_llm = Mock()
        mock_llm.invoke.side_effect = lambda x: "work" if "morning" in x else "home"
        mock_make_llm.return_value = mock_llm
        
        # Run workflow
        tb = TrajectoryBatch(integration_data)
        acts = ActivityExtractor(tb, radius_m=300, min_dwell_minutes=20).extract()
        
        # Should extract activities
        assert len(acts) > 0
        
        # Assign purposes using the mocked LLM
        acts = ActivityPurposeAssigner(tb, llm=mock_llm).assign(acts)
        
        # Verify purposes assigned
        purposes = [getattr(act, 'purpose', None) for act in acts]
        assert all(purpose is not None for purpose in purposes)
        assert mock_llm.invoke.called
    
    @pytest.mark.skipif(
        not os.getenv('MRRA_INTEGRATION_WITH_LLM'), 
        reason="Requires MRRA_INTEGRATION_WITH_LLM=1 and API keys"
    )
    def test_workflow_with_real_llm(self, integration_data):
        """Integration test with real LLM (requires API key)"""
        
        api_key = os.getenv('OPENAI_API_KEY')
        if not api_key:
            pytest.skip("OpenAI API key required")
        
        llm_cfg = {
            'provider': 'openai',
            'model': 'gpt-4o-mini',
            'api_key': api_key,
            'temperature': 0.2
        }
        
        # This test runs the complete workflow with real LLM
        # Keep it simple to avoid excessive API costs
        small_data = integration_data.head(20)  # Use smaller dataset
        
        tb = TrajectoryBatch(small_data)
        acts = ActivityExtractor(tb, radius_m=500, min_dwell_minutes=30).extract()
        
        if len(acts) > 0:
            from mrra.agents.subagents import make_llm
            llm = make_llm(**llm_cfg)
            acts = ActivityPurposeAssigner(tb, llm=llm, concurrency=1).assign(acts)
            
            # Verify real purposes (not just "Other")
            purposes = [getattr(act, 'purpose', 'Other') for act in acts]
            non_other_purposes = [p for p in purposes if p != 'Other']
            
            # At least some activities should have specific purposes
            assert len(non_other_purposes) > 0, f"Expected specific purposes, got: {purposes}"

Running Tests

# Run all tests
pytest

# Run specific test categories
pytest tests/unit/          # Unit tests only
pytest tests/integration/   # Integration tests only

# Run with coverage
pytest --cov=mrra --cov-report=html

# Run integration tests with LLM (requires API keys)
MRRA_INTEGRATION_WITH_LLM=1 OPENAI_API_KEY=your_key pytest tests/integration/

# Run specific test
pytest tests/unit/test_trajectory.py::TestTrajectoryBatch::test_init_valid_data

# Run tests with verbose output
pytest -v -s

# Run tests in parallel (with pytest-xdist)
pip install pytest-xdist
pytest -n auto
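
The integration tests above rely on a custom integration marker; if it is not registered, pytest emits a warning for unknown marks. A minimal sketch, assuming pytest is configured through pyproject.toml:

[tool.pytest.ini_options]
markers = [
    "integration: tests that exercise complete MRRA workflows",
]

Registered markers can then be used to select or skip categories, for example pytest -m "not integration" to run only the fast unit tests.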

Contributing Guidelines

Development Workflow

  1. Fork and Clone

    git clone https://github.com/yourusername/mrra.git
    cd mrra
    git remote add upstream https://github.com/original-org/mrra.git
  2. Create Feature Branch

    git checkout -b feature/your-feature-name
  3. Develop and Test

    # Make changes
    # Add tests
    pytest tests/
    
    # Check code quality
    black src/ tests/
    isort src/ tests/
    flake8 src/ tests/
    mypy src/mrra/
  4. Commit Changes

    git add .
    git commit -m "feat: add new mobility prediction algorithm
    
    - Implement new algorithm for next position prediction
    - Add comprehensive tests and documentation
    - Improve accuracy by 15% on test dataset
    "
  5. Submit Pull Request

    git push origin feature/your-feature-name
    # Create PR through GitHub interface
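
Before pushing, it is worth syncing your branch with the upstream remote added in step 1; this is standard Git, not MRRA-specific:

# Rebase your feature branch onto the latest upstream main
git fetch upstream
git rebase upstream/main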

Commit Message Convention

Use conventional commit format:

type(scope): description

body (optional)

footer (optional)

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting, etc.)
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks

Examples:

feat(agents): add multi-round reflection capability
fix(cache): resolve cache invalidation issue
docs(api): update docstrings for TrajectoryBatch
test(integration): add comprehensive workflow tests
refactor(graph): optimize graph construction performance

Code Review Process

Review Checklist:

  • Code follows style guidelines (black, isort, flake8)
  • All tests pass (unit and integration)
  • New features include comprehensive tests
  • Documentation is updated for API changes
  • Type hints are provided for public APIs
  • Performance impact is considered
  • Backward compatibility is maintained

Documentation Standards

Docstring Format

Use Google-style docstrings:

def predict_next_location(
    user_id: str, 
    current_time: datetime, 
    context: Optional[Dict[str, Any]] = None
) -> PredictionResult:
    """Predict the next location for a user.
    
    Args:
        user_id: Unique identifier for the user
        current_time: Timestamp for prediction context
        context: Additional context for prediction (optional)
    
    Returns:
        PredictionResult containing location prediction and metadata
    
    Raises:
        ValidationError: If user_id is invalid
        PredictionError: If prediction fails
    
    Example:
        >>> predictor = MobilityPredictor(agent)
        >>> result = predictor.predict_next_location(
        ...     user_id="user_123",
        ...     current_time=datetime.now()
        ... )
        >>> print(result.location)
        'g_1234_5678'
    """
    pass

API Documentation

For new modules, include comprehensive module docstrings:

"""Mobility prediction agents and reflection systems.

This module provides the core agent framework for MRRA, including:
- Multi-agent reflection mechanisms
- Sub-agent specializations  
- Aggregation strategies
- MCP tool integration

The main entry point is the `build_mrra_agent` function which constructs
configured agents ready for mobility prediction tasks.

Example:
    Basic agent creation:
    
    >>> from mrra.agents.builder import build_mrra_agent
    >>> agent = build_mrra_agent(
    ...     llm=llm_config,
    ...     retriever=graph_retriever,
    ...     reflection=reflection_config
    ... )
    >>> result = agent.invoke({"task": "next_position", "user_id": "user_123"})

Classes:
    MRRAAgent: Main agent class for predictions
    ReflectionAgent: Multi-agent reflection coordinator
    SubAgent: Individual specialized sub-agent

Functions:
    build_mrra_agent: Main agent factory function
"""

Adding New Features

Feature Development Template

When adding new features, follow this template:

# src/mrra/new_module/feature.py
"""New feature implementation.

This module provides [description of feature].
"""

import logging
from dataclasses import dataclass
from typing import Any, Dict

from mrra.core.exceptions import MRRAError
from mrra.data.trajectory import TrajectoryBatch

logger = logging.getLogger(__name__)


@dataclass
class FeatureConfig:
    """Configuration for new feature.
    
    Attributes:
        param1: Description of parameter 1
        param2: Description of parameter 2
    """
    param1: str
    param2: int = 10
    enable_advanced: bool = False


class FeatureError(MRRAError):
    """Raised when feature operations fail."""
    pass


class NewFeature:
    """Implementation of new feature.
    
    This class provides [detailed description].
    
    Example:
        >>> feature = NewFeature(config)
        >>> result = feature.process(data)
    """
    
    def __init__(self, config: FeatureConfig):
        """Initialize feature with configuration.
        
        Args:
            config: Feature configuration object
        """
        self.config = config
        self._validate_config()
    
    def _validate_config(self) -> None:
        """Validate configuration parameters."""
        if self.config.param2 <= 0:
            raise FeatureError("param2 must be positive")
    
    def process(self, data: TrajectoryBatch) -> Dict[str, Any]:
        """Process trajectory data with new feature.
        
        Args:
            data: Trajectory data to process
            
        Returns:
            Dictionary containing processing results
            
        Raises:
            FeatureError: If processing fails
        """
        try:
            logger.info(f"Processing data with {len(data.df)} points")
            
            # Implementation here
            results = self._internal_process(data)
            
            logger.info("Processing completed successfully")
            return results
            
        except Exception as e:
            logger.error(f"Processing failed: {e}")
            raise FeatureError(f"Feature processing failed: {e}") from e
    
    def _internal_process(self, data: TrajectoryBatch) -> Dict[str, Any]:
        """Internal processing logic."""
        # Implementation details
        return {"processed": True}

Adding Tests for New Features

# tests/unit/test_new_feature.py
import pytest
import pandas as pd
from mrra.new_module.feature import NewFeature, FeatureConfig, FeatureError
from mrra.data.trajectory import TrajectoryBatch


class TestNewFeature:
    """Test suite for NewFeature class."""
    
    @pytest.fixture
    def sample_data(self):
        """Sample data for testing."""
        return pd.DataFrame({
            'user_id': ['user_1'] * 4,
            'timestamp': ['2024-01-01 08:00:00', '2024-01-01 09:00:00', 
                         '2024-01-01 10:00:00', '2024-01-01 11:00:00'],
            'latitude': [31.2304, 31.2354, 31.2404, 31.2454],
            'longitude': [121.4737, 121.4787, 121.4837, 121.4887]
        })
    
    @pytest.fixture
    def default_config(self):
        """Default configuration for testing."""
        return FeatureConfig(param1="test", param2=5)
    
    def test_init_valid_config(self, default_config):
        """Test initialization with valid configuration."""
        feature = NewFeature(default_config)
        assert feature.config.param1 == "test"
        assert feature.config.param2 == 5
    
    def test_init_invalid_config(self):
        """Test initialization with invalid configuration."""
        invalid_config = FeatureConfig(param1="test", param2=-1)
        
        with pytest.raises(FeatureError, match="param2 must be positive"):
            NewFeature(invalid_config)
    
    def test_process_valid_data(self, default_config, sample_data):
        """Test processing with valid data."""
        feature = NewFeature(default_config)
        tb = TrajectoryBatch(sample_data)
        
        result = feature.process(tb)
        
        assert isinstance(result, dict)
        assert result.get("processed") is True
    
    def test_process_empty_data(self, default_config):
        """Test processing with empty data."""
        feature = NewFeature(default_config)
        empty_df = pd.DataFrame(columns=['user_id', 'timestamp', 'latitude', 'longitude'])
        tb = TrajectoryBatch(empty_df)
        
        # Should handle gracefully or raise appropriate error
        with pytest.raises(FeatureError):
            feature.process(tb)
    
    @pytest.mark.parametrize("param1,param2,expected", [
        ("test1", 5, True),
        ("test2", 10, True),
        ("test3", 1, True),
    ])
    def test_process_parameterized(self, param1, param2, expected, sample_data):
        """Test processing with different parameter combinations."""
        config = FeatureConfig(param1=param1, param2=param2)
        feature = NewFeature(config)
        tb = TrajectoryBatch(sample_data)
        
        result = feature.process(tb)
        assert result.get("processed") == expected

Performance Optimization

Profiling and Benchmarking

# scripts/benchmark.py
"""Benchmarking script for MRRA performance."""

import time
import cProfile
import pstats
from functools import wraps
from typing import Callable, Any

def profile_function(func: Callable) -> Callable:
    """Decorator to profile function performance."""
    
    @wraps(func)
    def wrapper(*args, **kwargs) -> Any:
        profiler = cProfile.Profile()
        
        start_time = time.time()
        profiler.enable()
        
        try:
            result = func(*args, **kwargs)
        finally:
            profiler.disable()
            
        end_time = time.time()
        
        # Print timing info
        print(f"{func.__name__} took {end_time - start_time:.2f} seconds")
        
        # Print profiling stats
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(10)  # Top 10 functions
        
        return result
    
    return wrapper

@profile_function
def benchmark_activity_extraction(tb, config):
    """Benchmark activity extraction performance."""
    from mrra.data.activity import ActivityExtractor
    return ActivityExtractor(tb, **config).extract()

@profile_function 
def benchmark_graph_construction(tb, acts, config):
    """Benchmark graph construction performance."""
    from mrra.graph.mobility_graph import MobilityGraph, GraphConfig
    cfg = GraphConfig(**config)
    return MobilityGraph(tb, cfg, activities=acts, assume_purposes_assigned=True)

# Usage
if __name__ == "__main__":
    # Run benchmarks
    pass

Memory Optimization

# Memory optimization utilities
import psutil
import gc
from typing import Any, Dict

class MemoryMonitor:
    """Monitor memory usage during operations."""
    
    def __init__(self):
        self.process = psutil.Process()
        self.initial_memory = self.get_memory_mb()
    
    def get_memory_mb(self) -> float:
        """Get current memory usage in MB."""
        return self.process.memory_info().rss / 1024 / 1024
    
    def report_memory(self, operation: str = "") -> None:
        """Report current memory usage."""
        current = self.get_memory_mb()
        delta = current - self.initial_memory
        print(f"Memory {operation}: {current:.1f}MB (Δ{delta:+.1f}MB)")
    
    def cleanup(self) -> None:
        """Force garbage collection and report memory."""
        before = self.get_memory_mb()
        gc.collect()
        after = self.get_memory_mb()
        freed = before - after
        print(f"Memory cleanup freed {freed:.1f}MB")

# Usage in development
monitor = MemoryMonitor()
monitor.report_memory("start")

# ... operations ...
monitor.report_memory("after processing")
monitor.cleanup()

Performance Guidelines:

  • Profile before optimizing - measure actual bottlenecks
  • Use caching extensively for expensive operations (see the sketch after this list)
  • Consider memory usage with large datasets
  • Optimize LLM calls with batching and concurrency
  • Monitor API costs during development
  • Use appropriate data structures (pandas vs lists vs sets)
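
As a concrete illustration of the caching and batching guidelines, here is a minimal standard-library sketch; expensive_pattern_summary and call_llm are hypothetical placeholders, not MRRA APIs:

# Sketch: cache repeated expensive computations and batch independent LLM calls.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List


@lru_cache(maxsize=1024)
def expensive_pattern_summary(user_id: str, hour: int) -> str:
    """Placeholder for an expensive, repeatable computation that is safe to cache."""
    return f"summary:{user_id}:{hour}"


def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM call."""
    return f"response to {prompt!r}"


def batched_llm_calls(prompts: List[str], concurrency: int = 4) -> List[str]:
    """Run independent LLM calls concurrently instead of one at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(call_llm, prompts))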

Release Process

Version Management

MRRA uses semantic versioning (semver):

# src/mrra/__init__.py
__version__ = "0.2.1"

# Major.Minor.Patch
# Major: Breaking changes
# Minor: New features, backward compatible
# Patch: Bug fixes, backward compatible

Release Checklist

  1. Pre-release Testing

    # Run complete test suite
    pytest tests/ -v
    
    # Run integration tests (LLM-backed tests need API keys)
    MRRA_INTEGRATION_WITH_LLM=1 pytest tests/integration/
    
    # Check code quality
    black --check src/ tests/
    flake8 src/ tests/
    mypy src/mrra/
  2. Documentation Updates

    # Update CHANGELOG.md
    # Update version in __init__.py
    # Update README.md if needed
    # Build documentation
    cd docs/
    make html
  3. Release Commit

    git add .
    git commit -m "chore: release v0.2.1"
    git tag -a v0.2.1 -m "Release v0.2.1"
    git push origin main
    git push origin v0.2.1
  4. Package Build

    # Clean previous builds
    rm -rf dist/ build/
    
    # Build packages
    python -m build
    
    # Upload to PyPI (maintainers only)
    twine upload dist/*
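
As an additional safeguard, validate the built artifacts before uploading; twine check verifies that the package metadata will render correctly on PyPI, and a quick install into a throwaway environment catches packaging mistakes:

# Validate metadata of the built distributions
twine check dist/*

# Smoke-test the wheel in a clean environment
python -m venv /tmp/mrra-release-test
/tmp/mrra-release-test/bin/pip install dist/*.whl
/tmp/mrra-release-test/bin/python -c "import mrra; print(mrra.__version__)"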

Next Steps

  • Review Examples for development patterns
  • Check Configuration for customization options
  • Explore the GitHub repository for latest development activity
  • Join the community discussions for questions and contributions