Development

Contributing to MRRA and development guidelines

This guide covers development workflows, contribution guidelines, and best practices for contributing to MRRA.

Development Environment Setup

Prerequisites

  • Python 3.10+ (Python 3.11 recommended)
  • Git for version control
  • Virtual environment management (venv, conda, or poetry)

Clone and Setup

# Clone the repository
git clone https://github.com/your-org/mrra.git
cd mrra

# Create virtual environment
python -m venv mrra-dev
source mrra-dev/bin/activate  # On Windows: mrra-dev\Scripts\activate

# Install in development mode
pip install -e .[dev]

# Install pre-commit hooks
pre-commit install

# Verify installation
python -c "import mrra; print(f'MRRA version: {mrra.__version__}')"
pytest tests/ -v

Development Dependencies

The development installation includes additional tools:

# Development dependencies (included with pip install -e .[dev])
dev_dependencies = [
    # Testing
    'pytest>=7.0.0',
    'pytest-cov>=4.0.0',
    'pytest-asyncio>=0.21.0',
    'pytest-mock>=3.10.0',
    
    # Code quality
    'black>=23.0.0',
    'isort>=5.12.0',
    'flake8>=6.0.0',
    'mypy>=1.0.0',
    
    # Documentation
    'sphinx>=6.0.0',
    'sphinx-rtd-theme>=1.2.0',
    'myst-parser>=1.0.0',
    
    # Development tools
    'pre-commit>=3.0.0',
    'jupyter>=1.0.0',
    'ipdb>=0.13.0',
]
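
In pyproject.toml these tools are typically declared as an optional dependency group so that pip install -e .[dev] picks them up. A minimal sketch, assuming the extra is named dev; check the project's pyproject.toml for the actual layout:

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=23.0.0",
    "isort>=5.12.0",
    "mypy>=1.0.0",
    "pre-commit>=3.0.0",
    # ... remaining tools from the list above
]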

Project Structure

Understanding the MRRA project structure:

mrra/
├── src/mrra/                    # Main package
│   ├── core/                    # Core types and interfaces
│   │   ├── types.py            # Data types and protocols
│   │   ├── config.py           # Configuration classes
│   │   └── exceptions.py       # Custom exceptions
│   ├── data/                   # Data processing modules
│   │   ├── trajectory.py       # TrajectoryBatch class
│   │   └── activity.py         # Activity extraction
│   ├── analysis/               # Analysis modules
│   │   └── activity_purpose.py # Purpose assignment
│   ├── graph/                  # Graph-related modules
│   │   ├── mobility_graph.py   # MobilityGraph class
│   │   └── pattern.py          # Pattern generation
│   ├── retriever/              # Retrieval systems
│   │   └── graph_rag.py        # GraphRAG implementation
│   ├── agents/                 # Agent systems
│   │   ├── builder.py          # Agent builder
│   │   └── subagents.py        # Sub-agent implementations
│   ├── persist/                # Persistence and caching
│   │   └── cache.py            # Cache manager
│   └── tools/                  # MCP tools and utilities
│       ├── weather.py          # Weather tools
│       └── maps.py             # Maps tools
├── tests/                      # Test suite
│   ├── unit/                   # Unit tests
│   ├── integration/            # Integration tests
│   └── fixtures/               # Test fixtures
├── docs/                       # Documentation
├── scripts/                    # Utility scripts
├── examples/                   # Example code
└── pyproject.toml             # Project configuration

Code Style and Standards

Code Formatting

MRRA uses standardized code formatting tools:

Black for code formatting:

# Format all code
black src/ tests/ scripts/

# Check formatting without changing
black --check src/ tests/ scripts/

Configuration in pyproject.toml:

[tool.black]
line-length = 88
target-version = ['py310']
include = '\.pyi?$'
extend-exclude = '''
/(
    \.eggs
  | \.git
  | \.hg
  | \.mypy_cache
  | \.tox
  | \.venv
  | _build
  | buck-out
  | build
  | dist
)/
'''

isort for import sorting:

# Sort imports
isort src/ tests/ scripts/

# Check import sorting
isort --check-only src/ tests/ scripts/

Configuration in pyproject.toml:

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88
known_first_party = ["mrra"]

flake8 for linting:

# Run linting
flake8 src/ tests/ scripts/

Configuration in .flake8:

[flake8]
max-line-length = 88
extend-ignore = E203, W503, E501
exclude = 
    .git,
    __pycache__,
    .venv,
    build,
    dist

mypy for type checking:

# Type check
mypy src/mrra/

Configuration in pyproject.toml:

[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true
disallow_untyped_decorators = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
warn_unreachable = true
strict_equality = true

Pre-commit Hooks

Pre-commit hooks ensure code quality:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict

  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.3.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests, types-PyYAML]
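
After installing the hooks, you can also run them against the entire repository, which is useful before a large refactor or after bumping hook versions:

# Run every hook on all files, not just staged changes
pre-commit run --all-files

# Update the pinned hook versions in .pre-commit-config.yaml
pre-commit autoupdate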

Testing Framework

Test Structure

MRRA uses pytest with comprehensive test coverage:

# tests/unit/test_trajectory.py
import pytest
import pandas as pd
from mrra.data.trajectory import TrajectoryBatch
from mrra.core.exceptions import ValidationError

class TestTrajectoryBatch:
    """Unit tests for TrajectoryBatch class"""
    
    @pytest.fixture
    def sample_data(self):
        """Sample trajectory data for testing"""
        return pd.DataFrame({
            'user_id': ['user_1', 'user_1', 'user_2', 'user_2'],
            'timestamp': [
                '2024-01-01 08:00:00', '2024-01-01 12:00:00',
                '2024-01-01 09:00:00', '2024-01-01 13:00:00'
            ],
            'latitude': [31.2304, 31.2404, 31.2354, 31.2454],
            'longitude': [121.4737, 121.4837, 121.4787, 121.4887]
        })
    
    def test_init_valid_data(self, sample_data):
        """Test TrajectoryBatch initialization with valid data"""
        tb = TrajectoryBatch(sample_data)
        
        assert len(tb.df) == 4
        assert len(tb.users()) == 2
        assert 'timestamp_local' in tb.df.columns
        assert 'hour' in tb.df.columns
        assert 'dow' in tb.df.columns
    
    def test_init_missing_columns(self):
        """Test TrajectoryBatch with missing required columns"""
        invalid_data = pd.DataFrame({
            'user_id': ['user_1'],
            'latitude': [31.2304],
            # Missing 'longitude' and 'timestamp'
        })
        
        with pytest.raises(ValidationError, match="Missing required columns"):
            TrajectoryBatch(invalid_data)
    
    def test_for_user(self, sample_data):
        """Test user-specific data filtering"""
        tb = TrajectoryBatch(sample_data)
        
        user1_data = tb.for_user('user_1')
        assert len(user1_data) == 2
        assert all(user1_data['user_id'] == 'user_1')
        
        # Test non-existent user
        empty_data = tb.for_user('non_existent')
        assert len(empty_data) == 0
    
    @pytest.mark.parametrize("user_count,expected", [
        (1, 1), (2, 2), (5, 5)
    ])
    def test_users_count(self, user_count, expected):
        """Test user counting with parameterized data"""
        data = pd.DataFrame({
            'user_id': [f'user_{i}' for i in range(user_count)] * 2,
            'timestamp': ['2024-01-01 08:00:00'] * (user_count * 2),
            'latitude': [31.2304] * (user_count * 2),
            'longitude': [121.4737] * (user_count * 2)
        })
        
        tb = TrajectoryBatch(data)
        assert len(tb.users()) == expected
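
Fixtures needed by several test modules (such as the sample trajectory above) usually live in a shared tests/conftest.py so they are not duplicated. A minimal sketch; the fixture name here is illustrative, not an existing MRRA fixture:

# tests/conftest.py
import pandas as pd
import pytest


@pytest.fixture
def sample_trajectory_df() -> pd.DataFrame:
    """Small trajectory DataFrame shared across test modules."""
    return pd.DataFrame({
        'user_id': ['user_1', 'user_1', 'user_2', 'user_2'],
        'timestamp': [
            '2024-01-01 08:00:00', '2024-01-01 12:00:00',
            '2024-01-01 09:00:00', '2024-01-01 13:00:00'
        ],
        'latitude': [31.2304, 31.2404, 31.2354, 31.2454],
        'longitude': [121.4737, 121.4837, 121.4787, 121.4887]
    })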

Integration Tests

# tests/integration/test_workflow.py
import os
from unittest.mock import Mock, patch

import pandas as pd
import pytest

from mrra.data.trajectory import TrajectoryBatch
from mrra.data.activity import ActivityExtractor
from mrra.analysis.activity_purpose import ActivityPurposeAssigner

@pytest.mark.integration
class TestMRRAWorkflow:
    """Integration tests for complete MRRA workflows"""
    
    @pytest.fixture
    def integration_data(self):
        """Larger dataset for integration testing"""
        # Generate more comprehensive test data
        import numpy as np
        
        n_users = 3
        points_per_user = 50
        
        data = []
        base_time = pd.Timestamp('2024-01-01 00:00:00')
        
        for user_i in range(n_users):
            for point_i in range(points_per_user):
                # Simulate realistic movement patterns
                time_offset = pd.Timedelta(hours=point_i * 0.5)
                
                # Base locations with some variation
                base_lat = 31.2304 + user_i * 0.01
                base_lon = 121.4737 + user_i * 0.01
                
                # Add random variation
                lat_var = np.random.normal(0, 0.002)
                lon_var = np.random.normal(0, 0.002)
                
                data.append({
                    'user_id': f'test_user_{user_i}',
                    'timestamp': base_time + time_offset,
                    'latitude': base_lat + lat_var,
                    'longitude': base_lon + lon_var
                })
        
        return pd.DataFrame(data)
    
    @patch('mrra.agents.subagents.make_llm')
    def test_complete_workflow_with_mock(self, mock_make_llm, integration_data):
        """Test complete workflow with mocked LLM"""
        
        # Mock LLM responses
        mock_llm = Mock()
        mock_llm.invoke.side_effect = lambda x: "work" if "morning" in x else "home"
        mock_make_llm.return_value = mock_llm
        
        # Run workflow
        tb = TrajectoryBatch(integration_data)
        acts = ActivityExtractor(tb, radius_m=300, min_dwell_minutes=20).extract()
        
        # Should extract activities
        assert len(acts) > 0
        
        # Assign purposes using the mocked LLM
        acts = ActivityPurposeAssigner(tb, llm=mock_llm).assign(acts)
        
        # Verify purposes assigned
        purposes = [getattr(act, 'purpose', None) for act in acts]
        assert all(purpose is not None for purpose in purposes)
        assert mock_llm.invoke.called
    
    @pytest.mark.skipif(
        not os.getenv('MRRA_INTEGRATION_WITH_LLM'), 
        reason="Requires MRRA_INTEGRATION_WITH_LLM=1 and API keys"
    )
    def test_workflow_with_real_llm(self, integration_data):
        """Integration test with real LLM (requires API key)"""
        
        api_key = os.getenv('OPENAI_API_KEY')
        if not api_key:
            pytest.skip("OpenAI API key required")
        
        llm_cfg = {
            'provider': 'openai',
            'model': 'gpt-4o-mini',
            'api_key': api_key,
            'temperature': 0.2
        }
        
        # This test runs the complete workflow with real LLM
        # Keep it simple to avoid excessive API costs
        small_data = integration_data.head(20)  # Use smaller dataset
        
        tb = TrajectoryBatch(small_data)
        acts = ActivityExtractor(tb, radius_m=500, min_dwell_minutes=30).extract()
        
        if len(acts) > 0:
            from mrra.agents.subagents import make_llm
            llm = make_llm(**llm_cfg)
            acts = ActivityPurposeAssigner(tb, llm=llm, concurrency=1).assign(acts)
            
            # Verify real purposes (not just "Other")
            purposes = [getattr(act, 'purpose', 'Other') for act in acts]
            non_other_purposes = [p for p in purposes if p != 'Other']
            
            # At least some activities should have specific purposes
            assert len(non_other_purposes) > 0, f"Expected specific purposes, got: {purposes}"

Running Tests

# Run all tests
pytest

# Run specific test categories
pytest tests/unit/          # Unit tests only
pytest tests/integration/   # Integration tests only

# Run with coverage
pytest --cov=mrra --cov-report=html

# Run integration tests with LLM (requires API keys)
MRRA_INTEGRATION_WITH_LLM=1 OPENAI_API_KEY=your_key pytest tests/integration/

# Run specific test
pytest tests/unit/test_trajectory.py::TestTrajectoryBatch::test_init_valid_data

# Run tests with verbose output
pytest -v -s

# Run tests in parallel (with pytest-xdist)
pip install pytest-xdist
pytest -n auto
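
The integration tests above rely on a custom integration marker; if it is not registered, pytest emits a warning for unknown marks. A minimal sketch, assuming pytest is configured through pyproject.toml:

[tool.pytest.ini_options]
markers = [
    "integration: tests that exercise complete MRRA workflows",
]

Registered markers can then be used to select or skip categories, for example pytest -m "not integration" to run only the fast unit tests.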

Contributing Guidelines

Development Workflow

  1. Fork and Clone

    git clone https://github.com/yourusername/mrra.git
    cd mrra
    git remote add upstream https://github.com/original-org/mrra.git
  2. Create Feature Branch

    git checkout -b feature/your-feature-name
  3. Develop and Test

    # Make changes
    # Add tests
    pytest tests/
    
    # Check code quality
    black src/ tests/
    isort src/ tests/
    flake8 src/ tests/
    mypy src/mrra/
  4. Commit Changes

    git add .
    git commit -m "feat: add new mobility prediction algorithm
    
    - Implement new algorithm for next position prediction
    - Add comprehensive tests and documentation
    - Improve accuracy by 15% on test dataset
    "
  5. Submit Pull Request

    git push origin feature/your-feature-name
    # Create PR through GitHub interface
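
Before pushing, it is worth syncing your branch with the upstream remote added in step 1; this is standard Git, not MRRA-specific:

# Rebase your feature branch onto the latest upstream main
git fetch upstream
git rebase upstream/main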

Commit Message Convention

Use conventional commit format:

type(scope): description

body (optional)

footer (optional)

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation changes
  • style: Code style changes (formatting, etc.)
  • refactor: Code refactoring
  • test: Adding or updating tests
  • chore: Maintenance tasks

Examples:

feat(agents): add multi-round reflection capability
fix(cache): resolve cache invalidation issue
docs(api): update docstrings for TrajectoryBatch
test(integration): add comprehensive workflow tests
refactor(graph): optimize graph construction performance

Code Review Process

Review Checklist:

  • Code follows style guidelines (black, isort, flake8)
  • All tests pass (unit and integration)
  • New features include comprehensive tests
  • Documentation is updated for API changes
  • Type hints are provided for public APIs
  • Performance impact is considered
  • Backward compatibility is maintained

Documentation Standards

Docstring Format

Use Google-style docstrings:

def predict_next_location(
    user_id: str, 
    current_time: datetime, 
    context: Optional[Dict[str, Any]] = None
) -> PredictionResult:
    """Predict the next location for a user.
    
    Args:
        user_id: Unique identifier for the user
        current_time: Timestamp for prediction context
        context: Additional context for prediction (optional)
    
    Returns:
        PredictionResult containing location prediction and metadata
    
    Raises:
        ValidationError: If user_id is invalid
        PredictionError: If prediction fails
    
    Example:
        >>> predictor = MobilityPredictor(agent)
        >>> result = predictor.predict_next_location(
        ...     user_id="user_123",
        ...     current_time=datetime.now()
        ... )
        >>> print(result.location)
        'g_1234_5678'
    """
    pass

API Documentation

For new modules, include comprehensive module docstrings:

"""Mobility prediction agents and reflection systems.

This module provides the core agent framework for MRRA, including:
- Multi-agent reflection mechanisms
- Sub-agent specializations  
- Aggregation strategies
- MCP tool integration

The main entry point is the `build_mrra_agent` function which constructs
configured agents ready for mobility prediction tasks.

Example:
    Basic agent creation:
    
    >>> from mrra.agents.builder import build_mrra_agent
    >>> agent = build_mrra_agent(
    ...     llm=llm_config,
    ...     retriever=graph_retriever,
    ...     reflection=reflection_config
    ... )
    >>> result = agent.invoke({"task": "next_position", "user_id": "user_123"})

Classes:
    MRRAAgent: Main agent class for predictions
    ReflectionAgent: Multi-agent reflection coordinator
    SubAgent: Individual specialized sub-agent

Functions:
    build_mrra_agent: Main agent factory function
"""

Adding New Features

Feature Development Template

When adding new features, follow this template:

# src/mrra/new_module/feature.py
"""New feature implementation.

This module provides [description of feature].
"""

import logging
from dataclasses import dataclass
from typing import Any, Dict

from mrra.core.exceptions import MRRAError
from mrra.data.trajectory import TrajectoryBatch

logger = logging.getLogger(__name__)


@dataclass
class FeatureConfig:
    """Configuration for new feature.
    
    Attributes:
        param1: Description of parameter 1
        param2: Description of parameter 2
    """
    param1: str
    param2: int = 10
    enable_advanced: bool = False


class FeatureError(MRRAError):
    """Raised when feature operations fail."""
    pass


class NewFeature:
    """Implementation of new feature.
    
    This class provides [detailed description].
    
    Example:
        >>> feature = NewFeature(config)
        >>> result = feature.process(data)
    """
    
    def __init__(self, config: FeatureConfig):
        """Initialize feature with configuration.
        
        Args:
            config: Feature configuration object
        """
        self.config = config
        self._validate_config()
    
    def _validate_config(self) -> None:
        """Validate configuration parameters."""
        if self.config.param2 <= 0:
            raise FeatureError("param2 must be positive")
    
    def process(self, data: TrajectoryBatch) -> Dict[str, Any]:
        """Process trajectory data with new feature.
        
        Args:
            data: Trajectory data to process
            
        Returns:
            Dictionary containing processing results
            
        Raises:
            FeatureError: If processing fails
        """
        try:
            logger.info(f"Processing data with {len(data.df)} points")
            
            # Implementation here
            results = self._internal_process(data)
            
            logger.info("Processing completed successfully")
            return results
            
        except Exception as e:
            logger.error(f"Processing failed: {e}")
            raise FeatureError(f"Feature processing failed: {e}") from e
    
    def _internal_process(self, data: TrajectoryBatch) -> Dict[str, Any]:
        """Internal processing logic."""
        # Implementation details
        return {"processed": True}

Adding Tests for New Features

# tests/unit/test_new_feature.py
import pytest
import pandas as pd
from mrra.new_module.feature import NewFeature, FeatureConfig, FeatureError
from mrra.data.trajectory import TrajectoryBatch


class TestNewFeature:
    """Test suite for NewFeature class."""
    
    @pytest.fixture
    def sample_data(self):
        """Sample data for testing."""
        return pd.DataFrame({
            'user_id': ['user_1'] * 4,
            'timestamp': ['2024-01-01 08:00:00', '2024-01-01 09:00:00', 
                         '2024-01-01 10:00:00', '2024-01-01 11:00:00'],
            'latitude': [31.2304, 31.2354, 31.2404, 31.2454],
            'longitude': [121.4737, 121.4787, 121.4837, 121.4887]
        })
    
    @pytest.fixture
    def default_config(self):
        """Default configuration for testing."""
        return FeatureConfig(param1="test", param2=5)
    
    def test_init_valid_config(self, default_config):
        """Test initialization with valid configuration."""
        feature = NewFeature(default_config)
        assert feature.config.param1 == "test"
        assert feature.config.param2 == 5
    
    def test_init_invalid_config(self):
        """Test initialization with invalid configuration."""
        invalid_config = FeatureConfig(param1="test", param2=-1)
        
        with pytest.raises(FeatureError, match="param2 must be positive"):
            NewFeature(invalid_config)
    
    def test_process_valid_data(self, default_config, sample_data):
        """Test processing with valid data."""
        feature = NewFeature(default_config)
        tb = TrajectoryBatch(sample_data)
        
        result = feature.process(tb)
        
        assert isinstance(result, dict)
        assert result.get("processed") is True
    
    def test_process_empty_data(self, default_config):
        """Test processing with empty data."""
        feature = NewFeature(default_config)
        empty_df = pd.DataFrame(columns=['user_id', 'timestamp', 'latitude', 'longitude'])
        tb = TrajectoryBatch(empty_df)
        
        # Should handle gracefully or raise appropriate error
        with pytest.raises(FeatureError):
            feature.process(tb)
    
    @pytest.mark.parametrize("param1,param2,expected", [
        ("test1", 5, True),
        ("test2", 10, True),
        ("test3", 1, True),
    ])
    def test_process_parameterized(self, param1, param2, expected, sample_data):
        """Test processing with different parameter combinations."""
        config = FeatureConfig(param1=param1, param2=param2)
        feature = NewFeature(config)
        tb = TrajectoryBatch(sample_data)
        
        result = feature.process(tb)
        assert result.get("processed") == expected

Performance Optimization

Profiling and Benchmarking

# scripts/benchmark.py
"""Benchmarking script for MRRA performance."""

import time
import cProfile
import pstats
from functools import wraps
from typing import Callable, Any

def profile_function(func: Callable) -> Callable:
    """Decorator to profile function performance."""
    
    @wraps(func)
    def wrapper(*args, **kwargs) -> Any:
        profiler = cProfile.Profile()
        
        start_time = time.time()
        profiler.enable()
        
        try:
            result = func(*args, **kwargs)
        finally:
            profiler.disable()
            
        end_time = time.time()
        
        # Print timing info
        print(f"{func.__name__} took {end_time - start_time:.2f} seconds")
        
        # Print profiling stats
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(10)  # Top 10 functions
        
        return result
    
    return wrapper

@profile_function
def benchmark_activity_extraction(tb, config):
    """Benchmark activity extraction performance."""
    from mrra.data.activity import ActivityExtractor
    return ActivityExtractor(tb, **config).extract()

@profile_function 
def benchmark_graph_construction(tb, acts, config):
    """Benchmark graph construction performance."""
    from mrra.graph.mobility_graph import MobilityGraph, GraphConfig
    cfg = GraphConfig(**config)
    return MobilityGraph(tb, cfg, activities=acts, assume_purposes_assigned=True)

# Usage
if __name__ == "__main__":
    # Run benchmarks
    pass

Memory Optimization

# Memory optimization utilities
import psutil
import gc
from typing import Any, Dict

class MemoryMonitor:
    """Monitor memory usage during operations."""
    
    def __init__(self):
        self.process = psutil.Process()
        self.initial_memory = self.get_memory_mb()
    
    def get_memory_mb(self) -> float:
        """Get current memory usage in MB."""
        return self.process.memory_info().rss / 1024 / 1024
    
    def report_memory(self, operation: str = "") -> None:
        """Report current memory usage."""
        current = self.get_memory_mb()
        delta = current - self.initial_memory
        print(f"Memory {operation}: {current:.1f}MB (Δ{delta:+.1f}MB)")
    
    def cleanup(self) -> None:
        """Force garbage collection and report memory."""
        before = self.get_memory_mb()
        gc.collect()
        after = self.get_memory_mb()
        freed = before - after
        print(f"Memory cleanup freed {freed:.1f}MB")

# Usage in development
monitor = MemoryMonitor()
monitor.report_memory("start")

# ... operations ...
monitor.report_memory("after processing")
monitor.cleanup()

Performance Guidelines:

  • Profile before optimizing - measure actual bottlenecks
  • Use caching extensively for expensive operations (see the sketch after this list)
  • Consider memory usage with large datasets
  • Optimize LLM calls with batching and concurrency
  • Monitor API costs during development
  • Use appropriate data structures (pandas vs lists vs sets)
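
As a concrete illustration of the caching and batching guidelines, here is a minimal standard-library sketch; expensive_pattern_summary and call_llm are hypothetical placeholders, not MRRA APIs:

# Sketch: cache repeated expensive computations and batch independent LLM calls.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List


@lru_cache(maxsize=1024)
def expensive_pattern_summary(user_id: str, hour: int) -> str:
    """Placeholder for an expensive, repeatable computation that is safe to cache."""
    return f"summary:{user_id}:{hour}"


def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM call."""
    return f"response to {prompt!r}"


def batched_llm_calls(prompts: List[str], concurrency: int = 4) -> List[str]:
    """Run independent LLM calls concurrently instead of one at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(call_llm, prompts))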

Release Process

Version Management

MRRA uses semantic versioning (semver):

# src/mrra/__init__.py
__version__ = "0.2.1"

# Major.Minor.Patch
# Major: Breaking changes
# Minor: New features, backward compatible
# Patch: Bug fixes, backward compatible

Release Checklist

  1. Pre-release Testing

    # Run complete test suite
    pytest tests/ -v
    
    # Run integration tests (LLM-backed tests need API keys)
    MRRA_INTEGRATION_WITH_LLM=1 pytest tests/integration/
    
    # Check code quality
    black --check src/ tests/
    flake8 src/ tests/
    mypy src/mrra/
  2. Documentation Updates

    # Update CHANGELOG.md
    # Update version in __init__.py
    # Update README.md if needed
    # Build documentation
    cd docs/
    make html
  3. Release Commit

    git add .
    git commit -m "chore: release v0.2.1"
    git tag -a v0.2.1 -m "Release v0.2.1"
    git push origin main
    git push origin v0.2.1
  4. Package Build

    # Clean previous builds
    rm -rf dist/ build/
    
    # Build packages
    python -m build
    
    # Upload to PyPI (maintainers only)
    twine upload dist/*
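
As an additional safeguard, validate the built artifacts before uploading; twine check verifies that the package metadata will render correctly on PyPI, and a quick install into a throwaway environment catches packaging mistakes:

# Validate metadata of the built distributions
twine check dist/*

# Smoke-test the wheel in a clean environment
python -m venv /tmp/mrra-release-test
/tmp/mrra-release-test/bin/pip install dist/*.whl
/tmp/mrra-release-test/bin/python -c "import mrra; print(mrra.__version__)"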

Next Steps

  • Review Examples for development patterns
  • Check Configuration for customization options
  • Explore the GitHub repository for latest development activity
  • Join the community discussions for questions and contributions