CS292C-10: Computer-Aided Reasoning for Software

CS292C-10: Computer-Aided Reasoning for Software
Lecture 10: The DPLL(T) Framework: SMT Solver Architecture
YF
by Yu Feng
 
Recapping the Nelson–Oppen Method
An approach for combining decision procedures for different theories.
Core Concept
Enables the combination of decision procedures for disjoint theories into a decision procedure for their union.
Key Requirements
Works when theories are signature-disjoint, stably-infinite, and each is decidable.
Mechanism
Theory solvers communicate by exchanging equalities and disequalities over shared variables.
Procedure
Purifies formulas, solves each theory separately, and propagates equalities until reaching a fixed point or finding a conflict.
Nelson-Oppen provides a theoretical foundation for solving combined theories, but has efficiency limitations addressed by modern SMT approaches.
Recapping the Nelson–Oppen Method
1
SMT formula F
Starting point
2
Purification
Split into F₁ (T₁) ∧ F₂ (T₂)
3
Equality Propagation
T₁ Solver ⇄ T₂ Solver
4
Result
SAT or UNSAT
Only handle formula in CNF:
F1 ∧ F2 ∧ … ∧ Fn
Beyond Nelson-Oppen
1
2
3
1
Nelson-Oppen Limitation
Handles only pure conjunctions
2
Program Reality
Arbitrary Boolean combinations
3
DPLL(T) Solution
Handles Boolean structure + theory reasoning
Boolean Abstraction
Grammar of SMT formula in theory T: F := {a_T} | F1 ∧ F2 | F1 ∨ F2 | ¬F
Boolean Abstraction Function
Maps SMT formula to overapproximate SAT formula:
B({a_T} ) = b (b fresh)
B(F1 ∧ F2) = B(F1) ∧ B(F2)
B(F1 ∨ F2) = B(F1) ∨ B(F2)
B(¬F) = ¬B(F1)
Example
F : x = z ∧ ((y = z ∧ x < z ) ∨ ¬(x = z ))
Converted to: B(F) = b1 ∧ ((b2 ∧ b3) ∨ ¬b1)
Is B(F) satisfiable?
Is F satisfiable?
Boolean Abstraction Inconsistency
The boolean abstraction function B(F) creates an overapproximation of the original SMT formula F.
Key Issue
While B(F) may be satisfiable, the original formula F might be unsatisfiable due to theory inconsistencies.
Theory-specific constraints are lost in the abstraction process.
Example
Consider: F : (x > y) ∧ (y > z) ∧ (z > x)
Abstraction: B(F) = b₁ ∧ b₂ ∧ b₃
B(F) is satisfiable (b₁=b₂=b₃=true), but F is unsatisfiable in the theory of arithmetic.
This inconsistency demonstrates why we need to check theory consistency after finding a satisfying boolean assignment.
Off-line v.s. Online
SAT solver may yield assignments that are not satisfiable modulo theory T because boolean abstraction is an over-approximation
Need to learn theory conflict clauses to refine the abstraction
Off-line (eager) approach
Uses SAT solver as a black box
Generates complete boolean assignments
Theory solver checks assignments afterward
Adds learned clauses back to SAT solver
On-line (lazy) approach
Integrates theory solver with SAT reasoning
Theory checks partial assignments during search
Can detect conflicts earlier
Enables theory-based propagation
The Offline Algorithm
The offline approach treats the SAT solver as a black box, requiring complete boolean assignments before theory verification. This leads to a back-and-forth process:
Abstract
φᵖ ← B(φ)
Solve
Use CDCL on φᵖ
Check
Verify theory consistency of B⁻¹(μᵖ)
Retry
If inconsistent, add ¬μᵖ and repeat
This approach can be inefficient as it might explore many boolean assignments that are theory-inconsistent before finding a valid solution. Each time a theory inconsistency is found, we add a conflict clause and restart the SAT solving process.
The Offline Algorithm
The pseudocode for Offline-DPLL(T) implements boolean abstraction with conflict learning:
Offline-DPLL(T-formula φ)
  φᵖ ← B(φ)
  while (TRUE) do
    μᵖ, res ← CDCL(φᵖ)
    if res = UNSAT then return UNSAT
    else
      T-res ← T-solve(B⁻¹(μᵖ))
      if T-res = SAT then return SAT
      else φᵖ ← φᵖ ⋀¬μᵖ
Example
Original formula: F : x = z ∧ ((y = z ∧ x < z ) ∨ ¬(x = z ))
Boolean abstraction: B(F) = b1 ∧ ((b2 ∧ b3) ∨ ¬b1)
SAT solver finds assignment: b1 ∧ b2 ∧ b3
Theory solver: B⁻¹(b1 ∧ b2 ∧ b3) = x = z ∧ y = z ∧ x < z is UNSAT
Add conflict clause: φᵖ ← φᵖ ∧ ¬(b1 ∧ b2 ∧ b3)
New abstraction: b1 ∧ ((b2 ∧ b3) ∨ ¬b1) ∧ ¬(b1 ∧ b2 ∧ b3)
This refined formula remains SAT, so we continue iterating.
B⁻¹(F) = x = y ∧ x < y ∧ a1 ∧ a2 ∧ . . . ∧ a2019
{2^{2019}} UNSAT assignments containing
x = y ∧ x < y but ¬A prevents only one of them
Adding Theory Conflict Clauses
1
Detect Conflict
B⁻¹(μᵖ) is UNSAT
2
Extract Core
Find minimal unsatisfiable subset
3
Add Clause
¬(core) added to clause database
4
Narrow Search
Faster than full ¬μᵖ
Minimal UNSAT Cores
Let φ be original unsatisfiable conjunct
Drop one atom from φ, call this φ'
If φ' is still unsat, φ := φ'
Repeat this for every atom in φ
Resulting φ is minimal unsat core of original formula
Minimal UNSAT Cores Example
Original formula: φ : x = y ∧ f(x)+z = 5 ∧ f(x) ≠ f(y) ∧ y ≤ 3
Drop x = y from φ
Result: f(x)+z = 5 ∧ f(x) ≠ f(y) ∧ y ≤ 3
This is SAT (we can satisfy all constraints)
Drop f(x)+z = 5 from original φ
Result: x = y ∧ f(x) ≠ f(y) ∧ y ≤ 3
This is UNSAT (constraints are contradictory)
New formula: φ : x = y ∧ f(x) ≠ f(y) ∧ y ≤ 3
Drop f(x) ≠ f(y): Result is SAT
Drop y ≤ 3: Result is UNSAT
Therefore, the minimal UNSAT core is x = y ∧ f(x) ≠ f(y)
This represents the smallest subset of constraints that causes unsatisfiability.
Offline DPLL(T) Algorithm (Improved)
Offline-DPLL(T-formula φ)
  φᵖ ← B(φ)
  while (TRUE) do
    μᵖ, res ← CDCL(φᵖ)
    if res = UNSAT then return UNSAT
    else
      T-res ← T-solve(B⁻¹(μᵖ))
      if T-res = SAT then return SAT
      else
        t ← B(UNSATCORE(B⁻¹(μᵖ)))
        φᵖ ← φᵖ ⋀ ¬t
Consider when B⁻¹(F) = x = y ∧ x < y ∧ a₁ ∧ a₂ ∧ ... ∧ a₂₀₁₉
When theory atoms x = y and x < y are overapproximated by boolean variables b₁ and b₂:
Theoretical explosion when both b₁ and b₂ are true
Conflict clause ¬(b₁ ∧ b₂) prevents only one of 2²⁰¹⁹ UNSAT assignments
Algorithm potentially explores exponential space before finding solution
Better but still need a full assignment to the boolean abstraction in order to generate a conflict clause.
From DPLL to DPLL(T)
SAT solver has made assignment in Decide step and performed BCP
If no conflict detected, immediately invoke theory solver
Suppose A is current partial assignment to boolean abstraction
Use theory solver to decide if B⁻¹(A) is UNSAT
If B⁻¹(A) UNSAT, add theory conflict clause ¬A to clause database
Theory Propagation Lemmas
Suppose a formula contains x = y, y = z, x < z with corresponding boolean variables b1, b2, b3
Suppose SAT solver makes partial assignment b1: ⟙, b2: ⟙
In next Decide step, free to assign b3: ⟙ or b3: ⟘
But assignment b3: ⟙ is stupid, why?
Theory Propagation Lemmas
Idea: Theory solver can communicate which literals are implied by current partial assignment
In our example, ¬(x < z) implied by current partial assignment x = y ∧ y = z
Thus, safely add b1 ∧ b2 → ¬b3 to clause database
The clauses implied by theory are theory propagation lemmas
The lemmas prevents bad assignments to boolean abstraction
DPLL-Based SAT Solver
Classic DPLL loop with integrated theory reasoning:
DPLL(T)(Formula φ, Assignment μ)
  if all variables assigned in μ then
    if T-Consistent(μ) then return SAT
    else learn conflict clause and backtrack
  else if φ contains empty clause then
    learn conflict clause and backtrack
  else
    var ← PickBranchLiteral()
    result ← DPLL(T)(φ, μ ∪ {var})
    if result = UNSAT then
      return DPLL(T)(φ, μ ∪ {¬var})
    else
      return result
  // Theory checks happen during propagation
Theory solver integration points: variable selection, constraint propagation, conflict analysis, and final model verification.
Theory Propagation Lemmas
Which theory propagation lemmas do we add?
Option #1 (exhaustive)
Figure out and add all literals implied by current partial assignment
Option #2 (heuristics)
Only figure out literals "obviously" implied by current partial assignment
Exhaustive theory propagation can be very expensive
There isn’t much of a science behind which literals are “obviously” implied
Therefore, most SMT solvers use heuristic-based approaches
Early Theory Conflict Detection
Immediate Check
Don't wait for full assignment
Partial Detection
Partial assignment may already be theory-UNSAT
Quick Response
Add theory conflict clause immediately
Why On-line Wins
Immediate Awareness
Theory conflicts detected early
Efficient Learning
Better clause learning
Real-time Propagation
Avoid exploring invalid paths
Industry Standard
Used in Z3, CVC5, Yices, MathSAT
Key Concepts Recap
Boolean Abstraction
Map theory to SAT
DPLL(T) Variants
Online vs offline approaches
Theory Propagation
Avoid invalid search paths
SMT Formula
SAT + Theories + Learning
Modern SMT Solvers
All competitive SMT solvers today are based on the on-line version
Many existing off-the-shelf SMT solvers: Z3, CVC4, Yices, MathSAT, etc.
Lots of on-going research on SMT, especially related to quantifier support
Annual competition SMT-COM
What's Next?
1
Transition from Theory to Applications
Synthesis, bug findings, and other practical applications
2
Midterm
Coming up next Tuesday
3
Final Projects
Begin working on your course projects