You have data structures in existing code that are making the code harder to read, harder to change, or actively attracting bugs. The data may be raw primitives with hidden behavior, loosely grouped fields that travel together but have no home object, or collections exposed for direct mutation by callers.
This skill applies when:
The core insight from Fowler: Data items start simple and grow. A telephone number starts as a string, but eventually needs formatting, area code extraction, and validation — it has become a first-class object. The signal is not the complexity of the current data item; it's the behavior that wants to live on it. When you find yourself adding the same behavior to the owner of the primitive rather than to a class that represents the concept, the primitive is overdue for promotion.
Scope boundary: Type code refactorings (Replace Type Code with Class, Replace Type Code with Subclasses, Replace Type Code with State/Strategy) are covered by the sibling skill type-code-refactoring-selector. When a magic number or type code enum is driving switch statement behavior, use that skill. When data merely needs a better structural home, use this one.
If code-smell-diagnosis has already been run, use its output to confirm which smell is present. Why: multiple smells can present similarly (Primitive Obsession vs. Data Clumps vs. Data Class), and knowing the smell name from diagnosis avoids picking the wrong refactoring.
Scan the code to orient:
Smell signals to look for:
- Fields that are primitives with related behavior scattered on the owner → Primitive Obsession
- Same 2-4 fields appearing together in 3+ places → Data Clumps
- Class with only getters/setters and no behavior methods → Data Class
- Public field, or collection returned directly by a getter → missing Encapsulate Field / Encapsulate Collection
- array[0], array[1], array[2] with comments naming each slot → Replace Array with Object
- Numeric literals (9.81, 0.85, 24) in domain calculations → Replace Magic Number with Symbolic Constant
- Business calculation methods on a class that extends a GUI framework → Duplicate Observed Data needed
- One class holds a reference to another, but the referenced class needs to navigate back and cannot → unidirectional association (Change Unidirectional Association to Bidirectional)
Start here. Match the presenting symptom to the correct refactoring before executing mechanics.
SYMPTOM → REFACTORING
- A primitive field has behavior that keeps growing on the owner class → Replace Data Value with Object
  - Then decide: does the resulting object have identity (a real-world entity)?
    - YES → Change Value to Reference
    - NO → keep it as a value object (immutable, equality by value)
- A field is accessed directly within the same class, and you need subclass override flexibility or lazy initialization → Self Encapsulate Field
- A public field is accessed by external classes → Encapsulate Field (first step only: a class with just accessors is still a Data Class; follow with Move Method to bring behavior in)
- A collection field is returned directly, allowing callers to mutate it → Encapsulate Collection
- An array's index positions encode meaning → Replace Array with Object
- A numeric literal with domain meaning appears in 2+ places → Replace Magic Number with Symbolic Constant
  - Exception: if the literal is a type code driving switch behavior → use type-code-refactoring-selector instead
- A legacy record or external API structure needs an object-oriented interface → Replace Record with Data Class
- Domain data and business methods are trapped in a GUI class → Duplicate Observed Data
- One class needs to navigate to the other, but only a one-way link exists → Change Unidirectional Association to Bidirectional
- A two-way association exists, but one end no longer needs the other → Change Bidirectional Association to Unidirectional
- The same 2-4 data items travel together in field lists and parameter lists → Extract Class (Data Clumps path; see code-smell-diagnosis for full Data Clumps mechanics)
  - Then: Introduce Parameter Object or Preserve Whole Object at call sites
ACTION: Read the class containing the problematic data. Identify the field(s), their type, how they are set, and how they are used. Then grep for all callers.
WHY: Data organization refactorings are usage-driven. The field declaration tells you what exists; the callers tell you what behavior the data is accumulating, which decides whether a value object or reference object is needed, and what methods need to move. Reading only the field declaration without understanding usage leads to incomplete refactorings where the data gets a new type but the behavior stays scattered on the wrong class.
Questions to answer before selecting mechanics:
Work through the mechanics of the refactoring selected in the framework above. Each refactoring below states its mechanics and the WHY for each step.
When: A field is a primitive (string, int, float) but behavior keeps accumulating on the owner — formatting, parsing, comparison, validation.
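A minimal Java sketch of where this refactoring ends up, assuming a hypothetical Order whose customer used to be a plain String name. The new Customer class holds the value in a final field; Order changes only its field type, and its methods still accept and return strings so existing callers keep compiling. The null check is an illustrative choice, not part of the mechanics.

```java
import java.util.Objects;

// The new class for the value. Its field is final, so instances behave as values.
final class Customer {
    private final String name;

    Customer(String name) {
        this.name = Objects.requireNonNull(name, "name");
    }

    String getName() {
        return name;
    }
}

// The owner after the refactoring: the field type changed from String to Customer,
// while the public methods still speak in strings.
class Order {
    private Customer customer;

    Order(String customerName) {
        // The constructor builds the new object from the old primitive.
        this.customer = new Customer(customerName);
    }

    // Named for what callers want (the name), not for the object behind it.
    String getCustomerName() {
        return customer.getName();
    }

    // The setter builds a fresh Customer instead of mutating a possibly shared one.
    void setCustomer(String customerName) {
        this.customer = new Customer(customerName);
    }
}
```

From this point, Move Method can migrate customer-related behavior from Order into Customer.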
Mechanics:
If the getter name no longer fits, rename it: getCustomerName() is clearer than getCustomer() when the caller cares about the name, not the object identity.
Follow-on decision: After Replace Data Value with Object, determine whether the new object needs to be a reference object. If multiple owner objects need to share the same conceptual instance (e.g., all orders for the same customer should point to the same Customer object), apply Change Value to Reference next.
This is the most consequential decision in data organization refactoring. Getting it wrong causes aliasing bugs (mutable value objects) or unnecessary coordination complexity (reference objects where values would suffice).
Value objects (Date, Money, Currency, PhoneNumber):
- equals() and hashCode() based on fields
Reference objects (Customer, Account, Employee, Order):
Decision criteria:
| Question | Direction |
|---|---|
| Does changing this object's data need to be seen by all other objects that hold a reference to it? (aliasing is required) | Reference object |
| Does the object represent a real-world entity with independent existence (customer, account, order)? | Reference object |
| Is this a measurement, amount, code, or coordinate defined purely by its data? | Value object |
| Would it be correct for two objects to each have their own independent copy? | Value object |
| Is the object used in distributed or concurrent contexts where shared mutable state is problematic? | Value object (safer) |
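On the value-object side, the decisive mechanical difference is equality. The sketch below assumes a hypothetical Currency class identified only by its code: it is immutable, and equals() and hashCode() compare the code, so two separately constructed instances are interchangeable. Without the overrides, they would compare by identity and a HashSet would keep both.

```java
import java.util.Objects;

// A value object: immutable, and equality is defined by its data, not its identity.
final class Currency {
    private final String code;

    Currency(String code) {
        this.code = Objects.requireNonNull(code, "code");
    }

    String getCode() {
        return code;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Currency)) return false;
        return code.equals(((Currency) other).code);
    }

    // Must agree with equals(): equal currencies must produce equal hash codes.
    @Override
    public int hashCode() {
        return code.hashCode();
    }
}
```

A reference object such as Customer would typically leave equals() and hashCode() as inherited identity comparisons, because two distinct customers with identical data are still different customers.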
Change Value to Reference (when a value object needs to become a reference object):
If needed, rename the factory method to signal that it returns an existing object (e.g., getNamed() instead of create()). WHY: the name communicates the semantics; callers should know they are getting a shared instance.
Change Reference to Value (when a reference object is too awkward and should become a value object):
Create equals() and hashCode() methods based on the object's data fields. WHY: value objects are equal when their data is equal; without overriding these methods, equality falls back to object identity, defeating the purpose of the conversion.
When: A class accesses its own field directly, and you need subclasses to be able to override how the value is produced (computed value, lazy initialization) without changing the field access code scattered through the class.
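A minimal Java sketch of a self-encapsulated class, using a hypothetical IntRange: its methods read low and high only through getters, the constructor assigns fields through a private initialize() rather than the setters, and a subclass changes how high is produced without touching includes() or grow().

```java
// Every read of low/high inside the class goes through a getter.
class IntRange {
    private int low;
    private int high;

    IntRange(int low, int high) {
        initialize(low, high); // direct assignment, not the (overridable) setters
    }

    private void initialize(int low, int high) {
        this.low = low;
        this.high = high;
    }

    int getLow() { return low; }
    int getHigh() { return high; }
    void setLow(int low) { this.low = low; }
    void setHigh(int high) { this.high = high; }

    boolean includes(int arg) {
        return arg >= getLow() && arg <= getHigh();
    }

    void grow(int factor) {
        setHigh(getHigh() * factor);
    }
}

// Overrides one getter; includes() picks up the new behavior unchanged.
class CappedRange extends IntRange {
    private final int cap;

    CappedRange(int low, int high, int cap) {
        super(low, high);
        this.cap = cap;
    }

    @Override
    int getHigh() {
        return Math.min(super.getHigh(), cap);
    }
}
```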
Mechanics:
In the constructor, consider assigning the field directly or through a separate initialize() method rather than the setter. WHY: setters often have behavior that is appropriate for changes after construction but not for initialization. Using the setter in the constructor can trigger that behavior prematurely.
When: A field is public and accessed directly by external classes, violating the encapsulation principle. The class cannot control what values are set or observe when the value changes.
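A minimal Java sketch of the before and after, assuming a hypothetical Person whose name field used to be public. After the change the field is private, and the setter is the single write path where the class can validate or react; the non-blank rule is an illustrative invariant, not part of the refactoring itself.

```java
// Before (for contrast): class Person { public String name; }
// and clients wrote person.name = "...".
class Person {
    private String name; // no longer visible outside the class

    String getName() {
        return name;
    }

    // The single write path. The non-blank check is a hypothetical invariant.
    void setName(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("name must not be blank");
        }
        this.name = name;
    }
}

// A client after the change: p.name became p.getName().
class Greeter {
    static String greet(Person p) {
        return "Hello, " + p.getName();
    }
}
```

Person still has no behavior of its own here, which is why Encapsulate Field is only the first step before Move Method.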
Mechanics:
When: A method returns a collection field directly (a list, set, or map), allowing callers to add, remove, or replace elements without the owning class knowing.
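A minimal Java sketch of an encapsulated collection, reusing the Person and Course names from the worked example later in this skill (the Course fields are assumptions). Person owns the only mutable reference, exposes add/remove as the mutation points, restricts bulk population to an empty collection, hands out an unmodifiable view, and absorbs the counting query that callers used to perform themselves.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Course {
    private final String name;
    private final boolean advanced;

    Course(String name, boolean advanced) {
        this.name = name;
        this.advanced = advanced;
    }

    String getName() { return name; }
    boolean isAdvanced() { return advanced; }
}

class Person {
    // Only Person holds a mutable reference to this set.
    private final Set<Course> _courses = new HashSet<>();

    // Controlled mutation points; invariants would be enforced here.
    void addCourse(Course course) { _courses.add(course); }
    void removeCourse(Course course) { _courses.remove(course); }

    // Replaces setCourses(Set): allowed only while the collection is still empty.
    void initializeCourses(Set<Course> initial) {
        if (!_courses.isEmpty()) {
            throw new IllegalStateException("courses already initialized");
        }
        for (Course course : initial) {
            addCourse(course);
        }
    }

    // Read-only view: add() or remove() through it throws UnsupportedOperationException.
    Set<Course> getCourses() {
        return Collections.unmodifiableSet(_courses);
    }

    // Query moved in from callers that used to iterate the set themselves.
    int numberOfAdvancedCourses() {
        int count = 0;
        for (Course course : _courses) {
            if (course.isAdvanced()) count++;
        }
        return count;
    }
}
```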
Mechanics:
- Add add(element) and remove(element) methods to the owning class. WHY: these are the controlled mutation points; the owning class can enforce invariants (uniqueness, ordering, related state updates) in these methods.
- If a setter replaces the whole collection, it should be renamed to something like initialize to clarify it is for initial population only, or removed entirely.
- Find callers that mutate the collection through the getter (e.g., person.getCourses().add(...)). Change them to call the new add/remove methods on the owning class. WHY: getter-then-mutate is the same as direct field access; it bypasses the owning class's control.
- Make the getter return a read-only view or copy (Java: Collections.unmodifiableSet(), Python: tuple() or frozenset(), TypeScript: readonly array). WHY: the unmodifiable return makes it structurally impossible for callers to mutate the collection through the getter, enforcing the encapsulation permanently.
When: An array is used to hold heterogeneous data where position encodes meaning: row[0] is the team name, row[1] is wins, row[2] is losses. Position-as-convention is fragile and invisible.
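A minimal Java sketch of the end state for the row array described above. The class name TeamRecord and the fromRow() adapter are hypothetical; converting wins and losses to int is an extra design choice made here, since a named field can carry a real type where the array forced everything to String.

```java
// Before (for contrast):
//   String[] row = new String[3];
//   row[0] = "Liverpool";  // team name
//   row[1] = "15";         // wins
//   row[2] = "3";          // losses
// After: each slot is a named, typed field.
final class TeamRecord {
    private final String name;
    private final int wins;
    private final int losses;

    TeamRecord(String name, int wins, int losses) {
        this.name = name;
        this.wins = wins;
        this.losses = losses;
    }

    // Transitional adapter for code that still produces the old array shape.
    static TeamRecord fromRow(String[] row) {
        if (row.length != 3) {
            throw new IllegalArgumentException("expected 3 slots, got " + row.length);
        }
        return new TeamRecord(row[0], Integer.parseInt(row[1]), Integer.parseInt(row[2]));
    }

    String getName() { return name; }
    int getWins() { return wins; }
    int getLosses() { return losses; }

    // Behavior that used to operate on the array slots can now live here.
    int gamesPlayed() { return wins + losses; }
}
```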
Mechanics:
Replace each index access with a named accessor on the new class. WHY: row.getName() is self-documenting; row[0] requires the reader to remember the convention.
When: A literal number with special domain meaning appears in the code. The reader cannot tell from the literal what it means.
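A minimal Java sketch using the GRAVITATIONAL_CONSTANT name from this section (the value 9.81 is standard gravity in m/s^2). The Scheduling class and MAX_BATCH_SIZE are hypothetical and illustrate meaning-matching: a second 24 that means something else gets its own constant instead of reusing HOURS_PER_DAY.

```java
final class Physics {
    // Named for its meaning, not its value (9.81 m/s^2, standard gravity).
    static final double GRAVITATIONAL_CONSTANT = 9.81;

    private Physics() {}

    // Before: return mass * 9.81 * height;
    static double potentialEnergy(double mass, double height) {
        return mass * GRAVITATIONAL_CONSTANT * height;
    }
}

final class Scheduling {
    static final int HOURS_PER_DAY = 24;
    // Same value, different concept: it gets its own name.
    static final int MAX_BATCH_SIZE = 24;

    private Scheduling() {}

    static int hoursIn(int days) {
        return days * HOURS_PER_DAY;
    }

    // Ceiling division: the number of batches needed for the given item count.
    static int batchesNeeded(int items) {
        return (items + MAX_BATCH_SIZE - 1) / MAX_BATCH_SIZE;
    }
}
```

If the batch size later changes to 32, only MAX_BATCH_SIZE moves; a blanket replacement of every 24 with HOURS_PER_DAY would have tied the two concepts together.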
Mechanics:
Name the constant after its meaning, not its value (e.g., GRAVITATIONAL_CONSTANT, not NINE_POINT_EIGHT_ONE). WHY: the name is the documentation. A constant named after its value provides no more information than the literal; a constant named after its meaning makes the code self-explanatory.
Alternatives to consider first:
- If the number is derived from a structure, such as an array's size, use array.length instead of a constant. WHY: array.length is always correct even if the array size changes; a constant can drift.
- If the number is a type code, use type-code-refactoring-selector instead. WHY: type codes need polymorphism, not just symbolic names.
When: Code interfaces with a legacy record structure (from a traditional programming environment, an external API, or a database row) that needs an object-oriented wrapper.
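A minimal Java sketch of a data class wrapping a legacy record, assuming the record arrives as a map of column name to string value. The class name and the column names (CUST_ID, CUST_NAME, CREDIT_LIMIT) are hypothetical. Each data item becomes a private field with a getter, and fromRow() is the single place that knows the legacy layout.

```java
import java.util.Map;

// Data class wrapping a legacy row. Column names are hypothetical.
final class CustomerRecord {
    private final String id;
    private final String name;
    private final int creditLimit;

    CustomerRecord(String id, String name, int creditLimit) {
        this.id = id;
        this.name = name;
        this.creditLimit = creditLimit;
    }

    // The only place that knows the legacy layout and its string encoding.
    static CustomerRecord fromRow(Map<String, String> row) {
        return new CustomerRecord(
                require(row, "CUST_ID"),
                require(row, "CUST_NAME"),
                Integer.parseInt(require(row, "CREDIT_LIMIT")));
    }

    private static String require(Map<String, String> row, String column) {
        String value = row.get(column);
        if (value == null) {
            throw new IllegalArgumentException("missing column " + column);
        }
        return value;
    }

    String getId() { return id; }
    String getName() { return name; }
    int getCreditLimit() { return creditLimit; }
}
```

The result starts life as a Data Class; behavior that operates on these fields is a candidate for Move Method afterward.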
Mechanics:
When: A GUI class (a window, form, or controller) contains both the domain data (e.g., start date, end date, length of interval) and the business calculations on that data. The business logic cannot be tested without the GUI; it cannot be reused in other contexts.
This is the most complex refactoring in the chapter. It requires the Observer pattern (or equivalent event listener mechanism). Apply it when the coupling between GUI and domain is blocking testability or reuse.
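A minimal Java sketch of the target structure, assuming a start/end/length interval like the one described above. To stay self-contained it uses no GUI toolkit: plain Strings stand in for text fields, and a list of Runnable callbacks stands in for the Observer mechanism. The window's self-encapsulated accessors now delegate to the domain object, and update() copies domain state back into the "widgets".

```java
import java.util.ArrayList;
import java.util.List;

// Domain class: owns the data and the calculations, and notifies observers on change.
class Interval {
    private final List<Runnable> observers = new ArrayList<>();
    private int start;
    private int end;
    private int length;

    void addObserver(Runnable observer) {
        observers.add(observer);
    }

    private void changed() {
        for (Runnable observer : observers) {
            observer.run();
        }
    }

    int getStart() { return start; }
    int getEnd() { return end; }
    int getLength() { return length; }

    void setStart(int start) { this.start = start; changed(); }
    void setEnd(int end) { this.end = end; changed(); }
    void setLength(int length) { this.length = length; changed(); }

    // Business logic moved out of the window; testable without any GUI.
    void calculateLength() { setLength(end - start); }
    void calculateEnd() { setEnd(start + length); }
}

// Stand-in for the GUI class: plain Strings play the role of text fields.
class IntervalWindow {
    private final Interval subject = new Interval();
    private String startField = "";
    private String endField = "";
    private String lengthField = "";

    IntervalWindow() {
        subject.addObserver(this::update);
        update();
    }

    // Self-encapsulated accessors, now redirected to the domain object.
    String getStart() { return Integer.toString(subject.getStart()); }
    void setStart(String text) { subject.setStart(Integer.parseInt(text.trim())); }
    String getEnd() { return Integer.toString(subject.getEnd()); }
    void setEnd(String text) { subject.setEnd(Integer.parseInt(text.trim())); }
    String getLength() { return Integer.toString(subject.getLength()); }
    void setLength(String text) { subject.setLength(Integer.parseInt(text.trim())); }

    // Observer callback: copy domain state back into the "widgets".
    private void update() {
        startField = getStart();
        endField = getEnd();
        lengthField = getLength();
    }

    // Event handler: the user edited the end field and moved focus away.
    void endFieldLostFocus(String text) {
        setEnd(text);
        subject.calculateLength();
    }

    String display() {
        return startField + ".." + endField + " (" + lengthField + ")";
    }
}
```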
Mechanics:
When: Two classes need to use each other's features, but only a one-way link exists. Adding the reverse link is needed for a new feature.
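A minimal Java sketch of the two-way link, assuming Order is the controlling end and Customer holds the back-pointer set. Only Order.setCustomer() updates both sides; Customer exposes friendOrders() for that purpose (package-private here, with the name signaling restricted use), and its own addOrder() delegates to the controlling end.

```java
import java.util.HashSet;
import java.util.Set;

class Customer {
    private final Set<Order> orders = new HashSet<>();

    // Helper intended only for Order while it maintains the association.
    Set<Order> friendOrders() {
        return orders;
    }

    // Customer-side modifier delegates to the controlling end.
    void addOrder(Order order) {
        order.setCustomer(this);
    }

    int orderCount() {
        return orders.size();
    }
}

// Order is the controlling end: it alone keeps both pointers in sync.
class Order {
    private Customer customer;

    Customer getCustomer() {
        return customer;
    }

    void setCustomer(Customer newCustomer) {
        if (customer != null) customer.friendOrders().remove(this);
        customer = newCustomer;
        if (customer != null) customer.friendOrders().add(this);
    }
}
```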
Mechanics:
Create a helper method on the non-controlling end that gives the controlling class direct access to the back-pointer collection, and name it to signal its restricted use (e.g., friendOrders()). WHY: the helper lets the controlling class maintain consistency without making the back pointer fully public, limiting the surface area for misuse.
When: A two-way link exists but one end no longer needs the other. Bidirectional associations add complexity: they must be kept in sync, they can create zombie objects (objects that cannot be garbage-collected because a back pointer keeps them alive), and they introduce coupling between packages.
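A minimal Java sketch of the one-way result, assuming Order previously held a Customer back pointer only to read a discount. After the refactoring, Order has no Customer field; the one method that needed the customer takes it as a parameter, and navigation runs only from Customer to its orders. The discount calculation is a hypothetical stand-in for whatever the back pointer was used for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Order no longer stores a Customer.
class Order {
    private final double grossPrice;

    Order(double grossPrice) {
        this.grossPrice = grossPrice;
    }

    // Before: used a stored back pointer (customer.getDiscount()).
    // After: the caller supplies the customer.
    double getDiscountedPrice(Customer customer) {
        return grossPrice * (1 - customer.getDiscount());
    }
}

class Customer {
    private final double discount;
    private final List<Order> orders = new ArrayList<>();

    Customer(double discount) {
        this.discount = discount;
    }

    double getDiscount() { return discount; }
    void addOrder(Order order) { orders.add(order); }
    List<Order> getOrders() { return Collections.unmodifiableList(orders); }

    // The remaining direction of navigation: customer to orders.
    double totalDiscountedPrice() {
        double total = 0;
        for (Order order : orders) {
            total += order.getDiscountedPrice(this);
        }
        return total;
    }
}
```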
Mechanics:
ACTION: After mechanics are complete, confirm: (1) the old exposure point is gone or private; (2) all callers use the new interface; (3) no callers bypass the new interface through another path.
WHY: Data organization refactorings leave behind debris if not verified. A collection field that was encapsulated but still has one caller using the getter to mutate directly is not encapsulated. A primitive replaced with an object but still passed as a raw string through one old parameter path is still a primitive at that path.
Verification checklist:
- equals() and hashCode() overridden (if applicable)
ACTION: Look for the next refactoring that the completed refactoring enables.
WHY: Data organization refactorings are rarely endpoints. Replace Data Value with Object creates a class that should have behavior moved into it. Encapsulate Collection reveals client code that should be moved to the owning class. The value of each refactoring compounds when the follow-on steps are taken.
Common follow-on sequences:
| Completed refactoring | Natural follow-on |
|---|---|
| Replace Data Value with Object | Move Method — migrate behavior from the old owner to the new class |
| Change Value to Reference | Ensure registry is consistent; check that all creation sites use the factory |
| Encapsulate Collection | Move Method — move iteration/query code from callers to the owning class |
| Replace Array with Object | Move Method — behavior operating on the array slots belongs on the new class |
| Encapsulate Field (on a Data Class) | Move Method — the Data Class smell is not resolved until behavior moves in |
| Duplicate Observed Data | Move Method — business calculation methods migrate from GUI to domain class |
1. Value vs. reference is a decision that often needs reversing.
Fowler explicitly notes that this decision is not always clear and frequently needs to be undone. Start with a value object (simpler, no registry needed, no aliasing risk). Convert to a reference object only when the aliasing requirement becomes concrete — when two objects genuinely need to share the same instance so that changes to one are seen by the other.
2. Immutability is not optional for value objects.
A mutable value object is worse than no refactoring. If callers copy the reference and then mutate the object through it, they silently affect each other's state. Before completing Change Reference to Value, verify that all setters are removed. If the object cannot become immutable, keep it as a reference object.
3. Encapsulate Collection is a two-step refactoring.
The first step is the interface change (add/remove methods, unmodifiable getter). The second step — which most developers skip — is moving the collection-operating code from callers back to the owning class. A collection that is encapsulated but still iterated externally for every operation has the right interface but has not yet earned its encapsulation.
4. Self Encapsulate Field before Duplicate Observed Data.
Self Encapsulate Field is the prerequisite for Duplicate Observed Data. Without self-encapsulation, field access is scattered across the GUI class and cannot be redirected to the domain object in a controlled way. Always apply Self Encapsulate Field first, verify it compiles, then proceed to the duplication and synchronization steps.
5. Magic number replacement requires meaning-matching, not value-matching.
Replace the literal only where it represents the same concept as the constant's name. The same number value can appear in code for different reasons. Replacing all occurrences of 24 with HOURS_PER_DAY will be wrong wherever 24 means something else entirely.
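The aliasing hazard in the second point above (Immutability is not optional) fits in a few lines of Java. The sketch assumes hypothetical MutableMoney, Money, and Invoice classes: two invoices built from the same mutable "value" silently share changes, while the immutable version forces every change to produce a new instance.

```java
// A "value" object left mutable: every holder of the reference sees every change.
class MutableMoney {
    private int cents;

    MutableMoney(int cents) { this.cents = cents; }

    int getCents() { return cents; }

    void add(int more) { cents += more; }
}

// The immutable alternative: operations return a new instance.
final class Money {
    private final int cents;

    Money(int cents) { this.cents = cents; }

    int getCents() { return cents; }

    Money plus(int more) { return new Money(cents + more); }
}

// Two invoices built from the same default total share one MutableMoney instance.
class Invoice {
    final MutableMoney total;

    Invoice(MutableMoney total) { this.total = total; }
}
```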
Scenario: An Employee class has a String telephoneNumber field. Methods on Employee and its callers format the number, extract the area code, and validate the format in multiple places.
Selection: Replace Data Value with Object — the primitive has accumulated behavior that belongs on a class.
Execution:
1. Create a TelephoneNumber class with a String _number field, constructor TelephoneNumber(String number), and getter getNumber().
2. Change the type of Employee._telephoneNumber from String to TelephoneNumber.
3. Update Employee's getter: String getTelephoneNumber() { return _telephoneNumber.getNumber(); }
4. In Employee's constructor: _telephoneNumber = new TelephoneNumber(number);
5. In Employee's setter: _telephoneNumber = new TelephoneNumber(number);
6. Move formatNumber(), getAreaCode(), and isValid() from Employee to TelephoneNumber.
Value vs. reference decision: Is a TelephoneNumber a real-world entity with independent identity? No; it is defined by its digits. Two TelephoneNumber objects with the same string are equal. Keep it as a value object. Implement equals() and hashCode() on the number string.
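A minimal Java sketch of the finished worked example, following the field and method names in the steps above. The validation and formatting rules (exactly ten digits, displayed as (AAA) PPP-LLLL) are assumptions, because the scenario does not say what Employee's original methods did. Equality follows the decision above and compares the raw number string.

```java
import java.util.Objects;

// Value object created by Replace Data Value with Object.
final class TelephoneNumber {
    private final String _number;

    TelephoneNumber(String number) {
        _number = Objects.requireNonNull(number, "number");
    }

    String getNumber() {
        return _number;
    }

    // Moved from Employee. Assumed rule: a valid number has exactly ten digits.
    boolean isValid() {
        return digits().length() == 10;
    }

    String getAreaCode() {
        return validDigits().substring(0, 3);
    }

    // Assumed display format: (AAA) PPP-LLLL.
    String formatNumber() {
        String d = validDigits();
        return "(" + d.substring(0, 3) + ") " + d.substring(3, 6) + "-" + d.substring(6);
    }

    private String digits() {
        return _number.replaceAll("\\D", ""); // strip every non-digit
    }

    private String validDigits() {
        if (!isValid()) {
            throw new IllegalStateException("not a ten-digit number: " + _number);
        }
        return digits();
    }

    // Value semantics: equal when the number strings are equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof TelephoneNumber
                && _number.equals(((TelephoneNumber) other)._number);
    }

    @Override
    public int hashCode() {
        return _number.hashCode();
    }
}

// The owner keeps its String-based interface; only the field type changed.
class Employee {
    private TelephoneNumber _telephoneNumber;

    Employee(String number) {
        _telephoneNumber = new TelephoneNumber(number);
    }

    String getTelephoneNumber() {
        return _telephoneNumber.getNumber();
    }

    void setTelephoneNumber(String number) {
        _telephoneNumber = new TelephoneNumber(number);
    }
}
```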
Scenario: Person has a Set _courses field with getCourses() and setCourses(Set) methods. Callers do: person.getCourses().add(new Course(...)) and iterate the set externally to count advanced courses.
Selection: Encapsulate Collection — the collection is directly mutable by callers.
Execution:
1. Add addCourse(Course arg) { _courses.add(arg); } and removeCourse(Course arg) { _courses.remove(arg); }.
2. Initialize the field: private Set _courses = new HashSet();
3. Replace setCourses(Set) with initializeCourses(Set arg), which asserts the collection is empty and then calls addCourse for each element. Or remove the setter entirely if callers can use addCourse directly.
4. Find all person.getCourses().add(...) callers and change them to person.addCourse(...).
5. Change the getter to return a read-only view: public Set getCourses() { return Collections.unmodifiableSet(_courses); }
6. Move the external counting loop into Person as numberOfAdvancedCourses(); the external iteration was Feature Envy on Person's data.
Scenario: After applying Replace Data Value with Object to a customer name string in Order, the Customer class exists but each order creates its own Customer object. A requirement arrives: update the credit rating for a customer, and all their orders must see the change.
Selection: The aliasing requirement is now concrete — multiple orders need the same customer instance. Apply Change Value to Reference.
Execution:
1. Replace the constructor with a factory method: public static Customer create(String name) { return new Customer(name); }. Make the constructor private.
2. Add a registry: private static Dictionary _instances = new Hashtable(); and a private store() method that puts this in the registry by name.
3. Preload the registry: Customer.loadCustomers() populates it at startup.
4. Rename the factory method and have it return the registered instance: public static Customer getNamed(String name) { return (Customer) _instances.get(name); }
5. Update the Order constructor and setter to use Customer.getNamed(name) instead of new Customer(name).
Result: all orders pointing to the same customer name now share one Customer object. Changes to credit rating are visible everywhere.
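A minimal Java sketch of the finished reference object. The steps above use Dictionary/Hashtable; this sketch swaps in a HashMap, the usual modern choice, and it is not thread-safe. The preloaded customer names are placeholders, and getNamed() returns null for an unknown name, as a plain map lookup does.

```java
import java.util.HashMap;
import java.util.Map;

class Customer {
    // Registry of shared instances, keyed by name.
    private static final Map<String, Customer> instances = new HashMap<>();

    private final String name;
    private int creditRating;

    // Private: clients can no longer create their own copies.
    private Customer(String name) {
        this.name = name;
    }

    // Preload known customers at startup (placeholder names).
    static void loadCustomers() {
        new Customer("Lemon Car Hire").store();
        new Customer("Associated Coffee Machines").store();
        new Customer("Bilston Gasworks").store();
    }

    private void store() {
        instances.put(name, this);
    }

    // Returns the shared instance, or null if the name is not registered.
    static Customer getNamed(String name) {
        return instances.get(name);
    }

    String getName() { return name; }
    int getCreditRating() { return creditRating; }
    void setCreditRating(int rating) { this.creditRating = rating; }
}

class Order {
    private Customer customer;

    Order(String customerName) {
        customer = Customer.getNamed(customerName);
    }

    void setCustomer(String customerName) {
        customer = Customer.getNamed(customerName);
    }

    Customer getCustomer() {
        return customer;
    }
}
```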
| File | Contents | When to read |
|---|---|---|
| references/value-vs-reference-guide.md | Detailed decision tree for value vs. reference with distributed systems considerations, aliasing risk patterns, and language-specific equality semantics | Step 2: value/reference decision |
| references/collection-encapsulation-patterns.md | Language-specific collection encapsulation patterns: Java unmodifiable views, Python properties and frozenset, TypeScript readonly arrays | Step 2: Encapsulate Collection mechanics |
Sibling skill relationships:
- code-smell-diagnosis: run first to identify which data smell is present before selecting a refactoring
- type-code-refactoring-selector: for type code integers and enums that drive switch statement behavior; not covered by this skill
- class-responsibility-realignment: when Feature Envy or Inappropriate Intimacy is the primary smell alongside data problems
- method-decomposition-refactoring: when Long Method is present in the same class; often co-occurs with Data Class smell
This skill is licensed under CC-BY-SA-4.0.
Source: BookForge — Refactoring: Improving the Design of Existing Code by Martin Fowler and Kent Beck.
Install related skills from ClawHub:
- clawhub install bookforge-code-smell-diagnosis
- clawhub install bookforge-type-code-refactoring-selector
- clawhub install bookforge-class-responsibility-realignment
- clawhub install bookforge-method-decomposition-refactoring
Or install the full book set from GitHub: bookforge-skills