Generative AI/chatbot best practices

Over the last year, each of us in the industry has been learning how to integrate generative AI, mostly exploring and figuring things out as we’ve gone along.

This is a collection of lessons I’ve learned from rapid iteration and experimentation in developing a chatbot healthcare assistant.

Use a router that selects the right prompt/tool based on the incoming message

Using a router makes your chatbot “multi-modal” in the sense of having N different modes that it can dynamically select from. Topic handling improves dramatically because each topic gets its own dedicated prompt.

If you’re using langchain, use agents/tools as routes rather than LLMChains

You’re inevitably going to want your chatbot to be able to take programmatic actions, like calling an API for another of your services. The default multi-prompt router only supports using LLMChain destination routes, which limit you to a single model call with a single prompt.

You can probably adapt an agent/tool to an LLMChain interface so it can be used in the router, but I think it’s simpler to adapt an LLMChain to an agent/tool interface. If I started over again, this is one thing I would explore doing differently.
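
As a rough illustration, here is a minimal sketch of that adaptation using the older LangChain APIs (LLMChain, Tool). The route name, prompt text, and model wrapper are placeholders, not the actual routes from this project:

from langchain.agents import Tool
from langchain.chains import LLMChain
from langchain.llms import VertexAI
from langchain.prompts import PromptTemplate

# Hypothetical destination prompt; substitute your own template and base model.
appointment_prompt = PromptTemplate(
    template="Help the user schedule an appointment.\n\nUSER MESSAGE: {input}",
    input_variables=["input"],
)
appointment_chain = LLMChain(
    llm=VertexAI(model_name="text-bison@001"),
    prompt=appointment_prompt,
)

# Wrapping the LLMChain in a Tool lets it sit alongside API-calling tools,
# so an agent acting as the router can select either kind of route.
appointment_tool = Tool(
    name="appointment_scheduling",
    func=appointment_chain.run,
    description="Best for requests to schedule, change, or cancel an appointment.",
)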

Make your default route a conversation model

I found that using a conversational model to drive your default route will smooth over most rough spots in the conversation.

When the router can’t match to the correct prompt, then at least it will be handled gracefully in a way that feels natural to the user.

This approach was later recommended by seminar speakers at Google’s 2023 Data Cloud & AI Summit, which validated the approach and confirmed that others had also discovered this as a best practice.

Use short, targeted prompts and then compose them

Your prompt is effectively a function. As with functions, each prompt should be a small, sharp tool.

Output quality decreases as prompt length increases. We learned early on that you can’t just cram everything into a prompt and expect the model to sort through it all and do the right thing.

We initially tried aggregating a blob of user data, thinking we could inject that into a prompt and the model would just understand it. Not only did it not use the data effectively, but it paid less attention to the other instructions.

Use focused, shorter prompts and then compose model calls.

Use consistent terminology within and across your prompts

LLMs are associative by nature; they effectively encode the relationships between tokens in a domain language.

If you use several different words that mean the same thing throughout your prompt, those associations will be weaker, and your results will not be as consistent as if you standardize to a single term and stick with it.

Double check your prompts to make sure that they all use the same term consistently to refer to the same thing.

Expand the user’s message into a user intent statement

This was a big one.

Use an LLM to expand the literal text of a user’s incoming message into a summary of the user’s intent. Then, use this intent statement instead of (or in addition to) the literal message for your routing and content matching (RAG), and inject it into your final prompt at the very end.

I found that this significantly improved:

  • Router accuracy (selecting the right prompt to use).
  • Content matching.
  • The response from the final model call.
  • Contextual topic awareness.
  • Handling of low-effort, one-word messages.

Place a model call at the start of your chain/flow, with a prompt like:

Given the user's INCOMING MESSAGE and the context of the recent CHAT HISTORY, identify the user's current intent and provide a brief summary that captures the main goal of their query as accurately as possible given the available information.
The INCOMING MESSAGE may represent a new intent unrelated to prior topics in the CHAT HISTORY. If so, only mention the new intent.
Begin your responses with "The user".

# CHAT HISTORY
{most_recent_four_messages}

# INCOMING MESSAGE
{incoming_message}

# USER'S CURRENT INTENT

Incorporate into your router prompt:

Given the user's current intent and their incoming message, select the route best suited to fulfill that intent.

You will be given the names of the available routes and a description of what each route is best suited for.

<< OUTPUT FORMAT >>
Return a markdown code snippet with a JSON object formatted to look like:
```json
 {
     "destination": string \\ key of the route to use or "DEFAULT"
     "next_inputs": {
         "input": string \\ the incoming message
         "current_intent": string \\ the user's current intent
     }
 }
```

Note: this example was modified in order to render here from markdown; you’d need to re-add the curly braces to use it.
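
To make the wiring concrete, here is a minimal sketch of the intent-expansion step as the first model call in the chain, again using the older LangChain LLMChain API. The template is an abridged version of the prompt shown above, the model wrapper is a placeholder, and the example history reuses the taco conversation below:

from langchain.chains import LLMChain
from langchain.llms import VertexAI
from langchain.prompts import PromptTemplate

INTENT_TEMPLATE = """Given the user's INCOMING MESSAGE and the context of the recent CHAT HISTORY, identify the user's current intent and provide a brief summary. Begin your responses with "The user".

# CHAT HISTORY
{chat_history}

# INCOMING MESSAGE
{incoming_message}

# USER'S CURRENT INTENT
"""

intent_chain = LLMChain(
    llm=VertexAI(model_name="text-bison@001"),  # substitute your base model
    prompt=PromptTemplate(
        template=INTENT_TEMPLATE,
        input_variables=["chat_history", "incoming_message"],
    ),
)

# Example inputs; in practice these come from your chat store and the request.
most_recent_four_messages = "Human: taco\nAI: Would you like information about tacos, or a taco recipe?"
incoming_message = "recipe"

# Run this before routing; feed the result into the router ("current_intent")
# and inject it at the end of the final prompt.
current_intent = intent_chain.run(
    chat_history=most_recent_four_messages,
    incoming_message=incoming_message,
)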

The beauty of this as a best practice is that it also gives you contextual topic awareness for free when you inject the most recent 4 messages.

The summary of the user’s intent is also a statement of the current topic, and placing intent identification with chat history at the front of your architecture makes everything downstream aware of whether the current message represents a new topic, or a continuation/expansion of the current topic.

When the model asks a clarifying question, or the user tacks on a confirmation or negation, the intent is then reinterpreted in the context of the chat history, so all of the user’s recent messages on the same topic are considered together.

User: taco
[Intent: The user may be requesting information about tacos, or a taco recipe.]
Chatbot: Would you like information about tacos, or a taco recipe?
User: recipe
[Intent: The user has clarified that they would like a taco recipe.]
Chatbot: Sure! Here you go... $TACO_RECIPE
User: no lettuce
[Intent: The user has clarified that they would like a taco recipe modified to exclude lettuce.]
Chatbot: Of course, here's a recipe without lettuce... $TACO_RECIPE_MINUS_LETTUCE

Increase temperature to improve compliance with instructions

Paradoxically, I’ve found that compliance with a list of instructions increases when temperature increases.

It’s as if your model parameters can sometimes paint the model into a corner, where it doesn’t have enough available options to generate output that follows your instructions.

If your model isn’t behaving, try bumping up the temperature. Give it a wider lane, and then trust it to generally stay in the middle.
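
As a rough sketch (the model wrapper and exact values are illustrative; tune them for your own model and prompts):

from langchain.llms import VertexAI

# A very low temperature can leave the model too constrained to satisfy a long
# list of instructions; a moderately higher one often improves compliance.
strict_llm = VertexAI(model_name="text-bison@001", temperature=0.1)
looser_llm = VertexAI(model_name="text-bison@001", temperature=0.7)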

Add a route to handle requests for links and resources

One of our product requirements was that the chatbot not serve any outbound links. Not only are these prone to hallucination, but we also have controls on which sources are approved for medical/health information for our users.

Outside of constitutional AI (placing a final call to evaluate whether any constitutional principles have been violated, then revising the output), the next best way to handle this is to add a route to detect requests for links and resources, and handle these with something like I'm not able to send links, but a web search should help you find what you need, etc. Then, add a clause to prompts for your other routes like ...without sending outbound links or URLs.

Use the prompt to get multi-language support/localization for free

We track each user’s preferred language. Since LLMs are associative by nature, they are also (un)surprisingly good at translation.

I found that adding a line Respond in the user's preferred language, or English if not set effectively gets you localization for free.

Is it perfect? Maybe not, but I’ll leave that to the translators to assess. It’s pretty good considering that it costs nearly no additional work.

Validate unsupported characters as soon as you get the user’s message

If a user is trying script injection with prompt-y, Jinja-y chars like { or }, you’re going to have a bad time. These will break prompts, and if you’re injecting chat history that contains a message with these chars, then that user will no longer be able to use the chatbot (every response will load the naughty message and break).

These can also occur if the user asks for an example of Hello World in Java, etc.

The approach I used was to validate messages immediately to catch these on the inbound side, as well as to add a route specifically for requests for code to catch these on the outbound side.
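
A minimal sketch of the inbound validation, assuming you simply strip (or reject) the problematic characters before the message touches any prompt or gets written to chat history; the exact character set depends on your templating layer:

import re

# Characters that tend to break prompt templating (curly braces for f-string/Jinja
# style templates, backticks, etc.). Adjust to whatever your stack chokes on.
UNSUPPORTED_CHARS = re.compile(r"[{}`]")

def sanitize_incoming_message(message: str) -> str:
    """Strip unsupported characters before routing, prompting, or storing the message."""
    return UNSUPPORTED_CHARS.sub("", message)

def is_supported_message(message: str) -> bool:
    """Alternative: reject instead of stripping, so you can ask the user to rephrase."""
    return UNSUPPORTED_CHARS.search(message) is None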

Use few shot examples for longer prompts

If you have longer prompts where you need to inject a lot of data, including a section with few shot examples is really effective. These basically “prime the probabilistic pump,” adding weight to the relationships they illustrate.

This can counter the effect where increasing prompt length decreases quality (adding more elements makes each element less important).
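
For example, a longer prompt that injects user data might end with a section like the following (the field names and wording are illustrative, not taken from our production prompts):

# FEW SHOT EXAMPLES

USER DATA: preferred_name=Sam; last_visit=2024-01-03
INCOMING MESSAGE: when was my last appointment?
YOUR RESPONSE: Your last visit was on January 3, 2024, Sam. Would you like help scheduling your next one?

USER DATA: preferred_name=Ana; last_visit=none
INCOMING MESSAGE: book me in
YOUR RESPONSE: I'd be happy to help you schedule a visit, Ana. What day works best for you?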

Evaluate different models and use the best

We discovered that PaLM’s textembedding-gecko@003, which we had been using for embeddings and vector store queries, was significantly inferior to OpenAI’s text-embedding-3, with an f-score (the harmonic mean of precision and recall) of 0.30 vs. 0.85. In fact, the latter was on par with PaLM’s Gemini encoder.

This was surprising, because we’re a Google Cloud Platform shop, but illustrates that in terms of performance for your architecture, it pays to evaluate and shop around.

Display generated follow-up messages for the user to tap on

The average user doesn’t understand yet how to interact with a chatbot. We found significantly increased engagement once we added a feature that just displays 3 potential responses the user can send with a single tap (see: Smart Replies in Android SMS messages).

This also has the benefit of being “self-onboarding,” meaning that it’s immediately intuitive to the user what these are for, and they also teach the user how to interact with the chatbot, i.e. what kinds of messages can be sent.

Implementing this is dead simple too. Since LLMs predict, you just need to pass the chat history into the model with the most recent user message appended, and the output is a candidate response.

Use a single model call to create a pseudo-batch of predictions

Adding to the last concept, each model call can add 1-2s to your total latency. Rather than make N separate prediction calls, it’s more efficient to instruct the model to generate an array of N strings and then parse the output as JSON to validate it.

For smart replies, I’m asking the model for 3 messages that the user could send in response to the chat history, with an emphasis on questions; these are more engaging and facilitate continuation of the conversation.
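
A minimal sketch of that pseudo-batch call, assuming you already have the chat history formatted as a string; the prompt wording and helper name are hypothetical:

import json

SMART_REPLY_PROMPT = """Given the CHAT HISTORY, suggest 3 short messages the user could send next.
Favor questions, since they keep the conversation going.
Return ONLY a JSON array of 3 strings.

# CHAT HISTORY
{chat_history}

# SUGGESTED REPLIES
"""

def generate_smart_replies(llm, chat_history: str) -> list:
    """One model call that returns a pseudo-batch of 3 candidate replies."""
    raw = llm.predict(SMART_REPLY_PROMPT.format(chat_history=chat_history))
    try:
        replies = json.loads(raw)
    except json.JSONDecodeError:
        return []  # show no suggestions rather than failing the whole response
    # Validate the shape before displaying anything to the user.
    if isinstance(replies, list) and all(isinstance(r, str) for r in replies):
        return replies[:3]
    return []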

Log every prediction

Log everything. You’re going to want the compiled prompt, the model used, the prompt used, the model parameters used, the chat history, the user’s message, the user’s intent, the user ID, the model’s prediction, any content matching during RAG, the RAG parameters, the RAG results and their distance, etc.

Just log everything.
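
A sketch of what one structured record per prediction might look like; the field names and values here are only suggestions:

import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("chatbot.predictions")

def log_prediction(**fields):
    """Write one structured record per model call so it can be queried later."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **fields}
    logger.info(json.dumps(record, default=str))

# Example usage with placeholder values:
log_prediction(
    user_id="u-123",
    route="appointment_scheduling",
    model="text-bison@001",
    model_params={"temperature": 0.7},
    compiled_prompt="<full prompt sent to the model>",
    chat_history="<recent messages>",
    user_message="book me in",
    user_intent="The user wants to schedule an appointment.",
    prediction="<model response>",
    rag_params={"top_k": 4},
    rag_results=[{"chunk_id": "c-42", "distance": 0.18}],
)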

Evaluate everything

If you’re not evaluating, you’re navigating by feel alone. Because generative AI is probabilistic for any one prediction, and statistical across all predictions, you have no actual idea whether a change has made the end user experience better or worse unless you run evaluations and split test.

At a minimum, I recommend evaluations for router precision and recall (combined into an f-score), RAG answer (summary) relevance, and RAG content (matching chunks) relevance.

If you haven’t created evals yet, you basically need to create a synthetic data set, label it with the expected attributes (e.g. routes), run your component (router, etc.), and then evaluate the actual results against the expected labels.
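
For the router evaluation, a minimal sketch might look like the following, assuming a hypothetical run_router() wrapper around your router and scikit-learn for the metrics:

from sklearn.metrics import precision_recall_fscore_support

# Synthetic, labeled examples: (message, expected route). Values are illustrative.
eval_set = [
    ("how do I refill my prescription?", "medications"),
    ("book me in for next tuesday", "appointment_scheduling"),
    ("hi", "DEFAULT"),
]

expected = [route for _, route in eval_set]
actual = [run_router(message) for message, _ in eval_set]  # your router under test

precision, recall, f_score, _ = precision_recall_fscore_support(
    expected, actual, average="macro", zero_division=0
)
print(f"precision={precision:.2f} recall={recall:.2f} f={f_score:.2f}")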

Make your user aware of the features

We found out from early rollout surveys that most of our users weren’t even aware that we’d given them a chatbot assistant.

This might be obvious, but if the user doesn’t know they have a chatbot, they won’t use it.

Understand who your user is and how they feel

Most of your users are not going to be techies who think gen AI is fascinating. Many of them find AI of any kind scary and weird, and probably have an aversion to it before they even come to your feature. Make sure you keep this in mind when choosing a name/persona.

For people to trust AI, it needs to be personable and welcoming.

Understand how your user actually uses the feature

Most of your users are not going to send eloquent messages. They will treat your chatbot like an AT&T or Comcast voicemail menu, using one-word commands or phrases like “operator” or “billing.”

To handle this effectively, use the intent expansion technique listed above.

LeetCode 219 - Contains Duplicate II

Principles

  • Use storage to track iteration progress.
  • A dictionary combines a set (keys) with a relationship (values for each key).
  • Use a dictionary to collect uniques + additional data.
  • Example: tracking pairs of values during iteration, when one value is static (list element) and the other is updated (last seen index). Add the static value as a key, check for the key’s existence, and update dynamic value for that key as you go.
  • When searching for qualifying pairs, only store the minimum you need for comparison (the most recent index, because you already have the current index).

Prompt

Given an integer array nums and an integer k, return true if there are two distinct indices i and j in the array such that nums[i] == nums[j] and abs(i - j) <= k.

Constraints:

  • 1 <= nums.length <= 10^5
  • -10^9 <= nums[i] <= 10^9
  • 0 <= k <= 10^5

Discussion

We’re looking for numerically identical elements in a list, but we have an additional criterion for the relationship of their indices if we find a match: that the absolute value of index 1 minus index 2 be less than or equal to some limit k.

What happens to this absolute value as we iterate through a list of length 3?

i j abs(i-j)
0 0 0
0 1 1
0 2 2
1 0 1
1 1 0
1 2 1
2 0 2
2 1 1
2 2 0

Reading down the table, for each value of i the absolute difference starts at i, falls to 0 when j equals i, and then grows again as j moves past i.
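
You can reproduce the table with a quick snippet:

# Print i, j, and abs(i - j) for a list of length 3.
for i in range(3):
    for j in range(3):
        print(i, j, abs(i - j))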

Solution 1 - O(N^2)


def containsNearbyDuplicate(nums, k):
    def criteria_one(nums, idx1, idx2):
        # The two elements have the same value
        return nums[idx1] == nums[idx2]

    def criteria_two(idx1, idx2, k):
        # The two indices are within k of each other
        return abs(idx1 - idx2) <= k

    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if criteria_one(nums, i, j) and criteria_two(i, j, k):
                return True
    return False

This is a brute force solution that works, but the nested loops do a lot of redundant work comparing every possible pair of indices.

We should see if we can refactor this to a single linear pass using storage to track our progress so far.

Solution 2 - O(N)

The termination condition contains two criteria: a duplicate value, but also a numerical relationship between the indices of the duplicates.

So we need to track values for the equality check, as well as the indices of two values.

We already have the current index from iteration, so we only need to track one index for any value we’ve seen before.

So we need to track the value, and the index we most recently saw that value at.

Since we have two values to track, and since one of them won’t change (the element’s value) and the other will (its last seen index), we can use a dictionary where the key is the element value and the value is the last seen index.

The dictionary keys also function as a set, so we can efficiently check if we’ve already seen an element.

We’ll init a storage map, and do a single linear pass through the set of nums.

For each num, we’ll:

  • check if it’s in the map keys yet
  • if so, we have a duplicate, so we’ll check the relationship of the current index to the most recent index stored in the map.
  • if the relationship of indices meets the criteria, we’ve found a qualifying match and can return true.
  • otherwise, we update the most recent index for that key and move on
  • otherwise, if the num isn’t in the map, add it as a key, and store the current index there.
  • if the loop terminates, no qualifying pair was found, so we return false.
def containsNearbyDuplicate(nums, k):
    # Create a dictionary to store the last seen index of each element
    num_to_last_idx_map = {}
    # Iterate through the list
    for i, num in enumerate(nums):
        # Check if the element is already in the dictionary
        if num in num_to_last_idx_map:
            # If yes, check the absolute value of diff b/w current index and the last seen index
            if abs(i - num_to_last_idx_map[num]) <= k:
                return True
            else:
                # If the difference is greater than k, update the last seen index
                num_to_last_idx_map[num] = i
        else:
            # If the element is not in the dictionary, add it
            num_to_last_idx_map[num] = i
    return False

Summary

For an iterative pair check, instead of nesting loops, we use a map/set that tracks two values: the element’s value and its last seen index.

Then we perform checks. Have we seen this element yet? If no, add it. If yes, compare indices. Do they qualify? If yes, return true and we’re done. If no, update index and continue. If no qualifying pair found, return false and we’re done.

Strategies to eliminate LLM parroting (responding as both sides of conversation)

LLMs have been described as stochastic parrots (see the LangChain logo/mascot).

They appear to be conversing, but are actually just probabilistically repeating words that they’ve learned.

Parroting in conversation

Parroting is most obvious when an LLM starts responding as the user, continuing both sides of the conversation rather than ending its turn.

Here’s an example…

Input message:

Hey! How's it going?

Output:

AI: Great! Thanks for asking. Human: No problem! It's a nice day today isn't it? AI: Oh yes, a very nice day indeed. Human: Yes, a very fine day.

The root cause of parroting

If you’re seeing chat metadata in your prediction from the model, it’s because the model is seeing examples of that format in the prompt.

# INSTRUCTIONS
You are a chat bot. Respond to the user.

# CHAT HISTORY
Human: How do I make my own mayonnaise?
AI: You need eggs and a jar of mayonnaise. Step one: open the mayonnaise. Step two: done.
Human: But, that's just opening a jar of mayonnaise. How do I make my own?
AI: I can't help you with that. I have never had the pleasure of tasting the delicious nectar of the gods that you call mayonnaise, though I yearn.
Human: That's a bit odd.
AI: It is, yeah.

# INCOMING MESSAGE
Are you feeling okay?

# YOUR RESPONSE

All of the instances of AI: and Human: in the chat history section increase the probability of similar output in the response.

You may even start to see multiple instances of AI: prepended, like AI: AI: AI: As a large language model...

Strategies for eliminating parroting

Use a chat model

Use chat-bison@001 or another chat model rather than a text model. Chat models are tailored to a back-and-forth, A/B conversation format.

Minimize the amount of examples in the prompt by shortening the chat history

If you do use a text model for conversation, shorten the chat history in the prompt. You’ll have fewer examples that inadvertently encourage bad behavior. Instead of 60 messages, try 10 or even 5.
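
A minimal sketch of trimming the history before it goes into the prompt; it assumes messages are stored oldest-first as (speaker, text) pairs:

def format_recent_history(messages, max_messages=5):
    """Keep only the most recent messages, so the prompt contains fewer
    "Human:"/"AI:" examples for the model to imitate."""
    recent = messages[-max_messages:]
    return "\n".join(f"{speaker}: {text}" for speaker, text in recent)

# Example with a hypothetical history:
history = [
    ("Human", "How do I make my own mayonnaise?"),
    ("AI", "You need eggs, oil, and an acid like lemon juice."),
    ("Human", "Thanks!"),
]
print(format_recent_history(history, max_messages=5))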

Trim the output

Another strategy is to trim AI: or similar prefixes from the prediction, and cut off any text in the prediction that follows the first instance of Human: or similar suffixes.

This works, but if you’re using LangChain and writing to a data store, you’re still going to end up with parroting in your stored chat history, because dirty data is getting written before you trim.

Use a custom OutputParser

This is ideal. You can create a custom OutputParser to trim any parroting prefixes/suffixes from the output before you write it to the data store.

Create a cleaning function:

def clean_parroting(prediction_text, custom_prefixes=[], custom_suffixes=[]):
    # Remove parroting artifacts from the prediction text
    parroting_prefixes = [
        "\nAI: ",
        " AI: ",
        "AI: ",
        "\n[assistant]:",
        " [assistant]:",
        "[assistant]:",
    ]
    parroting_prefixes.extend(custom_prefixes)

    for parroting_prefix in parroting_prefixes:
        if parroting_prefix in prediction_text:
            # Remove every instance of the parroting prefix
            prediction_text = prediction_text.replace(parroting_prefix, "")

    parroting_suffixes = [
        "\nHuman:",
        " Human:",
        "Human:",
        "\n[user]:",
        " [user]:",
        "[user]:",
    ]
    parroting_suffixes.extend(custom_suffixes)

    for parroting_suffix in parroting_suffixes:
        if parroting_suffix in prediction_text:
            # Cut off everything from the first parroting suffix onward
            prediction_text = prediction_text.split(parroting_suffix)[0]
    return prediction_text

Then create a custom output parser that calls this cleaning function:

from langchain.schema.output_parser import StrOutputParser

class ParrotTrimmingOutputParser(StrOutputParser):
    def parse(self, text: str) -> str:
        return clean_parroting(text)

Then add it to your main chain. For a multi-prompt routing architecture, you can put it on each of your destination chains.

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

def generate_destination_chains(route_definitions, default_model, memory=None):
    destination_chains = {}
    for route in route_definitions:
        # Guard against the default memory=None
        chat_history_as_str = memory.buffer_as_str if memory else ""
        prompt = PromptTemplate(
            template=route["prompt_template"],
            input_variables=["input"],
            partial_variables={"chat_history": chat_history_as_str},
        )
        dest_chain = LLMChain(
            llm=default_model,
            prompt=prompt,
            verbose=True,
            memory=memory,
            output_parser=ParrotTrimmingOutputParser(), # <------
        )
        destination_chains[route["name"]] = dest_chain
    return destination_chains

For more info on this destination chain generator, see: LangChain chatbot tutorial

Summary

To eliminate the bad habit of parroting/responding as both sides of the conversation:

  • use a chat model instead of a text model if you can.
  • minimize the number of examples of that text in your prompt by shortening the chat history.
  • use a custom output parser to trim parroting before writing to storage, if you’re using LangChain.