<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="http://friend.computer/feed.xml" rel="self" type="application/atom+xml" /><link href="http://friend.computer/" rel="alternate" type="text/html" /><updated>2023-04-30T20:10:13+00:00</updated><id>http://friend.computer/feed.xml</id><title type="html">The Computer is Your Friend!</title><subtitle>A blog where I write things and posts some pictures</subtitle><entry><title type="html">The Powerful Duo of ChatGPT and WikiData</title><link href="http://friend.computer/jekyll/update/2023/04/30/wikidata-llms.html" rel="alternate" type="text/html" title="The Powerful Duo of ChatGPT and WikiData" /><published>2023-04-30T19:12:00+00:00</published><updated>2023-04-30T19:12:00+00:00</updated><id>http://friend.computer/jekyll/update/2023/04/30/wikidata-llms</id><content type="html" xml:base="http://friend.computer/jekyll/update/2023/04/30/wikidata-llms.html">&lt;p&gt;ChatGPT knows an enormous number of facts about baseball stats, movie quotes, and so on.&lt;/p&gt;

&lt;p&gt;However, it can often “hallucinate,” making up content that looks reasonable but is actually nonsense.
There are many solutions under development to help address that. Two major approaches that
have gotten a lot of attention are fine-tuning with human feedback
(&lt;a href=&quot;https://huggingface.co/blog/rlhf&quot;&gt;RLHF&lt;/a&gt;)
and pasting in appropriate context (in-context learning) with
&lt;a href=&quot;https://github.com/cohere-ai/sandbox-toy-semantic-search&quot;&gt;semantic search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One approach that has had less attention for addressing hallucination is
in-context learning utilizing a knowledge / semantic graph.
&lt;a href=&quot;https://neo4j.com/developer-blog/knowledge-graph-based-chatbot-with-gpt-3-and-neo4j/&quot;&gt;Knowledge Graph-Based Chatbot With GPT-3 and Neo4j&lt;/a&gt;
is a pretty good write up on how to build and query against a semantic graph
with ChatGPT. This post covers some similar material, as well as how to address
some issues they ran into working with WikiData in their post.&lt;/p&gt;

&lt;p&gt;As a disclaimer, I am not an ML expert, so some of these postulations
may be a bit off; they are intended more to illuminate interesting directions
of exploration than to be hard statements of fact.&lt;/p&gt;

&lt;p&gt;tl;dr there may be some hallucinations :p&lt;/p&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;I’ll be referring to the ChatGPT models as GPT-3.5 and GPT-4.
A neat trick to help disambiguate is that GPT-3.5 gets a green OpenAI
logo, while GPT-4 gets a black OpenAI logo.&lt;/p&gt;

&lt;p&gt;GPT-3.5 does a great job of dealing with common knowledge:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image15.png&quot; alt=&quot;Popular Game in 2005?&quot; /&gt;&lt;/p&gt;

&lt;details&gt;
    &lt;summary&gt;&lt;b&gt;Conversation Text&lt;/b&gt;&lt;/summary&gt;

    &lt;blockquote&gt;
    &lt;b&gt;Me:&lt;/b&gt;
     What is a popular game released in 2005?
    &lt;br /&gt;
    &lt;b&gt;ChatGPT:&lt;/b&gt;
    One popular game that was released in 2005 is &quot;World of Warcraft&quot; by Blizzard Entertainment. It is a massively multiplayer online role-playing game (MMORPG) set in the fantasy world of Azeroth, and it quickly became one of the most successful and influential games in the genre. Other popular games released in 2005 include &quot;Resident Evil 4,&quot; &quot;God of War,&quot; and &quot;Call of Duty 2.&quot;
    &lt;/blockquote&gt;

&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;GPT-4 generally improves upon that, being able to recall even more specific facts.&lt;/p&gt;

&lt;p&gt;However, both still have hallucinations, as well as a knowledge cutoff in 2021.&lt;/p&gt;

&lt;p&gt;The “traditional way” folks have tried to reduce hallucinations is by scaling model
and data size. For example, GPT-3 is much better at benchmarks than GPT-2
largely through brute-force scaling of data and compute, rather than fancy tricks with models or features.&lt;/p&gt;

&lt;p&gt;As things have scaled up, &lt;a href=&quot;https://arxiv.org/abs/2005.14165&quot;&gt;emergent zero-shot learning ability&lt;/a&gt;
was noticed with GPT-3 and other LLMs. This ability for “in-context learning” has allowed for
novel approaches to improve accuracy and reduce hallucinations, such as through prompt hacking,
embedding search (&lt;a href=&quot;https://txt.cohere.com/embedding-archives-wikipedia/&quot;&gt;Cohere&lt;/a&gt;), or
searching context with agents (&lt;a href=&quot;https://arxiv.org/abs/2210.03629&quot;&gt;ReAct&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Better reasoning models and fine-tuning have also enabled models to more effectively
“stop talking” instead of making something up. ChatGPT’s effectiveness in this regard,
attributed primarily to its RLHF, was considered a major factor in its success.&lt;/p&gt;

&lt;h3 id=&quot;semantic-triples&quot;&gt;Semantic Triples&lt;/h3&gt;

&lt;p&gt;One paper relevant to this post shows that it’s possible
to edit specific facts in a model – &lt;a href=&quot;https://memit.baulab.info/&quot;&gt;MEMIT&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A particularly useful insight in MEMIT is that:&lt;/p&gt;

&lt;blockquote&gt;
Language models can be viewed as knowledge bases containing memorized tuples (s,
r, o), each connecting some subject s to an object o via a relation r.
&lt;/blockquote&gt;

&lt;p&gt;This subject-relation-object tuple relation is basically how graph databases work too!&lt;/p&gt;
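&lt;p&gt;As a toy sketch (the facts and relation names here are chosen purely for illustration, not pulled from WikiData), the same tuple shape can be stored and queried directly:&lt;/p&gt;

```python
# Facts as (subject, relation, object) triples -- the same shape MEMIT
# ascribes to LLM memory, and the shape graph databases store natively.
triples = {
    ("California", "instance of", "U.S. state"),
    ("California", "capital", "Sacramento"),
    ("Sacramento", "instance of", "city"),
}

def objects(subject, relation):
    """Return every object linked to `subject` via `relation`."""
    return {o for (s, r, o) in triples if s == subject and r == relation}

print(objects("California", "capital"))  # {'Sacramento'}
```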

&lt;h3 id=&quot;wikidata&quot;&gt;WikiData&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://research.google/pubs/pub42240/&quot;&gt;WikiData&lt;/a&gt; is a prominent
graph database associated with Wikipedia that we’ll walk through today
as part of our experiment.&lt;/p&gt;

&lt;p&gt;As stated by &lt;a href=&quot;https://www.wikidata.org/wiki/Wikidata:Introduction&quot;&gt;its introduction&lt;/a&gt;,
Wikidata is a free and open knowledge base that can be read and edited
by both humans and machines. It consists of labeled items connected
through properties to specific values, such as in the diagram below.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image21.png&quot; alt=&quot;Example California Graph&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;web-30-not-the-crypto-one&quot;&gt;Web 3.0 (Not the Crypto One)&lt;/h3&gt;

&lt;p&gt;After inventing the World Wide Web, Tim Berners-Lee got up to some other
stuff.&lt;/p&gt;

&lt;p&gt;Tim Berners-Lee originally expressed his vision of &lt;a href=&quot;https://en.wikipedia.org/wiki/Semantic_Web&quot;&gt;the Semantic
Web&lt;/a&gt; in 1999 as
follows:&lt;/p&gt;

&lt;blockquote&gt;
I have a dream for the Web in which computers become capable of
analyzing all the data on the Web -- the content, links, and
transactions between people and computers. A &quot;Semantic Web&quot;, which
makes this possible, has yet to emerge, but when it does, the day-to-day
mechanisms of trade, bureaucracy and our daily lives will be handled by
machines talking to machines. The &quot;intelligent agents&quot; people have
touted for ages will finally materialize.
&lt;/blockquote&gt;

&lt;p&gt;It ended up not being as much of a hit as the regular web due to a
couple of issues
(&lt;a href=&quot;http://www.umbertostraccia.it/cs/download/papers/JWS08/JWS08.pdf&quot;&gt;Paper&lt;/a&gt;).
I’ll highlight a few called out in the wiki page:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Vastness:&lt;/strong&gt; The World Wide Web contains many billions of pages.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Vagueness:&lt;/strong&gt; These are imprecise concepts like &quot;young&quot; or &quot;tall&quot;.
This arises from the vagueness of user queries, of concepts
represented by content providers, of matching query terms to
provider terms and of trying to combine different &lt;a href=&quot;https://en.wikipedia.org/wiki/Knowledge_base&quot;&gt;knowledge
bases&lt;/a&gt;
with overlapping but subtly different concepts.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Uncertainty:&lt;/strong&gt; These are precise concepts with uncertain values. For
example, a patient might present a set of symptoms that correspond
to a number of different distinct diagnoses each with a different probability.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Inconsistency:&lt;/strong&gt; These are logical contradictions that will inevitably
arise during the development of large ontologies, and when
ontologies from separate sources are combined.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, the effort generally stalled out near the beginning of the
2010s. To tackle these issues, you basically need some smart technology
that can scale and generalize while dealing with uncertain and inconsistent
data.&lt;/p&gt;

&lt;p&gt;Roughly concurrently in the beginning of the 2010s, neural networks started to take off in a
big way, shrugging off the AI Winter II - &lt;a href=&quot;https://www.skynettoday.com/overviews/neural-net-history&quot;&gt;a brief
history&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A bunch of innovations and billions of dollars of investment followed.
Image labeling got largely solved, deprecating this XKCD:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image19.png&quot; alt=&quot;XKCD&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot;&gt;Transformers&lt;/a&gt; hit the
scene in 2017, GPT-2 in 2019, GPT-3 in 2020, and ChatGPT in 2022. &lt;a href=&quot;https://arxiv.org/abs/2304.00612&quot;&gt;Eight
Things to Know about Large Language
Models&lt;/a&gt; roughly lays out
why it was such a big deal beyond King James-style mac and cheese
recipes: very impressive emergent abilities, especially around
reasoning and learning new tasks, appeared much sooner than expected.&lt;/p&gt;

&lt;blockquote&gt;
Steinhardt (2022) presents results from a competition that was
organized in summer 2021, which gave forecasters access to experts,
extensive evidence, and a cash incentive, and asked them to predict what
state-of-the-art performance with LLMs would be in each of the next four
years on two specific tasks. The results from summer 2022, only one year
into the competition, substantially exceeded what the consensus forecast
said would be possible in 2024
&lt;/blockquote&gt;

&lt;p&gt;Basically, we now have a smart technology that can scale and generalize
very, very well, dealing with the vastness, vagueness, uncertainty, and
inconsistency that plagued the previous attempt at curating a semantic
web. GPT-4 is even starting to outperform many experts and crowd workers
on annotation tasks (&lt;a href=&quot;https://arxiv.org/abs/2304.06588&quot;&gt;Paper&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Now let’s walk through an example!&lt;/p&gt;

&lt;h2 id=&quot;augmented-adding-to-wikidata&quot;&gt;Augmented Adding to WikiData&lt;/h2&gt;

&lt;p&gt;WikiData has a lot of gaps in annotations, generally only having comprehensive
coverage of very specific topics with extensive databases, such as birds, bugs,
and movies.&lt;/p&gt;

&lt;p&gt;To address this, LLMs like ChatGPT or Bard can be used to augment human
annotation. At this stage, you want a human in the loop
for annotating such things. I previously helped with some similar work
using satellite imagery and machine learning to help label Open
Street Maps (OSM) (&lt;a href=&quot;https://techcrunch.com/2019/07/23/facebook-and-openstreetmaps-empower-the-mapping-community-with-ai-enhanced-tools/&quot;&gt;article&lt;/a&gt;).
The quote from Martijn van Exel really underscores the tension between
man and machine here:&lt;/p&gt;

&lt;blockquote&gt;
The tool strikes a good balance between suggesting machine-generated
features and manual mapping. It gives mappers the final say in what ends
up in the map, but helps just enough to both be useful and draw
attention to undermapped places. This is definitely going to be a key
part of the future of OSM. We can never map the world, and keep it
mapped, without assistance from machines. The trick is to find the sweet
spot. OSM is a people project, and the map is a reflection of mappers'
interests, skills, biases, etc. That core tenet can never be lost, but
it can and must travel along with new horizons in mapping.
&lt;/blockquote&gt;

&lt;p&gt;The ethos of what I see possible here really draws from that project,
namely combining the force of many people and models together to steward
a data resource held in common. There’s probably a future where direct
model predictions are ingested, but you first need to strike the right
balance on working with an open project.&lt;/p&gt;

&lt;h3 id=&quot;video-game-locations&quot;&gt;Video Game Locations&lt;/h3&gt;

&lt;p&gt;Stemming off of the earlier question about video games, let’s work on
filling out the narrative location property
(&lt;a href=&quot;https://www.wikidata.org/wiki/Property:P840&quot;&gt;P840&lt;/a&gt;) for
a game that lacks it.&lt;/p&gt;

&lt;p&gt;First, we build a query to find video games lacking narrative location.
Narrative location looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image17.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;SPARQL is notoriously hard to write for beginners, so we’ll use the
Query Builder to build an initial example for ChatGPT to work with
(&lt;a href=&quot;https://w.wiki/5Rvo&quot;&gt;link&lt;/a&gt;). The &lt;a href=&quot;https://www.wikidata.org/wiki/Wikidata:Request_a_query&quot;&gt;Request a query&lt;/a&gt;
page also has some good examples to seed with.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image8.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Clicking “&lt;a href=&quot;https://query.wikidata.org/&quot;&gt;Show query in the Query Service&lt;/a&gt;,” we get the underlying SPARQL
query. Note that you have to run the query first.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image5.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We want to find games without a narrative location set, so next we’ll
copy-paste this over to ChatGPT to add in that filter.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image4.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;details&gt;
    &lt;summary&gt;&lt;b&gt;Conversation Text&lt;/b&gt;&lt;/summary&gt;

    &lt;blockquote&gt;
    Me:

SELECT DISTINCT ?item ?itemLabel WHERE {
_SERVICE wikibase:label { bd:serviceParam wikibase:language
\&quot;\[AUTO_LANGUAGE\]\&quot;. }_
{
SELECT DISTINCT ?item WHERE {
?item p:P31 ?statement0.
?statement0 (ps:P31/(wdt:P279\*)) wd:Q7889.
}
LIMIT 100
}
}

Add a filter to the above query to items lacking a narrative location
&lt;br /&gt;

    ChatGPT:

    _To add a filter to the above query to retrieve only items that lack a
    narrative location, you can use the \`MINUS\` keyword to exclude items
    that have a value for the property \`P840\` (narrative location) as
    shown below:_

\`\`\`

SELECT DISTINCT ?item ?itemLabel WHERE {
_SERVICE wikibase:label { bd:serviceParam wikibase:language
\&quot;\[AUTO_LANGUAGE\]\&quot;. }_
{
SELECT DISTINCT ?item WHERE {
?item p:P31 ?statement0.
?statement0 (ps:P31/(wdt:P279\*)) wd:Q7889.
}
LIMIT 100
}
MINUS {
?item wdt:P840 ?narrativeLocation.
}
}
\`\`\`

    _This query will only return items that have instance of (\`P31\`) with
    subclasses of (\`P279\*\`) \`Q7889\` (work of fiction), and do not have
    a value for \`P840\` (narrative location) property._
    &lt;/blockquote&gt;

&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The tail of the response is a hallucination, but the query is correct.
Let’s copy-paste this into WikiData and run it
(&lt;a href=&quot;https://w.wiki/6cc9&quot;&gt;Query&lt;/a&gt;).&lt;/p&gt;
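&lt;p&gt;As a rough sketch of automating this step, the same query can also be sent straight to the public query service from Python. The endpoint and &lt;code&gt;format=json&lt;/code&gt; parameter are the standard WikiData Query Service interface; the query text is the one from the conversation above:&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request

# The query from the conversation above: video games (Q7889, via
# instance-of P31 and subclass-of P279*) lacking a narrative location (P840).
QUERY = """
SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P31 ?statement0.
      ?statement0 (ps:P31/(wdt:P279*)) wd:Q7889.
    }
    LIMIT 100
  }
  MINUS { ?item wdt:P840 ?narrativeLocation. }
}
"""

def run_query(query):
    """Send a SPARQL query to the public WikiData endpoint; return result rows."""
    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    req = urllib.request.Request(
        url, headers={"User-Agent": "wikidata-llm-demo/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

# Uncomment to run against the live endpoint:
# for row in run_query(QUERY):
#     print(row["itemLabel"]["value"])
```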

&lt;p&gt;In my run, the first result was the 2005 game Exit
(&lt;a href=&quot;https://www.wikidata.org/wiki/Q1137071&quot;&gt;wikidata&lt;/a&gt;).
Reading through its
&lt;a href=&quot;https://en.wikipedia.org/wiki/Exit_(video_game)&quot;&gt;article&lt;/a&gt;,
it doesn’t have a clear narrative location. Let’s see how ChatGPT 3.5
handles that.&lt;/p&gt;

&lt;p&gt;We copy-paste in the text of the article and ask “What is the narrative
location of this game?” It correctly notes that there’s not a clear
narrative location. We can either start or follow up with a question
like “Could you answer yes or no in JSON form?” to output this in a form
we could more automatically work with. Further improvements can be made
by prompting with a few examples of the correct question and answer
format we want to parse against.&lt;/p&gt;
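&lt;p&gt;If we wanted to work with these replies automatically, one minimal approach (a sketch, not a robust parser) is to fish the first valid JSON object out of the reply, since models often wrap the JSON in prose:&lt;/p&gt;

```python
import json

def extract_json(reply):
    """Return the first parseable JSON object in `reply`, or None."""
    start = reply.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(reply)):
            if reply[i] == "{":
                depth += 1
            elif reply[i] == "}":
                depth -= 1
                if depth == 0:
                    # Balanced braces: try to parse this candidate span.
                    try:
                        return json.loads(reply[start:i + 1])
                    except json.JSONDecodeError:
                        break
        start = reply.find("{", start + 1)
    return None

reply = 'Sure! Here you go: {"narrative_locations": []} Hope that helps.'
print(extract_json(reply))  # {'narrative_locations': []}
```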

&lt;p&gt;The second game we try is Brothers in Arms 2: Global Front
(&lt;a href=&quot;https://www.wikidata.org/wiki/Q1028054&quot;&gt;Wikidata&lt;/a&gt;). We
paste in the article to ChatGPT 3.5 and then ask “Could you provide the
list of narrative locations in json form from this game? If it doesn't
have narrative locations, provide an empty list.”&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image1.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;details&gt;
    &lt;summary&gt;&lt;b&gt;Conversation Text&lt;/b&gt;&lt;/summary&gt;

    &lt;blockquote&gt;
    It replies:
    _{_
    _\&quot;narrative_locations\&quot;:\[_
    _\&quot;Solomon Islands\&quot;,_
    _\&quot;North Africa\&quot;,_
    _\&quot;Sicily\&quot;,_
    _\&quot;Normandy\&quot;,_
    _\&quot;Germany\&quot;,_
    _\&quot;Pacific\&quot;_
    _\]_
    _}_
    &lt;/blockquote&gt;

&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The Pacific as a narrative location is a little iffy, but otherwise, not bad.&lt;/p&gt;

&lt;p&gt;Bard keeps the Pacific, but drops Germany:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image3.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;GPT-4 eschews adding the Pacific:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image6.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Great! Let’s add these locations to the wikidata entry.&lt;/p&gt;

&lt;p&gt;Oops, “Solomon Islands” is ambiguous. Let’s ask GPT-4 to help provide
a bit of disambiguation. We can improve the methodology here over time.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image9.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;details&gt;
    &lt;summary&gt;&lt;b&gt;Conversation Text&lt;/b&gt;&lt;/summary&gt;
    &lt;blockquote&gt;
    Me: _Would Solomon Island (the island sovereign state in Oceania) or
    Solomon Islands (archipelago in the South Pacific) be more correct as a
    narrative location for this game? Could you choose the former or latter
    as more accurate?_

    GPT-4: _The more accurate narrative location for this game would be
    the \&quot;Solomon Islands\&quot; (archipelago in the South Pacific). _
    &lt;/blockquote&gt;

&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Asking for another ambiguity, “Would Sicily (region of Italy) or Sicily
(island in the Mediterranean Sea) be better?”, GPT-4 answers “The
better narrative location for this game would be &quot;Sicily&quot; (island in
the Mediterranean Sea).”&lt;/p&gt;

&lt;p&gt;This one’s honestly a bit ambiguous as to the right thing to do. You
could feasibly add both! This might be a case where subclassing the
property is right.&lt;/p&gt;

&lt;p&gt;Anyways, I ended up adding both:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image14.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And voila! All of the steps presented here should be automatable, such
that we could tie this into a neat UI that makes an automatic prediction
and references the specific text it’s inferring from.&lt;/p&gt;

&lt;h2 id=&quot;improving-gpt-4-trivia-accuracy&quot;&gt;Improving GPT-4 Trivia Accuracy&lt;/h2&gt;

&lt;p&gt;One of the advantages a graph query has over embedding search
is being able to do more efficient “multi-hop” queries. For example, this
question from the &lt;a href=&quot;https://hotpotqa.github.io/&quot;&gt;HotPotQA benchmark&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
 What was the population of the town as of 2000 where the historic tugboat Catawissa is located?
&lt;/blockquote&gt;

&lt;p&gt;Most search approaches are going to need to take two hops here – first to answer
which town Catawissa is located in, and then the population of that town. With a graph search,
you can instead do Catawissa -&amp;gt; location -&amp;gt; population as one query.&lt;/p&gt;
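&lt;p&gt;As a sketch of what that single query could look like: the QID below is a placeholder (I haven’t looked up Catawissa’s real item), and the point-in-time qualifier needed for “as of 2000” is omitted for brevity:&lt;/p&gt;

```python
# Sketch of the one-query version. wd:Q000000 is a placeholder QID for
# the tugboat Catawissa (substitute the real item); P276 is "location"
# and P1082 is "population" on WikiData.
MULTI_HOP = """
SELECT ?population WHERE {
  wd:Q000000 wdt:P276/wdt:P1082 ?population.
}
"""
# The property path wdt:P276/wdt:P1082 hops tugboat -> town -> population
# in a single query, where a text-retrieval approach needs two lookups.
```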

&lt;p&gt;This is basically the same idea behind one benefit of GraphQL vs REST.&lt;/p&gt;

&lt;blockquote&gt;
GraphQL queries access not just the properties of one resource but also 
smoothly follow references between them. While typical REST APIs require 
loading from multiple URLs, GraphQL APIs get all the data your app needs 
in a single request. Apps using GraphQL can be quick even on slow mobile network connections.
&lt;/blockquote&gt;

&lt;p&gt;Let’s ask GPT-3.5, Bard, and GPT-4 a question related to info we
just added to WikiData.&lt;/p&gt;

&lt;p&gt;We ask “What's a video game published in February 2010 that takes place
in the Solomon Islands and North Africa?” Spoiler: the answer is
Brothers in Arms 2: Global Front.&lt;/p&gt;

&lt;h3 id=&quot;gpt-35&quot;&gt;GPT-3.5&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image10.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;GPT-3.5 is confidently wrong. &lt;a href=&quot;https://en.wikipedia.org/wiki/Battlefield:_Bad_Company_2&quot;&gt;Battlefield: Bad Company
2&lt;/a&gt;
was released in March 2010 and does not take place in those locations.&lt;/p&gt;

&lt;h3 id=&quot;bard&quot;&gt;Bard&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image11.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Bard is also wrong. Brothers in Arms 2: Global Front would have been an
acceptable answer.&lt;/p&gt;

&lt;h3 id=&quot;gpt-4&quot;&gt;GPT-4&lt;/h3&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image18.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;GPT-4 is wrong about the release date, but is correct about where
Battlefield: Bad Company 2 takes place.&lt;/p&gt;

&lt;h3 id=&quot;gpt-4--wikidata&quot;&gt;GPT-4 + WikiData&lt;/h3&gt;

&lt;p&gt;Instead of directly asking ChatGPT the answer, we instead ask it to look
up the information in WikiData.&lt;/p&gt;

&lt;p&gt;First, we need to dump out the semantic components of this question.
GPT-4 has many WikiData properties memorized, but not items, such as
North Africa (Q27381).&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We then put each of these items into WikiData’s search and dump out the
matching tags for all of them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;subject (Q164573)&lt;/li&gt;
  &lt;li&gt;video game (Q7889)&lt;/li&gt;
  &lt;li&gt;publication date (Q1361758)&lt;/li&gt;
  &lt;li&gt;publication date (P577)&lt;/li&gt;
  &lt;li&gt;December 2005 (Q464261)&lt;/li&gt;
  &lt;li&gt;narrative location (Q105115142)&lt;/li&gt;
  &lt;li&gt;narrative location (P840)&lt;/li&gt;
  &lt;li&gt;Solomon Islands (Q685)&lt;/li&gt;
  &lt;li&gt;Solomon Islands (Q148966)&lt;/li&gt;
  &lt;li&gt;North Africa (Q27381)&lt;/li&gt;
&lt;/ul&gt;
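&lt;p&gt;This search step can be scripted against the standard &lt;code&gt;wbsearchentities&lt;/code&gt; module of the WikiData API; a minimal sketch:&lt;/p&gt;

```python
import urllib.parse

def search_url(term):
    """Build a WikiData entity-search request (wbsearchentities API)."""
    return "https://www.wikidata.org/w/api.php?" + urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": term,
        "language": "en",
        "format": "json",
    })

# Each JSON result includes the item/property ID and a description,
# which is what lets us disambiguate Q685 vs Q148966 above.
print(search_url("Solomon Islands"))
```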

&lt;p&gt;This portion is currently pretty brittle, but it ended up
working well enough for this question. It could possibly be improved
with some sort of text / graph embedding search.&lt;/p&gt;

&lt;p&gt;We then ask this question to GPT-3.5 (note that we don’t need 4 here,
though 4’s version has better comments!):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image16.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;details&gt;
    &lt;summary&gt;&lt;b&gt;Conversation Text&lt;/b&gt;&lt;/summary&gt;

    &lt;blockquote&gt;
    **Me: **
    _Here\'s an example WikiData query:_
    _\&quot;\&quot;\&quot;_
    _SELECT DISTINCT ?item ?itemLabel WHERE {_
    _SERVICE wikibase:label { bd:serviceParam wikibase:language
    \&quot;\[AUTO_LANGUAGE\]\&quot;. }_
    _{_
    _SELECT DISTINCT ?item WHERE {_
    _?item p:P31 ?statement0._
    _?statement0 (ps:P31/(wdt:P279\*)) wd:Q7889._
    _}_

    _LIMIT 100_

    _}_

    _MINUS {_

    _?item wdt:P840 ?narrativeLocation._

    _}_

    _}_

    _\&quot;\&quot;\&quot;_

    _Here\'s a list of items and properties on WikiData:_

    _\&quot;\&quot;\&quot;_

    _subject (Q164573)_

    _video game (Q7889)_

    _publication date (Q1361758)_

    _publication date (P577)_

    _December 2005 (Q464261)_

    _narrative location (Q105115142)_

    _narrative location (P840)_

    _Solomon Islands (Q685)_
    _Solomon Islands (Q148966)_
    _North Africa (Q27381)_
    _\&quot;\&quot;\&quot;_

    _Write me a SPARQL query that gives a video game published on February
    2010 that takes place in the Solomon Islands and North Africa_

    **GPT-3.5:**

    _Here\'s the SPARQL query that should give you the required results:_

    _\`\`\`_

    _SELECT DISTINCT ?game ?gameLabel WHERE {_

    _SERVICE wikibase:label { bd:serviceParam wikibase:language
    \&quot;\[AUTO_LANGUAGE\]\&quot;. }_

    _?game wdt:P31 wd:Q7889;_

    _wdt:P577 ?date._

    _?game wdt:P840 wd:Q685, wd:Q27381._

    _FILTER(YEAR(?date) = 2010 &amp;amp;&amp;amp; MONTH(?date) = 2)_

    _}_

    _\`\`\`_

    _This query retrieves all video games (identified by \`wdt:P31
    wd:Q7889\`) that have been published in February 2010 (identified by the
    \`wdt:P577\` property) and have a narrative location in both the Solomon
    Islands (\`wd:Q685\`) and North Africa (\`wd:Q27381\`), as specified in
    the \`wdt:P840\` property._

    _The \`FILTER\` function ensures that only games published in February
    2010 are returned by extracting the year and month from the \`wdt:P577\`
    property and filtering on them. Note that this assumes that the
    publication date of the games is specified using the \`wdt:P577\`
    property. If a different property is used to specify the publication
    date, that property should be substituted in the query._
    &lt;/blockquote&gt;

&lt;/details&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Plopping that into WikiData, we get the correct answer!
(&lt;a href=&quot;https://w.wiki/6cfo&quot;&gt;query&lt;/a&gt;)
(&lt;a href=&quot;https://w.wiki/6cfq&quot;&gt;result&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image20.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;We can then export detailed JSON of that question back to ChatGPT to get
a nicely formatted answer:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/image13.png&quot; alt=&quot;Answer&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;so-long-fully-automated-space-gay-bowser&quot;&gt;So Long Fully Automated Space Gay Bowser&lt;/h3&gt;

&lt;p&gt;Here’s a pre-existing &lt;a href=&quot;https://python.langchain.com/en/latest/use_cases/agents/wikibase_agent.html&quot;&gt;notebook&lt;/a&gt;
for an automated simple LangChain wikibase agent for doing WikiData lookups.&lt;/p&gt;

&lt;p&gt;Let’s ask it a question past the ChatGPT 2021 knowledge cutoff about the Mario movie.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://media.githubusercontent.com/media/rusch95/rusch95.github.io/master/assets/2023-04-30-wikidata-llms/mario.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And success! It answers with the correct names of the directors.&lt;/p&gt;

&lt;h2 id=&quot;potential-follow-ups&quot;&gt;Potential Follow Ups&lt;/h2&gt;

&lt;h3 id=&quot;impedance&quot;&gt;Impedance&lt;/h3&gt;

&lt;p&gt;I hypothesize that, given this, LLMs might work more fluently with semantic graphs
than with relational data or perhaps even raw text. This might imply a route to thinning down
these models to improve their efficiency.&lt;/p&gt;

&lt;p&gt;For my next post, I’m planning on comparing different approaches on
existing benchmarks like HotPotQA. Additionally, I’m looking into cooking up a new benchmark
on Magic: The Gathering, as it contains an interesting mix of well-structured and unstructured data
to pose questions against.&lt;/p&gt;

&lt;h3 id=&quot;fine-tuning--reinforcement-learning&quot;&gt;Fine-Tuning / Reinforcement Learning&lt;/h3&gt;

&lt;p&gt;There are a lot of hacky steps here, due to the models being used not being
trained on this domain at all. With something like LLaMa, you could
likely use a mix of fine-tuning and reinforcement learning to make the
model internalize the WikiData labels, be a bit more robust against
SPARQL issues, and cut out a few queries.&lt;/p&gt;

&lt;h3 id=&quot;embeddings--semantic-extractions--etc&quot;&gt;Embeddings / Semantic Extractions / Etc&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, there are other approaches trying to solve similar
problems, like prompt hacking, embedding search
(&lt;a href=&quot;https://txt.cohere.com/embedding-archives-wikipedia/&quot;&gt;Cohere&lt;/a&gt;),
searching context with
agents (&lt;a href=&quot;https://arxiv.org/abs/2210.03629&quot;&gt;ReAct&lt;/a&gt;), and
model editing (&lt;a href=&quot;https://memit.baulab.info/&quot;&gt;MEMIT&lt;/a&gt;). One
potential direction would be extracting relations and seeing if LLMs point
out reasonable relations to add to WikiData. Ontologies are hard to get
right, and maybe LLaMa’s figured out a pretty good one!&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">ChatGPT knows an enormous number of facts about baseball stats, movie quotes, and so on.</summary></entry><entry><title type="html">Flipping Bits for Fun and Profit</title><link href="http://friend.computer/jekyll/update/2020/07/26/flippinig-bits.html" rel="alternate" type="text/html" title="Flipping Bits for Fun and Profit" /><published>2020-07-26T16:35:25+00:00</published><updated>2020-07-26T16:35:25+00:00</updated><id>http://friend.computer/jekyll/update/2020/07/26/flippinig-bits</id><content type="html" xml:base="http://friend.computer/jekyll/update/2020/07/26/flippinig-bits.html">&lt;p&gt;Today’s story is about how I hackily patched a &lt;em&gt;division-by-zero&lt;/em&gt; bug in the game &lt;em&gt;Brogue&lt;/em&gt; by using a few useful tools available on MacOS.&lt;/p&gt;

&lt;p&gt;I’m a sucker for well-made roguelikes. I cut my teeth on the venerable &lt;em&gt;NetHack&lt;/em&gt;, had a long affair with the very 
focused &lt;em&gt;Dungeon Crawl: Stone Soup&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, and have flitted around with fresh takes on the genre like &lt;em&gt;Cataclysm: Dark Days Ahead&lt;/em&gt;&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;My latest roguelike obsession has been &lt;em&gt;Brogue&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://user-images.githubusercontent.com/10816808/88485793-4d4c7600-cf3e-11ea-8c56-3599c806763a.png&quot; alt=&quot;Screenshot of Brogue&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Brogue, whose development began in 2009, is fairly well renowned and influential for its 
&lt;a href=&quot;https://www.tigsource.com/2012/01/15/brogue/&quot;&gt;very streamlined game design&lt;/a&gt; and 
&lt;a href=&quot;https://www.rockpapershotgun.com/2015/01/23/have-you-played-brogue/&quot;&gt;accessible UI&lt;/a&gt;. 
You may have even been exposed to Brogue’s core mechanical conceits through &lt;em&gt;Pixel Dungeon&lt;/em&gt;, the oft-forked and remixed mobile app, as I was.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;While procrastinating on performance reviews, I had been playing some Brogue. 
In the game, I picked up a magic Rapier of Confusion, a quick weapon with powerful lunges, and enchanted it heavily with magic scrolls.&lt;/p&gt;

&lt;p&gt;Stepping onto a new level, I enchanted it one more time with a Scroll of Enchantment. 
I then went to stab a lonesome goblin, and my game suddenly crashed. I shrugged, as this is not too uncommon. 
Reloading from my auto-save and appearing at the start of the level, I enchanted the rapier once again and tried to stab the goblin. 
Again, the game crashed.&lt;/p&gt;

&lt;p&gt;Usually the game crashes for very inexplicable reasons, but this was a repeatable issue. 
I reloaded the game, enchanted the weapon, and then tried to look at the weapon’s stats. 
The game crashed.&lt;/p&gt;

&lt;p&gt;This time, instead of hand-waving away the MacOS crash report prompt, I clicked on the report issue button. 
Thanks to Brogue being open-source, the program was compiled with debug symbols, 
so I could see from the stack trace that the crash was caused by a 
division-by-zero error in the fp_sqrt function, called from the function runicWeaponChance. 
By searching for &lt;em&gt;runicWeaponChance&lt;/em&gt; in the repo, 
I quickly found the line at fault &lt;a href=&quot;https://github.com/tsadok/brogue/blob/master/src/brogue/PowerTables.c#L418&quot;&gt;here&lt;/a&gt;,
where the game adjusts the proc chance of magical effects for quick weapons, such as the rapier.&lt;/p&gt;

&lt;p&gt;At first, I tried to do things the proper way. 
I downloaded the code from the official site and tried to compile it with Xcode. 
As many know, simply compiling code can often turn into a fun exercise in yak shaving. 
Apple seems to deprecate things very quickly, and my version of Xcode 
refused to touch the old Swift code that Brogue uses for its Mac UI. My heart pines 
for Windows’s more humane treatment of elderly code, as 
opposed to MacOS’s &lt;a href=&quot;https://en.wikipedia.org/wiki/Logan%27s_Run_(film)&quot;&gt;Logan’s Run-esque&lt;/a&gt; regime.&lt;/p&gt;

&lt;p&gt;I could have kept banging my head against Xcode, but it seemed like so much effort to change one 
line of code. At that point, I made my decision: even though there was probably a much better way 
of fixing this bug, I was just going to patch it out by hand.&lt;/p&gt;

&lt;p&gt;This was a nice exercise to explore the state of reverse engineering on MacOS, albeit with a non-obfuscated, source-available, symbol-rich binary.&lt;/p&gt;

&lt;p&gt;I first took a stab at the binary with &lt;a href=&quot;https://www.hopperapp.com/&quot;&gt;Hopper Disassembler&lt;/a&gt;. 
The free-trial is fairly restrictive, but the labeling provided was very useful in 
hunting down the assembly code section corresponding to the buggy line of code.&lt;/p&gt;

&lt;p&gt;I then used &lt;em&gt;otool&lt;/em&gt; to get a text disassembly dump via the command &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;otool -vtj Brogue &amp;gt; asmdump&lt;/code&gt;.
&lt;em&gt;otool&lt;/em&gt; is a nifty CLI program for working with LLVM-produced code and artifacts that I believe ships on MacOS by default.
I’m a little more dextrous interacting with code via Vim, so it was easier to play around with the 
specific assembly sections. Additionally, this tool gave me the full hexdump of each line of assembly, 
which I couldn’t easily tease out of Hopper.&lt;/p&gt;

&lt;p&gt;Via a combination of Hopper and otool, I anchored on that call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fp_sqrt&lt;/code&gt;, represented by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;callq 0x1000209f0&lt;/code&gt; 
in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;otool&lt;/code&gt; dump. Scanning up, I found the telltale &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cmpq&lt;/code&gt;-then-&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;je&lt;/code&gt; pair that usually represents a branch in x86-64 assembly.&lt;/p&gt;

&lt;p&gt;I struggled a bit to find a way to reverse the otool dump back into code, especially since I wanted to get back to playing 
Brogue instead of playing around with getting the correct linked header files. Thus, I fell back on trusty xxd.&lt;/p&gt;

&lt;p&gt;I dumped the binary to a structured hexdump with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xxd Brogue &amp;gt; hexdump&lt;/code&gt;.
&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xxd&lt;/code&gt; is a very simple, fairly standard Unix program for creating readable hexdumps of binaries. 
I found a particularly distinctive hex sequence in the neighborhood of that branch in the otool dump, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e8 7811&lt;/code&gt;, and jumped to it with the fun regex &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;e8.\?78.\?11&lt;/code&gt; 
since I didn’t know how the bytes would be split across columns in the hexdump. I then found the branch leading to the code in the hexdump.&lt;/p&gt;
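As an aside, the reason a plain search for the full byte string is unreliable: xxd groups its output into two-byte columns, so a multi-byte sequence can straddle a column boundary. A quick sketch of that grouping (the bytes here are just the ones from the search above, used illustratively):

```python
# Mimic xxd's hex column layout: a space after every two bytes.
def xxd_hex(data: bytes) -> str:
    h = data.hex()
    return " ".join(h[i:i + 4] for i in range(0, len(h), 4))

# The same three-byte sequence renders differently depending on alignment,
# hence the regex with optional separators between byte pairs.
assert xxd_hex(bytes.fromhex("e87811")) == "e878 11"
assert xxd_hex(bytes.fromhex("00e87811")) == "00e8 7811"
```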

&lt;p&gt;Looking back at the code, I noticed that it was just a slight clamp on how powerful this Rapier of Confusion would be, so 
I decided to short-circuit the code. I changed the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JE&lt;/code&gt; (jump if the previous cmp check was equal) to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;JNO&lt;/code&gt; (jump if not overflow). 
This jump was now always taken, skipping over the buggy proc chance code. 
To perform this change, I changed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0F 84&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0F 81&lt;/code&gt;. I then reversed the hexdump with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xxd -r hexdump &amp;gt; Brogue&lt;/code&gt;.&lt;/p&gt;
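For the curious, the same one-byte patch can be sketched in script form. The instruction bytes below are illustrative stand-ins, not the actual bytes from the Brogue binary; the real location comes from the disassembly:

```python
# An illustrative cmp followed by a JE rel32 (opcode 0F 84), standing in
# for the branch guarding the buggy proc chance code.
code = bytearray.fromhex("4839c80f8412000000")  # cmp rax, rcx; je +0x12

je = code.find(bytes.fromhex("0f84"))  # locate the JE opcode
assert je != -1
code[je + 1] = 0x81  # 0F 81 is JNO rel32: jump if not overflow

# With the overflow flag clear, the branch is now always taken.
assert code.hex() == "4839c80f8112000000"
```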

&lt;p&gt;I loaded the game back up. Sadly, the game handles saves via replay, which now diverged given that the proc chance was higher. 
Someone smarter could probably have patched in an actual fix for the bug and avoided the desync issue. 
Instead, I just grabbed the seed for that run and quickly played back to the autosave.&lt;/p&gt;

&lt;p&gt;And there was no more crashing. I got back to the goblin and killed it. 
Soon after that, I had an unfortunate run-in with some eels and was likewise killed.&lt;/p&gt;

&lt;p&gt;I went back to working on my performance review, having learned a few more tricks for 
hacking binaries on MacOS and a greater fear and respect for eels.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Or DCSS as the cool kids say &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Now you may say, “that’s a rogue-&lt;em&gt;lite&lt;/em&gt;, good sir.” To which I respond: “Ok.” See &lt;a href=&quot;http://www.roguebasin.com/index.php?title=Berlin_Interpretation&quot;&gt;http://www.roguebasin.com/index.php?title=Berlin_Interpretation&lt;/a&gt; for an ideological struggle similar to Vim vs. Emacs or the schisms of the Early Christian Church. Roguelikes are &lt;em&gt;serious&lt;/em&gt; business. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;All of these mentioned games happen to be free-range cage-free open source code &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Today’s story is about how I hackily patched a division-by-zero bug in the game Brogue by using a few useful tools available on MacOS.</summary></entry><entry><title type="html">My first post!!</title><link href="http://friend.computer/jekyll/update/2019/04/23/my-first-post.html" rel="alternate" type="text/html" title="My first post!!" /><published>2019-04-23T03:53:25+00:00</published><updated>2019-04-23T03:53:25+00:00</updated><id>http://friend.computer/jekyll/update/2019/04/23/my-first-post</id><content type="html" xml:base="http://friend.computer/jekyll/update/2019/04/23/my-first-post.html">&lt;p&gt;I thought this might be a nice place to dump my various pictures of food and screenshots of code.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/img/comp.gif&quot; alt=&quot;friend computer themself&quot; /&gt;&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">I thought this might be a nice place to dump my various pictures of food and screenshots of code.</summary></entry></feed>