Namemaker
The other day I reconstructed an old program I wrote once, many years ago.
It had gotten lost somewhere in the smoke and clatter of various hard drive
crashes and changes of computers and operating systems -- but I still somehow
retained its essence in that steel-trap brain of mine (hah! a funny smiley
face set of text characters should certainly follow that statement).
This program generates "fake text", based on the structure of "seed" text fed
to it. The output seems "almost" reasonable. Given enough initial text data of
a particular type, it outputs words that sort of sound like the seed text --
but not exactly. That is to say, the structure of the outputted "fake" words
are not necessarily proper English -- but they do reflect somewhat the makeup
of the seeded English words. It does best on simple lists of words. One good
example is proper names of people or articles. If I were a Marketing or
Advertising guy, I would use it to spit out names of new drugs, automobiles,
genetically created animals, novel products, etc. It would also be useful for
an author searching for a unique name for a character or an imaginary entity.
The seed text is first analyzed for its letter sequence structure. Arrays are
set up that tell, for each given character in the text, the frequency of
occurrence of the 3 characters before it. For example, if the seed text
contains the word "human", and the letter being analyzed is the "a", the
letter "m" will be counted as occurring immediately prior to an "a" once, the
letter "u" will be counted at occurring as the 2nd letter preceding an "a"
once, and the "h" as a 3rd letter preceding an "a" once. The counts of
preceding letters in the text are incremented according to how often they are
seen preceding any particular letter.
To be clearer, a 3-dimensional array is set up covering each letter in the
alphabet. The declaration is given as: LTR(128,3,128), where the first
dimension's element is the ASCII-numbered bin for the primary letter in
question, the second relates to whether the array element is covering its
1st-preceding, 2nd-preceding or 3rd-preceding letter history, and the last is
the ASCII-numbered bin of that preceding letter. (You may ask why I included
128 bins when there are only 26 letters plus a space needed in the output. I
can only answer that the ASCII values for lower case text runs in the range
from 97 to 122, and the ASCII value for a space character is 32. So most
of the array is actually unused. But since RAM is not as precious
nowadays as it used to be, doing it this way simplifies identifying the
applicable array elements when doing the pre-analysis counting and tabulating
of the seed text. In this case, I merely assign the array element "count
container" number to equal the ASCII number of the character it represents.
It would be much more complicated to do it in a space-efficient manner.
In the old days, I would have been forced to do so.)
The fake words are constructed one character at a time. Each tentative
character is tested to see if it has ever followed the last 3 new characters,
back in the seed text. The test is skewed a bit to make the final chosen
character one that has been more "commonly" (rather than more "rarely")
followed by the 3 preceding characters. If the tentative selection is a
complete bust, the program flow jumps out of the subroutine and tries for a
better tentative letter. There are routines in place to try to limit the total
length of the word, but in a manner that is "natural".
For example, say we want to generate new female proper names. We can generate
a "seed" string listing many common female names, plug it into the program,
and look through the output for interesting new names. Here is an example of
some output fashioned in this manner:
The program (VB) code is given below. Establish a form called
"Namemaker" and put a good-sized text box in it called "text1". (Change its
property to "yes" for "multiline".) Create the various procedure subroutines
as noted below and see if you can copy this text into them. You can try
changing the seed string to other lists of articles, say, the elements, car
names, animal names, etc. Amaze your friends with your newfound naming
creativity!
'variable declarations
Dim ltr(128, 3, 128)
Dim a$, b$, c$
Dim r, s, t, u, v
Dim i, j, k
Dim mt, ct, cx
Dim sumbef1
Public Sub Form_Load()
'here is the VB form page,
governing the program flow
Randomize Timer
Namemaker.WindowState = 2
Namemaker.Show
Call targetstring
Call fillarray
Call mainloop
End Sub
Public Sub targetstring()
'c$ is the feed datastring -- put
double spaces between each name!
c$ = " alice annie arlene annette anna barbara bambi
beatrice bonnie carol "
c$ = c$ & "constance cheryl cherise cynthia candy celia clarice diedre
darlene "
c$ = c$ & "earlene erma esther felicia frances florence faye gloria georgia
grace "
c$ = c$ & "gladys helen heide henrietta iona julia janette jasmine karen
kerry "
c$ = c$ & "linda lynn leona mandy marlene marty marilyn norma nanette opal "
c$ = c$ & "peony pearl phoebe queenie ruth susan suzette tanny ursula violet
"
c$ = c$ & "wanda yolanda zane zinnia charlene sherry cherry lillian libby
mary "
c$ = c$ & "yvonne yvette patricia patsy jade ruby sapphire rose daisy lucy "
c$ = c$ & "chris donna dorothy lisa elizabeth maria obediah olivia loretta "
c$ = c$ & "chloe uma evah winifred thelma inez victoria vicky rheba jewel "
c$ = c$ & "diana dianne dionne heather anita "
End Sub
Public Sub fillarray()
'analyze the feed string for the
frequency of characters preceding
'each one, as they are
interrogated in sequence one-by-one through the string.
'establish the relevant arrays
for this.
'for reference:
'ASCII 65-90=caps A-Z
'ASCII 32=space
'ASCII 97-122=small case a-z
For i = 4 To Len(c$)
'target letter in feed datastring
a$ = Mid$(c$, i, 1)
r = Asc(a$)
If r <> 32 Then
'if a capital make a lower case
If r >= 65 And r <= 65 + 25 Then r = r + 32
'if not a lower case letter or space
If r < 97 Or r > 97 + 25 Then r = 0
End If
'the character before
a$ = Mid$(c$, i - 1, 1)
s = Asc(a$)
If s <> 32 Then
'if a capital make a lower case
If s >= 65 And s <= 65 + 25 Then s = s + 32
'if not a lower case letter or space
If s < 97 Or s > 97 + 25 Then s = 0
End If
If s > 0 Then ltr(r, 1, s) = ltr(r, 1, s) + 1
'the character before that
a$ = Mid$(c$, i - 2, 1)
s = Asc(a$)
If s <> 32 Then
'if a capital make a lower case
If s >= 65 And s <= 65 + 25 Then s = s + 32
'if not a lower case letter or space
If s < 97 Or s > 97 + 25 Then s = 0
End If
If s > 0 Then ltr(r, 2, s) = ltr(r, 2, s) + 1
'and the character before that
one
a$ = Mid$(c$, i - 3, 1)
s = Asc(a$)
If s <> 32 Then
'if a capital make a lower case
If s >= 65 And s <= 65 + 25 Then s = s + 32
'if not a lower case letter or space
If s < 97 Or s > 97 + 25 Then s = 0
End If
If s > 0 Then ltr(r, 3, s) = ltr(r, 3, s) + 1
Next
End Sub
Public Sub mainloop()
'construct and display the output
string
'default temp ASCII number
variables to blanks
r = 32: s = 32: u = 32
'go thru while/wend at least once
'and make sure you get non-blanks to start with
While r = 32 Or s = 32 Or u = 32
'get a starting point in the target string
v = 3 + Int(Rnd * (Len(c$) - 2))
'set a 3-character substring
a$ = Mid$(c$, v, 3)
'start off the output string b$ with it
b$ = a$
'collect the ASCII numbers of the 3 characters
r = Asc(Mid$(a$, 3, 1))
s = Asc(Mid$(a$, 2, 1))
u = Asc(Mid$(a$, 1, 1))
Wend
'set the total length of the
output string
While Len(b$) < 500
'intialize "successful run"
counter
cx = 0
'look for a letter with some good
history behind it;
'that is, one that has been seen following the 3 before it
While mt < 3
mt = 0
'pick a letter to try
t = Int(Rnd * 129)
'don't want a blank following too near
a blank
While t = 32 And (r = 32 Or s = 32)
t = Int(Rnd * 129)
Wend
'pick a good name starting letter
if the last letter was a blank
If r = 32 Then
While ltr(t, 1, 32) = 0 And ltr(t, 2, 32) = 0
t = Int(Rnd * 129)
Wend
End If
'lengthen if too short
If t = 32 And Len(b$) > 6 Then
For j = Len(b$) - 5 To Len(b$)
If Asc(Mid$(b$, j, 1)) = 32 Then t = Int(Rnd * 129)
Next
End If
'if the last letter has been seen to
precede this one
If ltr(t, 1, r) > 0 Then
'sum up the history array contents for
the 1st-preceding letter
For j = 1 To 128: sumbef1 = sumbef1 + ltr(t, 1, j):
Next
If sumbef1 = 0 Then sumbef1 = 1
'tentative pick, if it passes a
frequency test
If ltr(t, 1, r) / sumbef1 > (Rnd * 5) / sumbef1 Then
mt = mt + 1
End If
'if the letter before that has been
seen to precede this one
If ltr(t, 2, s) > 0 Then
'sum up the history array contents for
the 2nd-receding letter
For j = 1 To 128: sumbef1 = sumbef1 + ltr(t, 2, j): Next
If sumbef1 = 0 Then sumbef1 = 1
'tentative pick, if it passes a
frequency test
If ltr(t, 2, s) / sumbef1 > (Rnd * 5) / sumbef1 Then
mt = mt + 1
End If
'and if the letter before that one has
been seen to precede this one
If ltr(t, 3, u) > 0 Then
'sum up the history array contents for
the 3rd-preceding letter
For j = 1 To 128: sumbef1 = sumbef1 + ltr(t, 3, j):
Next
If sumbef1 = 0 Then sumbef1 = 1
'tentative pick, if it passes a
frequency test
If ltr(t, 3, u) / sumbef1 > (Rnd * 5) / sumbef1 Then
mt = mt + 1
End If
'give a good chance of shortening the
name if it is too long, in a natural way
If ct > 7 And ltr(32, 1, t) > 0
And ltr(32, 2, t) > 0 Then
If Rnd > 0.9 Then
t = 32
mt = 3
End If
End If
'increment wend counter -- if >5000,
then call this round "unsuccessful"
'and jump out of while/wend
'("unsuccesful" probably means the history for the selected letter did not
include
'the previous letters that were already in the output string)
cx = cx + 1
If cx > 5000 Then mt = 3
'mt=3 if the randomly chosen letter
passed all the tests! If so, get out
Wend
'reset match flag for next time thru
mt = 0
'increment name length counter
ct = ct + 1
'reset counter if a blank was returned
(end of name)
If t = 32 Then ct = 0
'if the letter pick was truly
successful
If cx < 5001 Then
'reset 3rd, 2nd, 1st-preceding letters
in string
u = s
s = r
r = t
'add the successful character to
output string
b$ = b$ & Chr$(t)
End If
Wend
'print final output string
Namemaker.text1.Text = b$
End Sub
Postscript -- You might play around with changing the multiplier for "RND"
in the test lines of mainloop. Higher numbers make the testing harder (and
that causes the output string to take longer to appear), but it will produce
more "realistic" names, depending on the size of the seed string. You have to
try to balance this multiplier according to what the total "array counts" of
preceding letters are obtained as the pre-analysis routine goes through the
seed string. I am running a 2 gigaHz machine, so I can afford to be a bit
choosy before my patience runs out. If the output takes too long to produce,
reduce the multiplier a little bit. You know you have the number about right
when you see a name produced verbatim from the seed listing once in awhile.
(I think I had a better way of tapping into and utilizing a "frequency
histogram" for the array counts, but I don't recall exactly how I
originally did that, when I rewrote this program.) Also, you can reduce the size of the output string (given as 500 above) to cut
down the time it takes to output. If the name lengths don't seem quite to your
liking, try changing the "lengthening" and "shortening" routine constants.
You also may want to put a "done" button on the main form. I didn't bother to
show that control code (which is simply "end"). On my program, I also have a
"repeat" button to redo the output string. You will need to reinitialize b$=""
under the repeat control button's code, and set Namemaker.text1.text="". After
that, just add "call mainloop" again at the end of the "repeat" control button's code.
Back to
Recreational Computing...
|