[The following article is cross-posted from DataBreaches.net. All follow-ups will be posted on that site and all inquiries should be directed to breaches[at]databreaches.net]
First someone left our voter registration details exposed to the world, but those were “just public records,” some argued. Now a second misconfigured database has been uncovered by Chris Vickery. This one, however, not only includes some states’ voter lists, but it also includes 19 million profiles with private information on religion, household values, gun ownership and more. Are you outraged now?
So Much for a Quiet Christmas
Five days after finding a misconfigured database with 191 million voters’ records, and only hours after he woke up the owner of Three Lock Box in the middle of the night to ask him why their database still wasn’t secured, Chris Vickery emailed DataBreaches.net and Salted Hash. He had found yet another misconfigured database with voters’ information. It was 7:30 am on Christmas morning.
Idly wondering whether Vickery ever slept or took a break, I grabbed another cup of coffee and started looking at what he had found.
At first glance, the newly discovered database had the same data fields and formatting as the first database – fields and formatting which a number of experts had previously told us indicated that the data had flowed through Nation Builder at some point. And like the first database, this was a misconfigured MongoDB installation.
But there were obvious differences between the two databases, apart from the fact that they were on different IP addresses. The state voter lists in this second database had been updated more recently (April, 2015) and it didn’t have voter lists for all states. It appeared that the database was being populated alphabetically by state. There were approximately 37 million records in the first part of the database from voters in states that began A-I, except for D.C., Illinois, and Iowa, none of which had any state voter records.
What really distinguished this database, though, was the inclusion of additional files with approximately 19 million records containing much more personal and demographic information on voters or potential voters.
All told, there were 56,722,986 records in this database, but that did not necessarily translate into 56.7 million unique voters, as there were previous versions of some lists included.
To give you a sense of the kinds of personal information being sought and/or obtained, here’s a redacted screen shot showing the data fields included in one of the “listdata” files in this other part of the database, taken from the records of one voter:
In addition to name, address, telephone number, political party, ethnicity and indications as to whether the individual had voted in primary and general elections, records contained fields indicating whether the person was a religion donor, a charity donor, a gun owner, whether they were politically conservative, whether they were interested in hunting or fishing or auto racing, where they worked, and their income level. Other fields scored individuals on “bible_lifestyle” and “household_values.” For some people, e-mail addresses and cell phone numbers appear to have been included.
Note: not all voters in the state voter list records part of the database had entries in the profiling files, and not all people who had records in these profile files had their state voter’s registration in the first part of the database. Nor were all fields in the profile records populated with scores.
Even though they contained fields on “donor” information that might be useful for fund-raising purposes, the lists looked like they would be used for a voter registration campaign or political purposes.
According to Vickery, records in the profile files were not sorted by state, but he spotted numerous records from Iowa voters (even though there was no state voter list for Iowa). California voters were also represented in the profile files. At the time of this publication, Vickery is still reviewing what is in the profile files.
While I was angry to see private information like this exposed to potentially anyone and everyone, Vickery was excited. He told DataBreaches.net, “My first reaction, after finding the second database, was, ‘Excellent… this should help lead to the perpetrator.’ My gut said there would be something here that would finally point to the entity at fault. If I had found it first, before the bigger one, then I would have been a bit offended. But because I was ‘on the hunt’ for more information, I was more glad to find it than I was offended.”
Although it was Christmas, Chris called the Texas office of the FBI to inform them about the leak. They would subsequently tell him that this was out of their scope of operations. Over the next few days, Steve Ragan made contact with a federal agency while I contacted other federal agents and California officials about this newest leak.
We now had two unsecured databases with voter information, and our first priority was to get them secured.
Following the Clues
Unlike the first database that contained only one or two clues to guide us, there were some actual leads in the second database that we could follow:
- two data fields were labeled “Pioneer_status” and “Pioneer_counter”;
- there was a reference to “Pioneer” in the schema;
- data fields relating to bible and family values suggested a right-wing Christian-oriented organization;
- URLs referenced in one file included Heroic Media, Let’s Vote America, and, importantly, pioneersolutionsinc.com; and
- Let’s Vote America is a United in Purpose campaign, and UiP’s Services page specifically ties it to Pioneer Solutions.
According to the left-wing Political Research Associates (PRA), United in Purpose, a 501(c)(4) organization, was founded after the 2008 presidential election. A 501(c)(4) is a “social welfare” organization. They are allowed to engage heavily in political action, but are limited to spending no more than 50% of their money on politics.
PRA reports that by September, 2014, United in Purpose had
created or updated several online tools, including motivational and instructional videos aimed at pastors. The California-based UiP is led by Bill Dallas, an ex-con and Tea Party activist, whose organization has made news in the last few election cycles by engaging in deep data mining, database building, and online tools for the ongoing short and long-term development of the Christian Right. Their strategic voter registration app, promoted by the UiP project Champion the Vote, allows pastors to compare their church membership lists with official voter registration files, so they can see who among their congregants they need to recruit into electoral life.
Two years earlier, NPR had reported that UiP had compiled information on 180 million Americans. If they had information on 180 million in 2012, they might have information on 191 million by February, I speculated.
NPR’s report described their methods at the time:
The company buys lists to build a profile of each citizen, and then assigns points for certain characteristics. You get points if you’re on an anti-abortion list or a traditional marriage list. You get a point if you regularly attend church or home-school your kids. You get points if you like NASCAR or fishing.
“If [your score] totaled over 600 points, then we realized you were very serious about your faith,” Dallas says. “Then we run that person against the voter registration database. … If they were not registered, that became one of the key people we were going to target to go after.”
Those reports certainly appear consistent with the data fields – and data – we saw in this database and the reference to “survey” as a “source” for some profile records.
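For readers who want to see how mechanically simple this kind of profiling is, here is a minimal sketch of the approach NPR described: add up points for list memberships and interests, keep anyone who scores over 600, and check those people against a voter roll. Only the 600-point threshold comes from the quoted report. The trait names, the point values, the matching on name and address, and every function name below are my own assumptions for illustration, not UiP's actual method or data model.

```python
from dataclasses import dataclass

# Hypothetical weights: the NPR report names the kinds of traits that earn
# points but does not give the actual values.
POINTS = {
    "anti_abortion_list": 250,
    "traditional_marriage_list": 250,
    "regular_church_attendance": 150,
    "home_schools": 150,
    "likes_nascar": 50,
    "likes_fishing": 50,
}

THRESHOLD = 600  # "over 600 points", per the quoted description


@dataclass(frozen=True)
class Profile:
    name: str
    address: str
    traits: frozenset = frozenset()


def score(profile):
    """Sum the points for every trait recorded on the profile."""
    return sum(POINTS.get(trait, 0) for trait in profile.traits)


def match_key(name, address):
    """Normalize case and whitespace so minor formatting differences still match."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))


def unregistered_targets(profiles, voter_roll):
    """Return high-scoring profiles that do not appear on the voter roll.

    voter_roll is an iterable of (name, address) pairs. Matching on name
    and address alone is an assumption made to keep the sketch short.
    """
    registered = {match_key(name, address) for name, address in voter_roll}
    return [
        p for p in profiles
        if score(p) > THRESHOLD
        and match_key(p.name, p.address) not in registered
    ]
```

Nothing here is sophisticated, which is rather the point: once the purchased lists and the voter file sit in the same database, flagging people is a few lines of code.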
But was this UiP’s database, Pioneer Solutions’ database, or one of UiP’s many partners? And who was actually responsible for failing to secure this database? Bill Dallas is the head of both UiP and Pioneer Solutions. Are the two so intertwined that we wouldn’t be able to figure this out? Indeed, when Vickery explored Pioneer Solutions’ site and signed up for an account, the “welcome to Pioneer Solutions” acknowledgement he received was from [email protected].
The last – and at first, exciting – clue, was that the two users for this second database were “Pioneer” and “Pioneer2.” The first database also had two users: “Pioneer” and “MPioneer,” a detail we had withheld in our earlier reporting because that database was still unsecured at the time we reported the leak. In fact, having seen that “Pioneer” reference in the first database, we had considered the possibility that the larger (191M) database might be associated with Congressman Tiberi’s Pioneer PAC. The Congressman’s staff told me that they didn’t think it would be the PAC’s database because the Congressman only campaigned in Ohio-12, but DataBreaches.net had called and sent emails to the PAC anyway, asking them to investigate and confirm or deny the possibility. The PAC never responded. Now we had to consider whether the two databases might really be linked, or if it was just a coincidence that they both had users named “Pioneer.”
On December 27, DataBreaches.net sent an inquiry to Pioneer Solutions via their contact form, describing the second database, the content_keywords file with those URLs, and some of the data field labels.
The very next morning, Tamas Cser, the CEO and Founder of Digital Smart Technologies, e-mailed DataBreaches.net:
This is definitely data that we provide however we do not have the content keywords table that you mention. We work with many organization so we should talk to see where the vulnerability is, and plug it asap.
The phone number he gave me to call him at was Digital Smart Technologies’ phone number.
DataBreaches.net called Cser immediately (December 28), and gave him the IP address of the second database. I also gave him the IP address of the first database, and asked him to check that one, too. Cser promised to call back the next day after investigating the matter.
Less than 12 hours after my phone call to Cser, both databases were secured. Since we weren’t checking both of them hourly, I can’t say with any certainty exactly when each one was secured, just that they were both secured by early evening.
But Cser didn’t call back, so we had no official confirmation that either of the databases was from UiP, Pioneer Solutions, or one of their partners or vendors. While only Cser and a few officials had been given the IP address of the second database, a number of people had been given the IP address of the first database by then. Could two different entities each secure their database that day? Sure, but personally, I’ve always been a fan of Occam’s Razor.
After more calls and emails from DataBreaches.net the next day, Cser called again to say that an unspecified “they” were all investigating the second database leak “intensively.” He asked me to delay publishing and give them another 24 hours to get back to me.
During that phone call, Cser denied that the first database was theirs.
Another 24 hours went by, and there was still no call back. I’ve sent several reminders asking him to call, but there has been no response at all as of the time of this publication.
So the Buck Stops… Where?
We can’t tell you definitively whether UiP, Pioneer Solutions, or one of their partners, tech vendors, or “initiatives” is responsible for this second unsecured database. It appears that there are a lot of organizations sharing resources, with UiP at the top, but no one has admitted responsibility for this second database. Then again, they haven’t denied it, either. I would assume that if the California Attorney General’s Office or Secretary of State Elections Commission asks them, they will get the straight answer that has so far eluded us. Both agencies have been made aware of this second leak involving voters’ information.
And frustratingly, we still don’t know for sure who’s responsible for the first misconfigured database that was secured within hours of this second database.
Potential for Abuse?
As Kalev Leetaru reported on Forbes this past weekend, voter profiles are like gold to campaigns, and there’s nothing illegal about gathering the information from willing participants. But being willing to participate in a survey or study doesn’t translate into willingness to have your personally identifiable information in files that are exposed on the Internet.
Then, too, there’s the possibility of misuse if the voter profiles fell into the wrong individuals’ hands. Even though some of the information in the profiles might be public records in most states, the inclusion of religion factors, interests, income level, employer name, and other factors could create a record that can easily be misused for targeted phishing. If you belong to a particular church, and get an email, “Fire at <insert your church’s name>: Help us rebuild!” would you open an attachment supposedly showing the fire damage? Or if your record shows you’re a Pastor, would you be more likely to click on a link in an email that announced a new resource for pastors to assist their flocks or that gave you a link to read a story about a scandal involving a pastor?
Who Regulates This?
On a state level, California seems the most likely state to investigate these leaks of voter information. Not only does it have more restrictive laws on use of information from voter registration, but if I understand their law, they also prohibit sharing of information from voter registration cards by a recipient of the voters’ list without the express prior approval of the Secretary of State. I do not know whether the protected information was actually shared, or if it was shared, if it was shared with approval from the Secretary of State. But it’s a question that the state may wish to investigate.
On a federal level, and under Section 5 of the FTC Act, the Federal Trade Commission can enforce data security by taking action against entities that engage in “unfair practices” which cause, “or are likely to cause” substantial injury to consumers. But the FTC does not have enforcement authority over non-profit organizations such as PACs and voter campaigns. And as we reported previously, state voter lists are often freely available public records, although there are some restrictions by some states.
If, however, it turns out an IT vendor was responsible, the IT vendor, if it’s a for-profit business, might fall under the FTC’s authority. The FTC has taken enforcement action against one vendor already (GMR Transcription Services), but might they do so here?
It may depend, in part, on whether the FTC would see this security failure as likely to cause substantial injury. Would increased risk of targeted phishing qualify for that? I don’t know for sure how the FTC would answer that question, but they might view it as such.
Head on over to Steve Ragan’s post on Salted Hash to get his report on this newest misconfigured database and his insights on it. And knowing Chris’s enthusiasm, he will probably be commenting on it over on reddit.
Update 1: As anticipated, Chris posted something on reddit, here.